
shell - Dynamic delimiter in Unix

Question:

Input:

echo "1234ABC89,234" # A

echo "0520001DEF78,66" # B

echo "46545455KRJ21,00"

From the above strings, I need to split the characters to get the alphabetic field and the number after that.

From "1234ABC89,234", the output should be:

ABC

89,234

From "0520001DEF78,66", the output should be:

DEF

78,66

I have many strings that I need to split like this.

Here is my script so far:

echo "1234ABC89,234" | cut -d',' -f1

but it gives me 1234ABC89 which isn't what I want.

Answer:

Assuming that you want to discard leading digits only, and that the letters will be all upper case, the following should work:

echo "1234ABC89,234" | sed 's/^[0-9]*\([A-Z]*\)\([0-9].*\)/\1\n\2/'

This works fine with GNU sed (I have 4.2.2), but other sed implementations might not interpret \n in the replacement as a newline, in which case you'll need another way to insert the line break.
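
For sed implementations that do not accept \n in the replacement, one commonly used alternative is a backslash followed by a literal newline, which POSIX sed specifies. The following is a minimal sketch under that assumption; the function name split_portable is made up for illustration and is not part of the original answer.

```shell
# Hypothetical helper: prints the letters and the trailing number on separate lines.
split_portable() {
    # In the replacement, a backslash followed by a real newline inserts a newline.
    printf '%s\n' "$1" | sed 's/^[0-9]*\([A-Z]*\)\([0-9].*\)/\1\
\2/'
}

split_portable "1234ABC89,234"
```

The second line of the sed script must start in column one, otherwise the indentation becomes part of the output.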

Answer:

Where do the strings come from? Are they read from a file (or other source external to the script), or are they stored in the script? If they're in the script, you should simply reformat the data so it is easier to manage. So it is sensible to assume they come from an external data source, such as a file or data piped to the script.

You could simply feed the data through sed:

sed 's/^[0-9]*\([A-Z]*\)/\1 /' |
while read -r alpha number
do
    # process the two fields; as a placeholder, print each on its own line
    printf '%s\n%s\n' "$alpha" "$number"
done

The only trick to watch is that the loop at the end of a pipeline may run in a subshell, so variables you set inside the loop won't necessarily be visible to the script after the done. There are ways around that problem, some of which depend on which shell you use. The sed-and-read approach itself works the same in any derivative of the Bourne shell.
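
One possible way around the variable visibility problem is to feed the loop from a here-document instead of a pipe, so the loop runs in the current shell. This is a sketch only; the sample input lines and the variable names data, count and last_alpha are assumptions added for illustration.

```shell
count=0

# Run sed first and capture its output (sample input is hard-coded here for the demo).
data=$(printf '%s\n' "1234ABC89,234" "0520001DEF78,66" |
    sed 's/^[0-9]*\([A-Z]*\)/\1 /')

# The loop reads from a here-document, so it is not part of a pipeline
# and assignments made inside it persist after "done".
while read -r alpha number
do
    count=$((count + 1))
    last_alpha=$alpha
done <<EOF
$data
EOF

echo "$count $last_alpha"
```

The trade-off is that all of sed's output is held in a variable before the loop starts, which is fine for modest inputs but less suitable for very large ones.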

Answer:

You said you have many strings like this, so if possible I recommend saving them to a file such as input.txt:

1234ABC89,234
0520001DEF78,66
46545455KRJ21,00

On your command line, try this sed command, passing input.txt as the file argument:

$ sed -E 's/([0-9]+)([[:alpha:]]{3})(.+)/\2\t\3/g' input.txt
ABC     89,234
DEF     78,66
KRJ     21,00

How it works

  • uses -E for extended regular expressions to save on typing; without it, grouping parentheses would have to be escaped as \( and \)
  • uses grouping with ( and ) to match three groups:
      • first, digits: [0-9] followed by +, meaning one or more digits (the POSIX class [[:digit:]] is an equivalent alternative)
      • next, POSIX alphabetic characters [[:alpha:]], lowercase or uppercase, where {3} requires exactly three of them
      • last, . meaning any character, with + for one or more times
  • \2\t\3 then outputs group 2 and group 3 separated by a tab (\t in the replacement is understood by GNU sed; other sed implementations may need a literal tab instead)

Thus you are able to extract two separate fields per line, just separated by tab, for easier manipulation later.
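
As an illustration of the "easier manipulation later" point, the sketch below builds the tab with printf so it does not rely on sed understanding \t, then picks out a single column with cut, whose default delimiter is a tab. The function name extract_fields and the variable tab are assumptions for this example, and the pattern is anchored with ^ and uses group numbers \1 and \2 rather than the \2 and \3 of the command above.

```shell
# A literal tab character, produced portably by printf.
tab=$(printf '\t')

# Hypothetical helper: reads lines on stdin, writes "LETTERS<TAB>NUMBER".
extract_fields() {
    # Double quotes let ${tab} expand; \1 and \2 are passed through to sed unchanged.
    sed -E "s/^[0-9]+([[:alpha:]]{3})(.+)/\1${tab}\2/"
}

# Print only the second (numeric) field of each line.
printf '%s\n' "1234ABC89,234" "0520001DEF78,66" "46545455KRJ21,00" |
    extract_fields | cut -f2
```

Swapping cut -f2 for cut -f1 would give the alphabetic field instead.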

Answer:

Depending on the version of sed you can try:

echo "0520001DEF78,66" | sed -E -e 's/[0-9]*([A-Z]*)([,0-9]*)/\1\n\2/'

or:

echo "0520001DEF78,66" | sed -E -e 's/[0-9]*([A-Z]*)([,0-9]*)/\1$\2/' | tr '$' '\n'

DEF
78,66

Explanation: the regular expression replaces the input with the expected output, except that it puts a "$" sign where the newline should go, which the tr command then converts to a newline.
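
If sed's handling of newlines is a concern, a different technique that avoids sed altogether is POSIX shell parameter expansion. This is not what the answers above use; it is a sketch under the same assumption that each string is digits, then letters, then the number. The function name split_line and its variable names are made up for illustration.

```shell
# Hypothetical helper: splits "digits + letters + number" using only the shell.
split_line() {
    s=$1
    # ${s%%[!0-9]*} is the leading run of digits; strip it from the front.
    rest=${s#"${s%%[!0-9]*}"}
    # Keep everything before the first non-letter: the alphabetic field.
    alpha=${rest%%[!A-Za-z]*}
    # Whatever follows the letters is the number.
    number=${rest#"$alpha"}
    printf '%s\n%s\n' "$alpha" "$number"
}

split_line "0520001DEF78,66"
```

Because no external process is started per string, this can be convenient when the strings are already in shell variables.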
