
nlp - Changing every non-letter character to \n in a file using Unix utilities

Problem description:

I was watching a tutorial about using Unix utilities. The presenter was on a Mac, but I have a Windows laptop, so I downloaded the GnuWin32 package.

Then came a part where I wanted to replace every non-letter character in a file with a newline ("\n").

The command line in the tutorial was:

tr -sc 'A-Za-z' '\n' < filename.txt |less

It worked for him, but when I tried it, it put a single quote character (') after each character:

'S'h'a'k'e's'p'e'a'r'e'T'H'E'T'E'M'P'E'S'T'f'r'o'm'O'n'l'i'n'e'L'i'b'r'a'r'y'o'f'L'i'b'e'r't'y'h't't'p'o'l'l'l'i'b'e'r't'y'f'u'n'd'o'r'g'

I tried:

tr -sc "A-Za-z" "\n" < filename.txt |less

It added a newline after each character:

n

e

L

i

b

r

a

I tried removing the complement option and adding ^ to the regex:

tr "[^A-Za-z]" "\n" < filename.txt |less

The result was that every letter was replaced with a newline.

The question is: do the command-line options of the Unix utilities in GnuWin32 differ from those elsewhere? And does putting the regex between single quotes, like 'A-Z', differ from using double quotes, like "A-Z"?

If so, what is the best way to replace every non-letter character with a newline, other than the failed attempts above?

The source of the text I was testing on

Answer:

I tested your examples with my tr (tr --version reports GNU coreutils 8.5), and:

1) using single or double quotes makes no difference
2) it looks like there is no way to negate characters by using ^

tr takes a set of characters, not a regular expression, so when you write [^A-Za-z] the brackets and the caret are treated as literal members of the set:

echo "abc abd [hh] d^o 1976" | tr '[^A-Za-z]' '.'

or with double quotes

echo "abc abd [hh] d^o 1976" | tr "[^A-Za-z]" '.'

produces the following output

... ... .... ... 1976

This shows that the alphabetic characters, the caret, and the square brackets were all treated as members of the set and replaced, while the digits were left untouched.

This leads us to the conclusion that to split by non-alphabetic chars you have to use -c with a range 'A-Za-z', exactly as you did in the first example.
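As a quick check of that conclusion, here is a minimal sketch run on the same sample string. It assumes a POSIX-style shell (bash or similar, not cmd.exe), and split_words is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: read text on stdin, print one word per line.
# -c complements the set (everything that is NOT a letter),
# -s squeezes each run of replaced characters into a single newline.
split_words() {
  tr -sc 'A-Za-z' '\n'
}

printf 'abc abd [hh] d^o 1976\n' | split_words
```

On this input the sketch should print abc, abd, hh, d and o, each on its own line; the brackets, caret, spaces and digits all become separators.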

Answer:

Hm..

$ tr -sc '[A-Za-z]' "\n" < getCokeInfo_viaFinger_cmu.awk
bin
gawk
f
BEGIN
wisc
edu
finger

....

Note that I wrapped the range in square brackets ( [A-Za-z] ). Maybe your tr requires that too.
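One caveat worth checking on your own system: with GNU tr the square brackets are not grouping syntax, they are ordinary members of the set, so a bracketed range also counts [ and ] as "letters" to keep. The sketch below compares the two spellings; it assumes GNU tr in a POSIX-style shell, and both function names are hypothetical.

```shell
# Hypothetical helpers, for comparison only.
split_plain()     { tr -sc 'A-Za-z' '\n'; }    # set: letters only
split_bracketed() { tr -sc '[A-Za-z]' '\n'; }  # set: letters plus literal [ and ]

printf 'a[b]c\n' | split_plain
printf 'a[b]c\n' | split_bracketed
```

With GNU tr, the first call should split the input into a, b and c on separate lines, while the second should leave a[b]c intact because the brackets are part of the kept set.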

I hope this helps.

Answer:
cat file.txt | sed -re 's/[^a-zA-Z]/\n/g'

;)
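Unlike tr -s, the sed command above emits one newline per non-letter character, so a run such as ", " leaves blank lines in the output. A possible variant, assuming GNU sed (for -E and for \n in the replacement), matches each run as a whole; split_words_sed is a hypothetical name.

```shell
# Hypothetical helper; assumes GNU sed.
# The + makes each run of non-letters collapse into a single newline.
split_words_sed() {
  sed -E 's/[^a-zA-Z]+/\n/g'
}

printf 'one, two; three\n' | split_words_sed
```

Because sed works line by line, an input line that starts or ends with a non-letter would still produce an empty line, which tr -sc avoids.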
