
Parsing a section of a file under a specific heading using awk, sed, grep

Question:

I have a file that is broken down into a bunch of different headings. I need to grep out certain fields under a particular heading. For example, I want to print the names under the PRIORITY USERS heading. I can grep for this segment and print the names (like grep -A 10 "PRIORITY USERS " | grep name:), but I need to limit my output to the names under the PRIORITY USERS heading only. The problem is that the number of entries under each heading varies, so I cannot use a fixed number with the grep -A option.

Can you assist please?

Input file

USERS:

name: 286

fields1

fields 3

name: 123

fieldx: test

PRIORITY USERS:

name: jack

field1: 8

name: Joe

name: bob

field1: xyz

name: tempo

kind: Text

SEGMENT3

name: ginger

name: max

Non-USERS

Name: JOJO

Output should be:

PRIORITY USERS:

name: jack

name: bob

name: tempo

Thanking you all in advance

Answer:

cat sample.csv

USERS:

           name: 286
           fields1
           fields 3

           name: 286
           fields 4

PRIORITY USERS:

           name: Jack
           field1:  8
           name: Joe

 SEGMENT3

           name: ginger
           name: max

 Non-USERS

           Name: JOJO

sed -n '/PRIORITY USERS/,/SEGMENT3/p' sample.csv | grep name

   name: Jack
   name: Joe

In '/PRIORITY USERS/,/SEGMENT3/', PRIORITY USERS is the start pattern and SEGMENT3 is the end pattern. sed prints only the lines from the start pattern through the end pattern (both inclusive), and grep then keeps the name lines.
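
As a hedged illustration of the range address described above, the sketch below rebuilds a small input and runs the same pipeline. The file name sample.txt and its contents are assumptions modelled on the sample in this answer. Note that a sed range includes both boundary lines, and that if the end pattern never matches, the range runs to the end of the file, so this approach relies on knowing which heading follows PRIORITY USERS.

```shell
# Hypothetical sample file, modelled on the input shown in this answer.
cat > sample.txt <<'EOF'
USERS:

           name: 286
           fields1

PRIORITY USERS:

           name: Jack
           field1:  8
           name: Joe

 SEGMENT3

           name: ginger
EOF

# sed -n suppresses default output; the p command prints only the lines
# from the PRIORITY USERS line through the SEGMENT3 line.
# grep then keeps the lines containing "name:".
result=$(sed -n '/PRIORITY USERS/,/SEGMENT3/p' sample.txt | grep 'name:')
printf '%s\n' "$result"
```

With this input, only the Jack and Joe lines should be printed; the ginger line falls after the end of the range.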

Answer:

It looks like the top-level headings can be characterized as occurring on lines that start with at most one blank. If that's the case, then the following has the advantage of not requiring knowledge of the top-level heading after the target heading:

sed -r -n '/^ ?PRIORITY USERS/,/^ ?[^ ]/ {/name:/p ; }'

(Some versions of sed require -E instead of -r for extended regex support.)

In any case, there is no need to invoke both sed and grep.

One advantage of using 'awk' here is that you can use the "?" in the regular expressions without having to set a flag:

awk '/^ ?PRIORITY USERS/ {s++; next}
     s==1 {if (/^ ?[^ ]/) {s++} else if (/name:/) {print}}'
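
To make the state variable concrete, here is a minimal sketch of the awk version run against a small input. The file name headings.txt, its contents, and the following heading OTHER GROUP are assumptions made up for illustration. s is 0 before the target heading, 1 inside it, and 2 once the next top-level heading has been seen, so only lines read while s is 1 are considered.

```shell
# Hypothetical input; the heading after the target is deliberately
# something the script does not know about.
cat > headings.txt <<'EOF'
USERS:

           name: 286

PRIORITY USERS:

           name: jack
           field1: 8
           name: Joe

 OTHER GROUP

           name: ginger
EOF

result=$(awk '
  # Target heading found: move to state 1 and skip the heading line itself.
  /^ ?PRIORITY USERS/ { s++; next }
  # In state 1, any line starting with at most one blank followed by a
  # non-blank is the next top-level heading and ends the section;
  # otherwise print the line if it contains "name:".
  s == 1 { if (/^ ?[^ ]/) { s++ } else if (/name:/) { print } }
' headings.txt)
printf '%s\n' "$result"
```

With this input, the jack and Joe lines should be printed, and the ginger line under OTHER GROUP is skipped without the script naming that heading.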

Answer:

awk to the rescue!

$ awk -v RS= 'f{print;exit} /PRIORITY USERS:/{f=1}' file

           name: Jack
           field1:  8
           name: Joe

I guess there is also an unwritten requirement to filter out the non-name lines. For that, change the script slightly:

$ awk -F'\n' -v RS= 'f{for(i=1;i<=NF;i++) if($i~/name:/) print $i;exit}
     /PRIORITY USERS:/{f=1}' file

           name: Jack
           name: Joe
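
Both variants above rely on awk's paragraph mode: setting RS to the empty string makes each blank-line-separated block one record, and -F'\n' makes each line of that block a field. The sketch below is a minimal illustration under the assumption that all names sit in a single block directly after the heading; blocks.txt and its contents are made up for this example.

```shell
# Hypothetical input: four blank-line-separated blocks.
cat > blocks.txt <<'EOF'
PRIORITY USERS:

           name: Jack
           field1:  8
           name: Joe

 SEGMENT3

           name: ginger
EOF

# Show how paragraph mode splits the file: record number and field count.
counts=$(awk -F'\n' -v RS= '{ print NR ":" NF }' blocks.txt)
printf '%s\n' "$counts"

# Same idea as the script above: once the heading record has been seen,
# print the name fields of the next record and stop.
names=$(awk -F'\n' -v RS= '
  f { for (i = 1; i <= NF; i++) if ($i ~ /name:/) print $i; exit }
  /PRIORITY USERS:/ { f = 1 }
' blocks.txt)
printf '%s\n' "$names"
```

The first command should report four records, the second of which has three fields. If the names under a heading were spread over several blank-line-separated blocks, only the first block would be printed, which is a limitation of this block-based approach.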

UPDATE: Based on the updated input file, this will produce the list of names:

$ awk '/SEGMENT3/{f=0} f&&/name:/; /PRIORITY USERS:/{f=1}' file

           name: jack
           name: Joe
           name: bob
           name: tempo

Note: Your sample output is missing "Joe". Had you left out "bob" instead, there would have been a nice joke!

Answer:

awk -vRS= -F'\n' '/SEGMENT/{a=0}a{$0=$1}/PRIORITY/{a=1}a' file
PRIORITY USERS:
           name: jack
           name: bob
           name: tempo

Answer:

$ cat tst.awk
/^[[:space:]]?[^[:space:]]/ { inSect = ($0 ~ ("^[[:space:]]?" sect "[[:space:]:]*$") ? 1 : 0) }
inSect && ($0 ~ "^[[:space:]]+" field ":")

$ awk -v sect='PRIORITY USERS' -v field='name' -f tst.awk file
           name: jack
           name: Joe
           name: bob
           name: tempo

The above is complicated because your input format varies so wildly: some header lines start with a space while others do not, some are immediately followed by a colon while others are not, and so on. It also assumes you simply missed name: Joe in your expected output.
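
One property of this script is worth illustrating: because the section name is anchored to the start of the header line, sect='USERS' should select only the USERS section and not PRIORITY USERS or Non-USERS. A hedged sketch follows, with the script copied from above (comments added) and a made-up input file; the name users.txt and its contents are assumptions.

```shell
# The script from above, written out with explanatory comments.
cat > tst.awk <<'EOF'
# A header line starts with at most one whitespace character.
# On each header, record whether it names the requested section.
/^[[:space:]]?[^[:space:]]/ { inSect = ($0 ~ ("^[[:space:]]?" sect "[[:space:]:]*$") ? 1 : 0) }
# Inside the section, print indented lines whose key is the requested field.
inSect && ($0 ~ "^[[:space:]]+" field ":")
EOF

# Hypothetical input with three headings that all contain "USERS".
cat > users.txt <<'EOF'
USERS:

           name: 286
           fields1

PRIORITY USERS:

           name: jack
           field1: 8

 Non-USERS

           Name: JOJO
EOF

result=$(awk -v sect='USERS' -v field='name' -f tst.awk users.txt)
printf '%s\n' "$result"
```

With this input, only the name: 286 line should be printed. A plain /USERS/ match would also have switched the section on at the other two headings. Note that sect and field are used as regular expressions, so any regex metacharacters in them would need escaping.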
