Deleting lines according to a slightly complicated condition

The contents of the text file are as follows.

600466 a 37.50 25.28
600466 b 31.13 18.22
600466 c 64.80 61.39
600467 a 38.79 30.00
600467 b 28.73 41.04
600467 c 58.32 61.39
600468 a 33.09 25.28
600468 b 35.57 42.69
600468 c 58.32 60.12
600469 a 36.89 29.80
600469 b 35.57 30.94
600469 c 64.80 62.49
600470 b 37.35 35.02 *
600470 c 58.32 58.32 *
600471 a 29.22 25.47
600471 b 34.74 20.61
600471 c 64.80 62.81
600472 b 31.13 30.28 *
600472 c 58.32 62.04 *

I have marked a few lines with an asterisk.

Normally the first field of a line is repeated 3 times (that is, it appears on exactly 3 consecutive lines), but for some lines it is not. I want to delete those lines with a few shell commands.

Does anyone have an idea?

(any correction of my poor English is appreciated. Thanks in advance)

4 answers
 cut -d' ' -f1 file \
 | uniq -c \
 | grep -v ' 3 ' \
 | rev | cut -d' ' -f1 | rev \
 | grep -vwFf- file > output

The first line extracts the first column.

The second line counts how often each value occurs consecutively.

The third line drops the keys that occur exactly 3 times, leaving only the incomplete ones.

The fourth line strips the counts, leaving only the keys.

The fifth line removes every line containing one of those keys from the source file.
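The same count-then-filter idea can also be sketched as a single awk call that reads the file twice. This is a swapped-in technique, not the pipeline above: it counts each key across the whole file rather than per consecutive run, so it is equivalent only under the assumption that a key never reappears later in the file. The function name keep_triples is made up for illustration.

```shell
# keep_triples FILE: print only the lines whose first field occurs exactly
# 3 times in FILE. (keep_triples is a hypothetical helper name.)
keep_triples() {
    # Pass 1 (NR == FNR): count how often each first field occurs.
    # Pass 2: print the lines whose key was counted exactly 3 times.
    awk 'NR == FNR { count[$1]++; next } count[$1] == 3' "$1" "$1"
}
```

It would be called as keep_triples file > output. Passing the file name twice is what makes awk read it two times.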

+1
source

You can use the following awk command:

 awk '++c[$1]<=3{m[$1]=m[$1]?m[$1]"\n"$0:$0}c[$1]==3{print m[$1]}c[$1]>3' 

Explained better in a multi-line, non-optimized version:

example.awk:

 {
     # Count the occurrences of $1
     c[$1]++
 }

 c[$1]<=3{
     # Append the current line to a temporary storage. If the
     # temporary storage doesn't exist, create it.
     m[$1]=m[$1]?m[$1]"\n"$0:$0
 }

 # Print the temporary storage once $1 has appeared 3 times
 c[$1]==3{
     printf "%s\n", m[$1]
 }

 # Print the current line if the count of $1 is above '3'
 c[$1]>3

Call it as follows:

 awk -f example.awk input.txt 

Output:

 600466 a 37.50 25.28
 600466 b 31.13 18.22
 600466 c 64.80 61.39
 600467 a 38.79 30.00
 600467 b 28.73 41.04
 600467 c 58.32 61.39
 600468 a 33.09 25.28
 600468 b 35.57 42.69
 600468 c 58.32 60.12
 600469 a 36.89 29.80
 600469 b 35.57 30.94
 600469 c 64.80 62.49
 600471 a 29.22 25.47
 600471 b 34.74 20.61
 600471 c 64.80 62.81
+2
source

If you can never have more than three consecutive lines with the same key, then:

 $ cat tst.awk
 $1 != prev { buf=""; cnt=0 }
 { buf = buf $0 ORS; cnt++; prev=$1 }
 cnt == 3 { printf "%s", buf }

Otherwise:

 $ cat tst.awk
 ($1 != prev) && (NR>1) {
     if (cnt == 3) {
         printf "%s", buf
     }
     buf = ""
     cnt = 0
 }
 { buf = buf $0 ORS; cnt++; prev=$1 }
 END {
     if (cnt == 3) {
         printf "%s", buf
     }
 }

Either way:

 $ awk -f tst.awk file
 600466 a 37.50 25.28
 600466 b 31.13 18.22
 600466 c 64.80 61.39
 600467 a 38.79 30.00
 600467 b 28.73 41.04
 600467 c 58.32 61.39
 600468 a 33.09 25.28
 600468 b 35.57 42.69
 600468 c 58.32 60.12
 600469 a 36.89 29.80
 600469 b 35.57 30.94
 600469 c 64.80 62.49
 600471 a 29.22 25.47
 600471 b 34.74 20.61
 600471 c 64.80 62.81
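The two variants only differ when a key runs for more than three lines. A small sketch of that case, using made-up sample data (not taken from the question) with a run of four lines for key 1:

```shell
# Made-up sample: key 1 has a run of four lines, key 2 a run of three.
sample=$(printf '%s\n' '1 a' '1 b' '1 c' '1 d' '2 a' '2 b' '2 c')

# Variant 1: prints a run as soon as its third line arrives,
# so the first three lines of a longer run still get through.
first=$(printf '%s\n' "$sample" | awk '
    $1 != prev { buf = ""; cnt = 0 }
    { buf = buf $0 ORS; cnt++; prev = $1 }
    cnt == 3 { printf "%s", buf }')

# Variant 2: waits until the run has ended and prints it
# only if it contained exactly 3 lines.
second=$(printf '%s\n' "$sample" | awk '
    ($1 != prev) && (NR > 1) { if (cnt == 3) printf "%s", buf; buf = ""; cnt = 0 }
    { buf = buf $0 ORS; cnt++; prev = $1 }
    END { if (cnt == 3) printf "%s", buf }')

printf 'variant 1:\n%s\n' "$first"
printf 'variant 2:\n%s\n' "$second"
```

With this input the first variant keeps '1 a', '1 b', '1 c' plus the three lines for key 2, while the second variant keeps only the lines for key 2.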
+2
source

I would say:

 awk '$1 != prev_1 {if (a[prev_1]==3) print buffer; buffer=""} {a[$1]++; buffer = (buffer?buffer ORS:"") $0} {prev_1=$1} END {if (a[$1]==3) print buffer}' file 

That is, accumulate the lines in the variable buffer and print it whenever the first field changes, but only if its counter is exactly 3.

Test

 $ awk '$1 != prev_1 {if (a[prev_1]==3) print buffer; buffer=""} {a[$1]++; buffer = (buffer?buffer ORS:"") $0} {prev_1=$1} END {if (a[$1]==3) print buffer}' a
 600466 a 37.50 25.28
 600466 b 31.13 18.22
 600466 c 64.80 61.39
 600467 a 38.79 30.00
 600467 b 28.73 41.04
 600467 c 58.32 61.39
 600468 a 33.09 25.28
 600468 b 35.57 42.69
 600468 c 58.32 60.12
 600469 a 36.89 29.80
 600469 b 35.57 30.94
 600469 c 64.80 62.49
 600471 a 29.22 25.47
 600471 b 34.74 20.61
 600471 c 64.80 62.81
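Note that the counter a[$1] in this one-liner is keyed by the field value and never reset, so it relies on each key appearing in a single run. A minimal sketch of a variation, assuming a key might recur later in the file, counts per run instead and takes the required run length as a parameter. The function name keep_runs and its arguments are hypothetical.

```shell
# keep_runs N FILE: print only the runs of exactly N consecutive lines that
# share the same first field. (keep_runs is a hypothetical helper name.)
keep_runs() {
    awk -v want="$1" '
        # First field changed: flush the finished run if it had exactly "want" lines.
        NR > 1 && $1 != prev {
            if (n == want) printf "%s", buffer
            buffer = ""; n = 0
        }
        # Add the current line to the run buffer and count it.
        { buffer = buffer $0 ORS; n++; prev = $1 }
        # Flush the final run at end of input.
        END { if (n == want) printf "%s", buffer }
    ' "$2"
}
```

For the data in the question it would be called as keep_runs 3 file > output.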
+1
source

Source: https://habr.com/ru/post/1236918/

