Removing lines containing a unique first field with awk?

I'm looking to print only the lines whose first field occurs more than once. For example, from data that looks like this:

 1 abcd
 1 efgh
 2 ijkl
 3 mnop
 4 qrst
 4 uvwx

it should print:

 1 abcd
 1 efgh
 4 qrst
 4 uvwx

(FYI - the first field is not always 1 character in my data)

+4
5 answers
 awk 'FNR==NR{a[$1]++;next}(a[$1] > 1)' ./infile ./infile 

Yes, you give it the same file as input twice. Since you do not know in advance whether the current record is unique, you build an array keyed on $1 during the first pass, and then on the second pass you print only the records whose $1 was seen more than once.

I'm sure there are ways to do this in just one pass through the file, but I doubt they would be as clean (a sketch of one follows the proof of concept below).

Explanation

  • FNR==NR : This is true only while awk reads the first file. NR counts every record seen so far across all input files, while FNR restarts at 1 for each file, so the two are equal only during the first pass (the snippet after this list shows the two counters side by side).
  • a[$1]++ : builds an associative array a keyed on the first field ( $1 ); the value for a key is incremented each time that key is seen.
  • next : skip the rest of the script for this record and start over with the next one.
  • (a[$1] > 1) : evaluated only during the second pass over ./infile , it prints only those records whose first field ( $1 ) was seen more than once. A pattern with no action like this is shorthand for if(a[$1] > 1){print $0} .
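
To see the two counters side by side, here is a throwaway diagnostic of my own (not part of the answer); FNR equals NR only for the six records of the first pass:

 $ awk '{ print FILENAME, "NR=" NR, "FNR=" FNR }' ./infile ./infile
 ./infile NR=1 FNR=1
 ./infile NR=2 FNR=2
 ./infile NR=3 FNR=3
 ./infile NR=4 FNR=4
 ./infile NR=5 FNR=5
 ./infile NR=6 FNR=6
 ./infile NR=7 FNR=1
 ./infile NR=8 FNR=2
 ./infile NR=9 FNR=3
 ./infile NR=10 FNR=4
 ./infile NR=11 FNR=5
 ./infile NR=12 FNR=6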

Proof of concept

 $ cat ./infile
 1 abcd
 1 efgh
 2 ijkl
 3 mnop
 4 qrst
 4 uvwx
 $ awk 'FNR==NR{a[$1]++;next}(a[$1] > 1)' ./infile ./infile
 1 abcd
 1 efgh
 4 qrst
 4 uvwx
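
As for the one-pass idea mentioned above, here is a minimal sketch of my own (not from the original answer): it buffers the first line seen for each key, trading the second read of the file for memory proportional to the number of distinct keys.

 awk '{
     if (seen[$1]++) {            # key seen before: this record is a duplicate
         if (first[$1] != "") {   # flush the buffered first occurrence, once
             print first[$1]
             first[$1] = ""
         }
         print
     } else
         first[$1] = $0           # hold the first occurrence until a duplicate shows up
 }' ./infile

One caveat: when duplicates of different keys interleave, the output order can differ from the two-pass version, because a key's first line is only emitted once its duplicate arrives.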
+5

Here is some awk code to do what you want, assuming the input is already grouped by its first field (the same precondition uniq would need):

 BEGIN { f = ""; l = "" }
 {
     if ($1 == f) {
         if (l != "") {
             print l
             l = ""
         }
         print $0
     } else {
         f = $1
         l = $0
     }
 }

In this code, f is the previous value of field 1, and l is the first line of the group (or empty if it has already been printed).
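
Saved to a file (dup.awk is just a name chosen for this example), a run over the question's sample input looks like this:

 $ awk -f dup.awk ./infile
 1 abcd
 1 efgh
 4 qrst
 4 uvwx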

+1
 # Two-state machine: IDLE until the current first field repeats,
 # DUP while it keeps repeating. Assumes duplicates are adjacent.
 BEGIN { IDLE = 0; DUP = 1; state = IDLE }
 {
     if (state == IDLE) {
         if ($1 == lasttime) {
             state = DUP
             print lastline       # flush the held-back first line of the run
         } else
             state = IDLE
     } else {
         if ($1 != lasttime)
             state = IDLE         # the run of duplicates has ended
     }
     if (state == DUP)
         print $0
     lasttime = $1
     lastline = $0
 }
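
The DUP state is what keeps a run of three or more duplicates flowing. With the script saved as dupstate.awk (a name of my choosing), a quick check on such a run:

 $ printf '4 qrst\n4 uvwx\n4 asdf\n' | awk -f dupstate.awk
 4 qrst
 4 uvwx
 4 asdf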
+1

Assuming the ordered input you specify in your question:

 awk '$1 == prev {if (prevline) print prevline; print $0; prevline=""; next} {prev = $1; prevline=$0}' inputfile 

The file needs to be read only once.
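
For example, with inputfile holding the sample data from the question:

 $ awk '$1 == prev {if (prevline) print prevline; print $0; prevline=""; next} {prev = $1; prevline=$0}' inputfile
 1 abcd
 1 efgh
 4 qrst
 4 uvwx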

0

If you can use Ruby (1.9+):

 #!/usr/bin/env ruby
 # Group the rest of each line under its first field, then print
 # only the groups that collected more than one line.
 hash = Hash.new { |h, k| h[k] = [] }
 File.open("file").each do |x|
   a, b = x.split(/\s+/, 2)   # first field, remainder of the line
   hash[a] << b
 end
 hash.each { |k, v|
   hash[k].each { |y| puts "#{k} #{y}" } if v.size > 1
 }

output:

 $ cat file
 1 abcd
 1 efgh
 2 ijkl
 3 mnop
 4 qrst
 4 uvwx
 4 asdf
 1 xzzz
 $ ruby arrange.rb
 1 abcd
 1 efgh
 1 xzzz
 4 qrst
 4 uvwx
 4 asdf
0
