Find rows with a common value in a specific column

Question

Suppose I have a file like this:

5  kata 45 buu
34 tuy  3  rre
21 ppo  90 ty
21 ret  60 buu
09 ret  89 ty
21 plk  1  uio
23 kata 90 ty

I want the output to contain only the rows whose value in the fourth column appears more than once. My desired result would therefore be:

5  kata 45 buu
21 ppo  90 ty
21 ret  60 buu
09 ret  89 ty
23 kata 90 ty

How can I complete this task?

I can identify and isolate the column of interest with:

awk '{print $4}' file1 > file1_temp

and then check if there are duplicate values and how many:

awk '{dups[$1]++} END{for (num in dups) {print num,dups[num]}}' file1_temp

but that only gives me the counts, which is not what I am after.

linux bash awk

Transagonistica Jan 19 '16 at 11:46

1 answer

Tom Fenech · Accepted Answer · 2016-01-19T12:02:13+0000

A simple way to preserve the original order is to read the file twice. On the first pass, count how often each fourth-column value occurs; on the second pass, print the lines whose value has a count greater than 1:

awk 'NR == FNR { ++count[$4]; next } count[$4] > 1' file file
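
As a quick check, the two-pass command can be run against the sample from the question. This is only a sketch: the file name sample.txt and the wrapper function dup_rows are made up for the illustration and are not part of the original answer.

```shell
# Recreate the sample input from the question (sample.txt is an illustrative name).
cat > sample.txt <<'EOF'
5  kata 45 buu
34 tuy  3  rre
21 ppo  90 ty
21 ret  60 buu
09 ret  89 ty
21 plk  1  uio
23 kata 90 ty
EOF

# Hypothetical wrapper around the two-pass command.
# Pass 1 (NR == FNR holds only while reading the first file argument):
#   count each value of column 4, then skip to the next line.
# Pass 2: print every line whose column-4 value was seen more than once.
dup_rows() {
    awk 'NR == FNR { ++count[$4]; next } count[$4] > 1' "$1" "$1"
}

dup_rows sample.txt
```

Because awk prints each matching line unchanged, the original spacing and row order are kept. The file has to be a regular file that can be opened twice, so this form does not work on piped input.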

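Alternatively, to read the file only once, store everything in arrays and do the printing in the END block: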

 awk '{ line[NR] = $0; col[NR] = $4; ++count[$4] } 
  END { for (i = 1; i <= NR; ++i) if (count[col[i]] > 1) print line[i] }' file

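Here line stores each whole line, col stores its fourth column, and count holds the number of occurrences of each value, as before.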
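
One practical reason to prefer the single-pass form is that it can read from a pipe. The following is a minimal sketch under that assumption; the function name dup_rows_stdin, the awk variable c, and the optional column argument are hypothetical additions, not part of the original answer.

```shell
# Hypothetical helper: print rows whose value in column $1 (default 4) repeats.
# Reads standard input, so it can sit at the end of a pipeline.
dup_rows_stdin() {
    awk -v c="${1:-4}" '
        # Buffer every line together with its key column, and count the key.
        { line[NR] = $0; key[NR] = $c; ++count[$c] }
        END {
            # Replay the buffered lines in input order, keeping repeated keys.
            for (i = 1; i <= NR; ++i)
                if (count[key[i]] > 1) print line[i]
        }'
}

# Usage: only the two rows sharing "buu" in column 4 are printed.
printf '%s\n' '5  kata 45 buu' '34 tuy  3  rre' '21 ppo  90 ty' '21 ret  60 buu' |
    dup_rows_stdin 4
```

The trade-off is memory: every line is held in an array until the END block runs, so for very large files the two-pass version may be the safer choice.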
