Find rows with a common value in a specific column

Suppose I have a file like this:

5  kata 45 buu
34 tuy  3  rre
21 ppo  90 ty
21 ret  60 buu
09 ret  89 ty
21 plk  1  uio
23 kata 90 ty

I want the output to contain only the rows whose value in the fourth column appears more than once. For this input, the desired result is:

5  kata 45 buu
21 ppo  90 ty
21 ret  60 buu
09 ret  89 ty
23 kata 90 ty

How can I complete this task?

I can identify and isolate the column of interest with:

awk '{print $4}' file1 > file1_temp

and then check which values are duplicated, and how many times each one occurs:

awk '{dups[$1]++} END{for (num in dups) {print num,dups[num]}}' file1_temp
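For reference, the two steps above can be merged into a single awk call that reads the fourth field straight from the original file, with no temporary file and no -F option (the sample is whitespace-separated, which is awk's default field splitting). This is a minimal sketch that assumes the sample data is saved to a temporary file; the function name dup_values is hypothetical. It still reports only the duplicated values and their counts, not the rows themselves.

```shell
#!/bin/sh
# Hypothetical setup: write the sample data from the question to a temp file.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
5  kata 45 buu
34 tuy  3  rre
21 ppo  90 ty
21 ret  60 buu
09 ret  89 ty
21 plk  1  uio
23 kata 90 ty
EOF

# Count each value of the 4th whitespace-separated field, then print only
# the values seen more than once, with their counts.
# "for (v in count)" has no defined order, so the caller sorts the output.
dup_values() {
  awk '{ ++count[$4] } END { for (v in count) if (count[v] > 1) print v, count[v] }' "$1"
}

dup_values "$tmp" | sort
```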

but that is definitely not what I would like to do.

1 answer

An easy way to preserve the original order is to read the file twice. On the first pass, count how many times each value appears in the fourth column; on the second pass, print the lines whose count is greater than 1:

awk 'NR == FNR { ++count[$4]; next } count[$4] > 1' file file
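To check the two-pass one-liner against the sample from the question, it can be wrapped in a small shell function. This is a sketch: the function name dup_rows and the temporary file are illustrative assumptions, not part of the answer. The NR == FNR condition is true only while awk reads the first copy of the file, so counting happens there and printing happens on the second copy.

```shell
#!/bin/sh
# Hypothetical setup: write the sample data from the question to a temp file.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
5  kata 45 buu
34 tuy  3  rre
21 ppo  90 ty
21 ret  60 buu
09 ret  89 ty
21 plk  1  uio
23 kata 90 ty
EOF

# Pass 1 (NR == FNR): count values of the 4th field, skip printing.
# Pass 2: print each line whose 4th-field value was counted more than once.
dup_rows() {
  awk 'NR == FNR { ++count[$4]; next } count[$4] > 1' "$1" "$1"
}

dup_rows "$tmp"
```

Because the second pass walks the file in its original order, the matching lines come out in the same order as in the input.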

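Alternatively, to read the file only once, store the lines in arrays and do the filtering in the END block: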

 awk '{ line[NR] = $0; col[NR] = $4; ++count[$4] } 
  END { for (i = 1; i <= NR; ++i) if (count[col[i]] > 1) print line[i] }' file

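Here line stores each input line, col stores its fourth field, and count stores how many times each fourth-column value occurs.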
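The single-pass version has one practical advantage: because it never reopens the file, it also works on input that cannot be read twice, such as a pipe. The trade-off is that it keeps every line in memory until END. A minimal sketch, again with a hypothetical function name (dup_rows_once) and a temporary copy of the sample data:

```shell
#!/bin/sh
# Hypothetical setup: write the sample data from the question to a temp file.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
5  kata 45 buu
34 tuy  3  rre
21 ppo  90 ty
21 ret  60 buu
09 ret  89 ty
21 plk  1  uio
23 kata 90 ty
EOF

# Single pass: remember every line and its 4th field, count the field values,
# then in END print the lines whose value was seen more than once, in input order.
# With no file arguments, awk reads standard input.
dup_rows_once() {
  awk '{ line[NR] = $0; col[NR] = $4; ++count[$4] }
       END { for (i = 1; i <= NR; ++i) if (count[col[i]] > 1) print line[i] }' "$@"
}

# Reading from a pipe works because the input is consumed only once.
cat "$tmp" | dup_rows_once
```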


Source: https://habr.com/ru/post/1624952/

