Extract matching rows from CSV

I have a file that looks like this:

64fe12c7-b50c-4f63-b292-99f4ed74e5aa, ip, 1.2.3.4, 
64fe12c7-b50c-4f63-b292-99f4ed74e5aa, ip, 4.5.6.7, 
bacd8a9d-807f-4ae9-95d2-f7cc17222cab, ip, 0.0.0.0/0, silly string
bacd8a9d-807f-4ae9-95d2-f7cc17222cab, ip, 0.0.0.0/0, crazy town
db86d211-0b09-4a8f-b222-a21a54ad2f9c, ip, 8.9.0.1, wild wood
db86d211-0b09-4a8f-b222-a21a54ad2f9c, ip, 0.0.0.0/0, wacky tabacky
611f8cf5-f6f2-4f3a-ad24-12245652a7bd, ip, 0.0.0.0/0, cuckoo cachoo

I would like to extract a list of only the unique GUIDs, where either:

  • column 3 is not 0.0.0.0/0, or
  • column 3 is 0.0.0.0/0, but the GUID appears on more than one line and at least one of those lines has a column 3 value other than 0.0.0.0/0

In this case, the desired output would be:

64fe12c7-b50c-4f63-b292-99f4ed74e5aa
db86d211-0b09-4a8f-b222-a21a54ad2f9c

Thinking this through, it seems to me that I should build an array / list of the unique GUIDs, then grep the corresponding lines and test the two conditions above, but I don't know whether this is best done as a short script or, possibly, as a grep / awk / sort / cut one-liner. Appreciate any help!

(the source file has 4 CSV columns, where the 4th column is often empty)
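
For what it's worth, here is the rough shape of what I have in mind, as an untested two-pass awk sketch that checks both conditions literally (infile stands in for the real file name):

awk -F', *' '
    NR == FNR {                       # first pass: per GUID, count all lines and non-0.0.0.0/0 lines
        total[$1]++
        if ($3 != "0.0.0.0/0") good[$1]++
        next
    }
    # second pass: condition 1 (field 3 is not 0.0.0.0/0), or condition 2
    # (field 3 is 0.0.0.0/0, but the GUID occurs again with some other value);
    # the printed[] guard emits each qualifying GUID only once
    ($3 != "0.0.0.0/0" || (total[$1] > 1 && good[$1] > 0)) && !printed[$1]++ { print $1 }
' infile infile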

Awk:

awk -F',[[:space:]]*' '$3 !~ /^(0\.){3}0\/0/{ guids[$1] }
                       END{ for(k in guids) print k }' testfile.txt

Output:

db86d211-0b09-4a8f-b222-a21a54ad2f9c
64fe12c7-b50c-4f63-b292-99f4ed74e5aa
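
Note that for (k in guids) visits the keys in an unspecified order, which is why the two GUIDs come out in a different order than in the question. If you need stable output, pipe the result through sort:

awk -F',[[:space:]]*' '$3 !~ /^(0\.){3}0\/0/{ guids[$1] }
                       END{ for(k in guids) print k }' testfile.txt | sort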

Another awk:

awk -F, '$3 !~/0\.0\.0\.0\/0/ && !seen[$1]++{print $1}' infile

Explanation:

  • $3 !~/0\.0\.0\.0\/0/ - field 3 does not match 0.0.0.0/0; both conditions must hold (&&)
  • !seen[$1]++ - true only the first time a given field 1 value ($1) occurs, so each GUID is printed at most once (see the small demo after this list)
    • ! - logical negation
    • seen - the array that counts occurrences
    • $1 - the array key (field 1, the GUID)
    • ++ - post-increment (the count is 0, hence falsy, on the first occurrence and nonzero afterwards)
  • print $1 - print field 1
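
To see the deduplication idiom on its own, here is a tiny demo (!seen[$0]++ keeps only the first occurrence of each whole line):

$ printf 'a\nb\na\nb\nc\n' | awk '!seen[$0]++'
a
b
c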

Example run:

$ cat infile
64fe12c7-b50c-4f63-b292-99f4ed74e5aa, ip, 1.2.3.4, 
64fe12c7-b50c-4f63-b292-99f4ed74e5aa, ip, 4.5.6.7, 
bacd8a9d-807f-4ae9-95d2-f7cc17222cab, ip, 0.0.0.0/0, silly string
bacd8a9d-807f-4ae9-95d2-f7cc17222cab, ip, 0.0.0.0/0, crazy town
db86d211-0b09-4a8f-b222-a21a54ad2f9c, ip, 8.9.0.1, wild wood
db86d211-0b09-4a8f-b222-a21a54ad2f9c, ip, 0.0.0.0/0, wacky tabacky
611f8cf5-f6f2-4f3a-ad24-12245652a7bd, ip, 0.0.0.0/0, cuckoo cachoo

$ awk -F, '$3 !~/0\.0\.0\.0\/0/ && !seen[$1]++{print $1}' infile
64fe12c7-b50c-4f63-b292-99f4ed74e5aa
db86d211-0b09-4a8f-b222-a21a54ad2f9c

Without awk, a plain pipeline also works:

  • drop the lines where column 3 is 0.0.0.0/0: grep -v '^[^,]*,[^,]*, *0\.0\.0\.0/0,'
  • keep only column 1: cut -d, -f1
  • deduplicate: sort -u (or, equivalently, sort followed by uniq)

grep -v '^[^,]*,[^,]*, *0\.0\.0\.0/0,' infile | cut -d, -f1 | sort -u
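
Run against the sample data (assuming it is saved as infile), the pipeline produces:

$ grep -v '^[^,]*,[^,]*, *0\.0\.0\.0/0,' infile | cut -d, -f1 | sort -u
64fe12c7-b50c-4f63-b292-99f4ed74e5aa
db86d211-0b09-4a8f-b222-a21a54ad2f9c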

Just adding another possible solution, similar to the other proposed awk answers (but uglier, and using more than one command). If I understand the question correctly, your condition #2 is already covered by condition #1: any GUID that qualifies under #2 also has at least one line whose column 3 is not 0.0.0.0/0, and that line alone selects it. In any case, the following awk + sort worked for me:

awk -F, '$3!~/^ 0\.0\.0\.0\/0/ {print $1}' file.csv | sort -u

Using the -u (unique) flag on sort eliminates the duplicates. The match is not completely robust (the regex relies on exactly one space after the comma), but it works in this case.
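
If the whitespace after the commas can vary, a slightly hardened variant (a sketch, not tested beyond the sample data) lets the field separator swallow the spaces and compares the field exactly:

awk -F',[[:space:]]*' '$3 != "0.0.0.0/0" {print $1}' file.csv | sort -u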

Hope this helps!

The following awk may also help you with this.

awk -F', +' '$3 ~ /0\.0\.0\.0\/0/{next} !a[$1]++{print $1}'   Input_file
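
Written out with comments (functionally identical):

awk -F', +' '
$3 ~ /0\.0\.0\.0\/0/ { next }       # skip any line whose 3rd field is 0.0.0.0/0
!a[$1]++             { print $1 }   # print each remaining GUID only the first time it is seen
' Input_file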

The output will be as follows.

64fe12c7-b50c-4f63-b292-99f4ed74e5aa
db86d211-0b09-4a8f-b222-a21a54ad2f9c