Extract matching rows from CSV

I have a file that looks like this:

64fe12c7-b50c-4f63-b292-99f4ed74e5aa, ip, 1.2.3.4, 
64fe12c7-b50c-4f63-b292-99f4ed74e5aa, ip, 4.5.6.7, 
bacd8a9d-807f-4ae9-95d2-f7cc17222cab, ip, 0.0.0.0/0, silly string
bacd8a9d-807f-4ae9-95d2-f7cc17222cab, ip, 0.0.0.0/0, crazy town
db86d211-0b09-4a8f-b222-a21a54ad2f9c, ip, 8.9.0.1, wild wood
db86d211-0b09-4a8f-b222-a21a54ad2f9c, ip, 0.0.0.0/0, wacky tabacky
611f8cf5-f6f2-4f3a-ad24-12245652a7bd, ip, 0.0.0.0/0, cuckoo cachoo

I would like to extract a list of only the unique GUIDs, where either:

  • column 3 is not 0.0.0.0/0, or
  • column 3 is 0.0.0.0/0, but the GUID appears on more than one line and at least one of those lines has a column 3 value other than 0.0.0.0/0

In this case, the desired output would be:

64fe12c7-b50c-4f63-b292-99f4ed74e5aa
db86d211-0b09-4a8f-b222-a21a54ad2f9c

Thinking this through, it seems to me that I should build an array / list of the unique GUIDs, then grep the corresponding lines and test the two conditions above, but I don't know whether this is best done as a short script or, possibly, as a grep / awk / sort / cut one-liner. Appreciate any help!

(the source file has 4 CSV columns, where the 4th column is often empty)
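
For what it's worth, here is the rough shape of what I have in mind, as an untested two-pass awk sketch that checks both conditions literally (infile stands in for the real file name):

awk -F', *' '
    NR == FNR {                       # first pass: per GUID, count all lines and non-0.0.0.0/0 lines
        total[$1]++
        if ($3 != "0.0.0.0/0") good[$1]++
        next
    }
    # second pass: condition 1 (field 3 is not 0.0.0.0/0), or condition 2
    # (field 3 is 0.0.0.0/0, but the GUID occurs again with some other value);
    # the printed[] guard emits each qualifying GUID only once
    ($3 != "0.0.0.0/0" || (total[$1] > 1 && good[$1] > 0)) && !printed[$1]++ { print $1 }
' infile infile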

Awk:

awk -F',[[:space:]]*' '$3 !~ /^(0\.){3}0\/0/{ guids[$1] }
                       END{ for(k in guids) print k }' testfile.txt

Output:

db86d211-0b09-4a8f-b222-a21a54ad2f9c
64fe12c7-b50c-4f63-b292-99f4ed74e5aa
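
Note that for (k in guids) visits the keys in an unspecified order, which is why the two GUIDs come out in a different order than in the question. If you need stable output, pipe the result through sort:

awk -F',[[:space:]]*' '$3 !~ /^(0\.){3}0\/0/{ guids[$1] }
                       END{ for(k in guids) print k }' testfile.txt | sort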

Another awk:

awk -F, '$3 !~/0\.0\.0\.0\/0/ && !seen[$1]++{print $1}' infile

Explanation:

  • $3 !~/0\.0\.0\.0\/0/ - field 3 does not match 0.0.0.0/0; both conditions must hold (&&)
  • !seen[$1]++ - true only the first time a given field 1 value ($1) occurs, so each GUID is printed at most once (see the small demo after this list)
    • ! - logical negation
    • seen - the array that counts occurrences
    • $1 - the array key (field 1, the GUID)
    • ++ - post-increment (the count is 0, hence falsy, on the first occurrence and nonzero afterwards)
  • print $1 - print field 1
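
To see the deduplication idiom on its own, here is a tiny demo (!seen[$0]++ keeps only the first occurrence of each whole line):

$ printf 'a\nb\na\nb\nc\n' | awk '!seen[$0]++'
a
b
c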

Example run:

$ cat infile
64fe12c7-b50c-4f63-b292-99f4ed74e5aa, ip, 1.2.3.4, 
64fe12c7-b50c-4f63-b292-99f4ed74e5aa, ip, 4.5.6.7, 
bacd8a9d-807f-4ae9-95d2-f7cc17222cab, ip, 0.0.0.0/0, silly string
bacd8a9d-807f-4ae9-95d2-f7cc17222cab, ip, 0.0.0.0/0, crazy town
db86d211-0b09-4a8f-b222-a21a54ad2f9c, ip, 8.9.0.1, wild wood
db86d211-0b09-4a8f-b222-a21a54ad2f9c, ip, 0.0.0.0/0, wacky tabacky
611f8cf5-f6f2-4f3a-ad24-12245652a7bd, ip, 0.0.0.0/0, cuckoo cachoo

$ awk -F, '$3 !~/0\.0\.0\.0\/0/ && !seen[$1]++{print $1}' infile
64fe12c7-b50c-4f63-b292-99f4ed74e5aa
db86d211-0b09-4a8f-b222-a21a54ad2f9c

Without awk, a plain pipeline also works:

  • drop the lines where column 3 is 0.0.0.0/0: grep -v '^[^,]*,[^,]*, *0\.0\.0\.0/0,'
  • keep only column 1: cut -d, -f1
  • deduplicate: sort -u (or, equivalently, sort followed by uniq)

grep -v '^[^,]*,[^,]*, *0\.0\.0\.0/0,' infile | cut -d, -f1 | sort -u
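
Run against the sample data (assuming it is saved as infile), the pipeline produces:

$ grep -v '^[^,]*,[^,]*, *0\.0\.0\.0/0,' infile | cut -d, -f1 | sort -u
64fe12c7-b50c-4f63-b292-99f4ed74e5aa
db86d211-0b09-4a8f-b222-a21a54ad2f9c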

Just adding another possible solution, similar to the other proposed awk answers (but uglier, and using more than one command). If I understand the question correctly, your condition #2 is already covered by condition #1: any GUID that qualifies under #2 also has at least one line whose column 3 is not 0.0.0.0/0, and that line alone selects it. In any case, the following awk + sort worked for me:

awk -F, '$3!~/^ 0\.0\.0\.0\/0/ {print $1}' file.csv | sort -u

Using the -u (unique) flag on sort eliminates the duplicates. The match is not completely robust (the regex relies on exactly one space after the comma), but it works in this case.
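
If the whitespace after the commas can vary, a slightly hardened variant (a sketch, not tested beyond the sample data) lets the field separator swallow the spaces and compares the field exactly:

awk -F',[[:space:]]*' '$3 != "0.0.0.0/0" {print $1}' file.csv | sort -u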

Hope this helps!

The following awk may also help you with this.

awk -F', +' '$3 ~ /0\.0\.0\.0\/0/{next} !a[$1]++{print $1}'   Input_file
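
Written out with comments (functionally identical):

awk -F', +' '
$3 ~ /0\.0\.0\.0\/0/ { next }       # skip any line whose 3rd field is 0.0.0.0/0
!a[$1]++             { print $1 }   # print each remaining GUID only the first time it is seen
' Input_file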

The output will be as follows.

64fe12c7-b50c-4f63-b292-99f4ed74e5aa
db86d211-0b09-4a8f-b222-a21a54ad2f9c