Finding the set complement in Unix

Given two files:

 $ cat A.txt     $ cat B.txt
    3           11
    5           1
    1           12
    2           3
    4           2

I want to find the numbers that are in A but not in B. What is the Unix command for this?

I tried this, but it seems to fail:

comm -3 <(sort -n A.txt) <(sort -n B.txt) | sed 's/\t//g' 
4 answers
comm -2 -3 <(sort A.txt) <(sort B.txt)

should do what you want, if I understand you correctly.

Edit: in fact, comm needs its input files to be sorted in lexicographic order, so you do not want -n in your sort command:

$ cat A.txt
1
4
112
$ cat B.txt
1
112
# Bad:
$ comm -2 -3 <(sort -n A.txt) <(sort -n B.txt)
4
comm: file 1 is not in sorted order
112
# OK:
$ comm -2 -3 <(sort A.txt) <(sort B.txt)
4
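
If the result should still come out in numeric order, one option is to keep the lexicographic sort for comm's inputs and re-sort only the output. The sketch below assumes a helper named numeric_set_diff (a made-up name, not a standard tool) and uses temporary files instead of process substitution so that it also runs in plain sh:

```shell
# numeric_set_diff is a hypothetical helper name, not part of comm or sort.
# Usage: numeric_set_diff FILE_A FILE_B
# Prints the lines of FILE_A that are absent from FILE_B, in numeric order.
numeric_set_diff() {
    sorted_a=$(mktemp) || return 1
    sorted_b=$(mktemp) || { rm -f "$sorted_a"; return 1; }
    # comm expects lexicographically sorted input, so no -n here.
    sort "$1" > "$sorted_a"
    sort "$2" > "$sorted_b"
    # -2 drops lines unique to FILE_B, -3 drops lines common to both.
    # Only the final result is re-sorted numerically.
    comm -23 "$sorted_a" "$sorted_b" | sort -n
    rm -f "$sorted_a" "$sorted_b"
}
```

Called as numeric_set_diff A.txt B.txt, this should list the numbers unique to A.txt in ascending numeric order.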

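Note that the awk solution works, but keeps duplicates in A (that are not in B); the Python solution de-duplicates the result.

Also note that comm does not compute a true set difference; if a line occurs more times in A than in B, comm outputs the "extra" copies: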

$ cat A.txt 
120
121
122
122
$ cat B.txt 
121
122
121
$ comm -23 <(sort A.txt) <(sort B.txt)
120
122

If you want a true set difference, use sort -u (to remove the duplicates in A):

$ comm -23 <(sort -u A.txt) <(sort B.txt)
120
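
The sort -u approach can be wrapped in a small function. This is only a sketch: true_set_diff is a made-up name, and temporary files stand in for process substitution so that it does not depend on bash:

```shell
# true_set_diff is a hypothetical helper name.
# Usage: true_set_diff FILE_A FILE_B
# Prints each distinct line of FILE_A that does not occur in FILE_B.
true_set_diff() {
    uniq_a=$(mktemp) || return 1
    uniq_b=$(mktemp) || { rm -f "$uniq_a"; return 1; }
    # -u removes duplicates, so each line is compared as a set member.
    sort -u "$1" > "$uniq_a"
    # De-duplicating FILE_B is not required for correctness here;
    # it only shortens the input that comm has to read.
    sort -u "$2" > "$uniq_b"
    comm -23 "$uniq_a" "$uniq_b"
    rm -f "$uniq_a" "$uniq_b"
}
```

With the A.txt and B.txt from this answer, true_set_diff A.txt B.txt should print only 120.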

$ awk 'FNR==NR{a[$0];next} (!($0 in a))' B.txt A.txt
5
4
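
For readers unfamiliar with the FNR==NR idiom, here is a commented sketch of the same one-liner, extended to drop repeated lines from A. The function name lines_only_in_a and the array names in_b and seen are illustrative choices, not part of awk:

```shell
# lines_only_in_a is a hypothetical helper name.
# Usage: lines_only_in_a FILE_A FILE_B
# Prints lines of FILE_A that are not in FILE_B, in FILE_A order,
# printing each distinct line at most once.
lines_only_in_a() {
    awk '
        # FILE_B is read first; FNR equals NR only while reading it.
        # Record every line of FILE_B as an array key, then skip ahead.
        FNR == NR { in_b[$0]; next }
        # FILE_A is read second: print a line if it is not a key of in_b
        # and has not been printed before.
        !($0 in in_b) && !seen[$0]++
    ' "$2" "$1"
}
```

Unlike comm, this needs no sorting and keeps the original order of A. One caveat to be aware of: if B is empty, FNR == NR also holds while A is being read, so nothing is printed.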

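Another option is Setdown, a set-operations CLI.

You describe the set operations in a definitions file, much as you would write a Makefile: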

someUnion: "file-1.txt" \/ "file-2.txt"
someIntersection: "file-1.txt" /\ "file-2.txt"
someDifference: someUnion - someIntersection


Note: I think Setdown is much better than comm because Setdown does not require you to sort your inputs correctly. Instead, Setdown sorts your inputs for you, and it uses an external sort, so it can handle massive files. I consider this a major advantage, because I have lost count of the number of times I forgot to sort the files that I passed to comm.


Source: https://habr.com/ru/post/1730502/

