Finding the values common to multiple single-column files

I have 100 text files, each containing a single column of numbers. The files look like this:

 file1.txt:
 10032
 19873
 18326

 file2.txt:
 10032
 19873
 11254

 file3.txt:
 15478
 10032
 11254

etc. The files have different sizes. How can I find the numbers that are common to all 100 files?

A given number appears only once within a single file.

+3
4 answers

awk to the rescue!

To find the elements common to all files (assuming uniqueness within a single file):

 awk '{a[$1]++} END{for(k in a) if(a[k]==ARGC-1) print k}' file*.txt

Count all occurrences and print the values whose count equals the number of files (ARGC-1 is the number of file arguments).
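For the three sample files above, a run would look like this (10032 is the only value present in all three):

 $ awk '{a[$1]++} END{for(k in a) if(a[k]==ARGC-1) print k}' file1.txt file2.txt file3.txt
 10032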

+2

This will work even if the same number can appear several times within one file:

 $ awk '{a[$0][ARGIND]} END{for (i in a) if (length(a[i])==ARGIND) print i}' file[123]
 10032

The above uses GNU awk for true multidimensional arrays and ARGIND. It is easily adapted for other awks where necessary, for example:

 $ awk '!seen[$0,FILENAME]++{a[$0]++} END{for (i in a) if (a[i]==ARGC-1) print i}' file[123]
 10032
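The same portable command restated with comments, just to spell out the dedup step (no additional functionality):

 $ awk '
     !seen[$0, FILENAME]++ {   # true only on the first occurrence of a value in a file
         a[$0]++               # so each file contributes at most 1 to the count
     }
     END {
         for (i in a)
             if (a[i] == ARGC-1)   # ARGC-1 = number of file arguments
                 print i
     }' file[123]
 10032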

If the numbers are unique within each file, all you need is this (awk prints the line whenever the pattern is true):

 $ awk '(++c[$0])==(ARGC-1)' file*
 10032
+5

Single-column files?

You can sort and compare these files using the shell:

 for f in file*.txt; do sort "$f" | uniq; done | sort | uniq -c -d

The -c is not needed; add it only if you want to count the occurrences. Note that -d alone prints values that appear in at least two files; to require a value in all files, filter on the count, as in the sketch below.
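A minimal sketch building on this pipeline: keep only the values whose count equals the number of files (3 for the sample files; use 100 for the full set):

 $ for f in file*.txt; do sort "$f" | uniq; done | sort | uniq -c | awk '$1 == 3 {print $2}'
 10032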

+1

One using Bash and comm, because I wanted to know whether this would work. My test files were 1, 2 and 3, hence for f in ?:

 f=$(shuf -n1 -e ?)   # pick one file randomly for the initial comms file
 sort "$f" > comms
 for f in ?           # this time for all files
 do
     comm -1 -2 <(sort "$f") comms > tmp   # comms should be in sorted order always
     # grep -Fxf "$f" comms > tmp          # another solution, thanks @Sundeep
     mv tmp comms
 done
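After the loop, comms holds the intersection of all files. Assuming the test files 1, 2 and 3 contain the sample data from the question, checking the result would look like:

 $ cat comms
 10032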
0

Source: https://habr.com/ru/post/1432061/

