Finding the values common to multiple single-column files

I have 100 text files, each containing a single column of numbers. The files look like this:

 file1.txt:
 10032
 19873
 18326

 file2.txt:
 10032
 19873
 11254

 file3.txt:
 15478
 10032
 11254

etc. The files have different sizes. How can I find the numbers that are common to all 100 files?

A given number appears only once within a single file.

+3
4 answers

awk to the rescue!

To find the elements common to all files (assuming uniqueness within a single file):

 awk '{a[$1]++} END{for(k in a) if(a[k]==ARGC-1) print k}' file*.txt

Count all occurrences and print the values whose count equals the number of files (ARGC-1 is the number of file arguments).
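For the three sample files above, a run would look like this (10032 is the only value present in all three):

 $ awk '{a[$1]++} END{for(k in a) if(a[k]==ARGC-1) print k}' file1.txt file2.txt file3.txt
 10032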

+2

This will work even if the same number can appear several times within one file:

 $ awk '{a[$0][ARGIND]} END{for (i in a) if (length(a[i])==ARGIND) print i}' file[123]
 10032

The above uses GNU awk for true multidimensional arrays and ARGIND. It is easily adapted for other awks where necessary, for example:

 $ awk '!seen[$0,FILENAME]++{a[$0]++} END{for (i in a) if (a[i]==ARGC-1) print i}' file[123]
 10032
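The same portable command restated with comments, just to spell out the dedup step (no additional functionality):

 $ awk '
     !seen[$0, FILENAME]++ {   # true only on the first occurrence of a value in a file
         a[$0]++               # so each file contributes at most 1 to the count
     }
     END {
         for (i in a)
             if (a[i] == ARGC-1)   # ARGC-1 = number of file arguments
                 print i
     }' file[123]
 10032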

If the numbers are unique within each file, all you need is this (awk prints the line whenever the pattern is true):

 $ awk '(++c[$0])==(ARGC-1)' file*
 10032
+5

Single-column files?

You can sort and compare these files using the shell:

 for f in file*.txt; do sort "$f" | uniq; done | sort | uniq -c -d

The -c is not needed; add it only if you want to count the occurrences. Note that -d alone prints values that appear in at least two files; to require a value in all files, filter on the count, as in the sketch below.
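A minimal sketch building on this pipeline: keep only the values whose count equals the number of files (3 for the sample files; use 100 for the full set):

 $ for f in file*.txt; do sort "$f" | uniq; done | sort | uniq -c | awk '$1 == 3 {print $2}'
 10032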

+1

One using Bash and comm, because I wanted to know whether this would work. My test files were 1, 2 and 3, hence for f in ?:

 f=$(shuf -n1 -e ?)   # pick one file randomly for the initial comms file
 sort "$f" > comms
 for f in ?           # this time for all files
 do
     comm -1 -2 <(sort "$f") comms > tmp   # comms should be in sorted order always
     # grep -Fxf "$f" comms > tmp          # another solution, thanks @Sundeep
     mv tmp comms
 done
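After the loop, comms holds the intersection of all files. Assuming the test files 1, 2 and 3 contain the sample data from the question, checking the result would look like:

 $ cat comms
 10032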
0

Source: https://habr.com/ru/post/1432061/

