Easily count words in a list of files in a folder after grep -v

I am trying to make the scripts I write simpler and shorter.

There are many ways to get the word count of every file in a folder, or even of every file in each subdirectory of a folder.

For example, I could write

wc */* 

and I can get this output (this is the desired result):

    0    0      0 10.53400000/YRI.GS000018623.NONSENSE.vcf
    0    0      0 10.53400000/YRI.GS000018623.NONSTOP.vcf
    0    0      0 10.53400000/YRI.GS000018623.PFAM.vcf
    0    0      0 10.53400000/YRI.GS000018623.SPAN.vcf
    0    0      0 10.53400000/YRI.GS000018623.SVLEN.vcf
    2   20    624 10.53400000/YRI.GS000018623.SVTYPE.vcf
    2   20    676 10.53400000/YRI.GS000018623.SYNONYMOUS.vcf
   13  130   4435 10.53400000/YRI.GS000018623.TSS-UPSTREAM.vcf
  425 4250 126381 10.53400000/YRI.GS000018623.UNKNOWN-INC.vcf

but if there are too many files, I can get an error message like the following:

 -bash: /usr/bin/wc: Argument list too long 

so I could use a variable and process one folder at a time, for example:

 while read FOLDER; do
     wc "$FOLDER"/* >> outfile.txt
 done < "$FOLDER_LIST"

but that turns one line into five to do the same thing.

Also, in one case, I want to apply grep -v first and then do a word count, for example:

 grep -v dbsnp */* | wc 

but this fails in two ways:

  • Argument list too long
  • Even if the list weren't too long, it would give one wc total for all the files together, not per file.

So, to repeat, I would love to do this:

 grep -v dbsnp */* | wc > Outfile.txt
 awk '{print $4,$1}' Outfile.txt > Outfile.summary.txt

and get back per-file results like the ones shown above.

Is there a really easy way to do this, or am I stuck with a loop at the very least? Again, I know 101 ways to do this in 4-10 lines of script, the way we all usually do, but I would love to see it done as just 2 one-line commands at the prompt ... and my knowledge of the shell is not deep enough to figure out which tools will let me ask that of the OS.
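For what it's worth, here is one hedged sketch of such a two-liner, using find to sidestep the argument-list limit and printing each file's name next to its post-filter counts. All directory and file names below are made-up demo fixtures so the commands are runnable as-is; with real data you would drop the fixture lines and point find at your own tree:

```shell
# Demo fixture (hypothetical names) so the commands below have data to work on.
mkdir -p demo/10.53400000
printf 'dbsnp noise\nkept line one\nkept line two\n' > demo/10.53400000/sample.vcf

# Line 1: for each file two levels down (the */* pattern), print
# "name lines words chars" of the lines that do NOT contain "dbsnp".
find demo -mindepth 2 -maxdepth 2 -type f \
    -exec sh -c 'printf "%s " "$1"; grep -v dbsnp "$1" | wc' _ {} \; > Outfile.txt

# Line 2: keep just "name line-count". Note the name is field 1 here,
# not field 4 as in plain wc output, because we printed the name first.
awk '{print $1, $2}' Outfile.txt > Outfile.summary.txt
```

This spawns one sh per file, which is slow on huge trees but stays within the two-command budget and never hits the argument-list limit.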

EDIT -

A solution was proposed:

 find -exec grep -v dbsnp {} \; | xargs -n 1 wc 

This solution produces the following output:

 wc: 1|0:53458644:AMBIGUOUS:CCAGGGC|-16&GCCAGGGCCAGGGC|-18&GCCAGGGCC|-19&GGCCAGGGC|-19&GCCAGGGCG|-19,.:48:48,48:4,4:0,17:-48,0,-48:0,0,-17:27:3,24:24: No such file or directory
 wc: 10: No such file or directory
 wc: 53460829: No such file or directory
 wc: .: Is a directory
 0 0 0 .
 wc: AA: No such file or directory
 wc: CT: No such file or directory
 wc: .: Is a directory
 0 0 0 .
 wc: .: Is a directory
 0 0 0 .

As far as I can tell, it seems to treat each line of grep's output as a file name. I am still looking at the other answers, and thank you for your help.

4 answers

You mentioned that "this does not solve the problem of returning wc separately"

The following will do that:

 find -exec wc {} \; 

But it does not apply your grep -v filter.

If you intend to do the same as stated in my comment on this answer, please check if the following works:

 find -exec bash -c "echo -n {}; grep -v dbsnp {} | wc " \; 
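A variant of the same idea, sketched here as an assumption rather than part of the original answer: instead of substituting {} inside the quoted script (which breaks on file names containing shell-special characters), pass the names to bash as positional arguments with -exec ... +. The demo2 directory and its files are hypothetical fixtures added only to make the snippet runnable:

```shell
# Hypothetical demo data so the command below is runnable as-is.
mkdir -p demo2
printf 'dbsnp hit\nsurvivor\n' > demo2/a.vcf
printf 'survivor one\nsurvivor two\n' > demo2/b.vcf

# One bash per batch of files; each name arrives as a positional argument
# ("$f"), so nothing is textually substituted into the script itself.
find demo2 -type f -exec bash -c \
    'for f; do printf "%s " "$f"; grep -v dbsnp "$f" | wc; done' _ {} +
```

Besides being safer with odd file names, -exec ... + batches many files into one bash invocation, so it is also faster than one process per file.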

You have too many matches for */* , so grep gets an overly long argument list. You can use find to get around this:

 find -exec grep -v dbsnp {} \; | wc 

and you might also want to discard any errors along the way:

 find -exec grep -v dbsnp {} \; 2> /dev/null | wc 

This works for me:

 grep -or "[a-zA-Z]*" * | cut -d":" -f2 | sort | uniq -c 
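A quick illustration of what that kind of pipeline produces, run on a single hypothetical file (so the -r flag and the cut step, which only strips the file-name prefix that -r adds, are not needed here):

```shell
# Hypothetical sample file. grep -o emits each alphabetic run on its own
# line, and sort | uniq -c tallies how often each word appears.
printf 'foo bar\nfoo baz\n' > words.txt
grep -o '[a-zA-Z]*' words.txt | sort | uniq -c
```

Note that this counts occurrences of each distinct word across the input, which is a different question from the per-file line/word totals the original post asks about.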

What you are looking for is the MapReduce algorithm http://en.wikipedia.org/wiki/MapReduce


Building on the main answer:

If you want a wc file by file, you can use xargs :

 find -exec grep -v dbsnp {} \; | xargs -n 1 wc 

xargs reads standard input and builds and executes command lines from it. Here it reads your input stream and runs wc on each individual item ( -n 1 ).


Source: https://habr.com/ru/post/970369/
