How can I count how many times each word from a word list appears in a file?

Question

I have a list.txt file that contains a list of words. I want to check how many times each word appears in another file, file1.txt, and then output the results. A plain list of the counts would be sufficient, since I can add them to list.txt manually using a spreadsheet program, but it would be even better if the script appended the count to the end of each line in list.txt, for example:

 bear 3
 fish 15

I tried this, but it does not work:

 cat list.txt | grep -c file1.txt

+6

bash grep

Village May 19 '12 at 5:41

4 answers

This awk method only needs to go through each file once:

 awk '
     # read the words in list.txt
     NR == FNR {count[$1] = 0; next}
     # process file1.txt (fields are numbered from 1; $0 is the whole line)
     {
         for (i = 1; i <= NF; i++)
             if ($i in count)
                 count[$i]++
     }
     # output the results
     END {
         for (word in count)
             print word, count[word]
     }
 ' list.txt file1.txt
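
One thing to keep in mind with the approach above is that awk's `for (word in count)` loop visits the keys in an unspecified order, so the output will not necessarily follow the order of list.txt. The following is only a sketch of one possible variation, not part of the original answer: it keeps a second array (here called `order`, with a counter `n`, both names made up for this example) so that the counts are printed in list order. The sample files are created in a temporary directory purely for illustration.

```shell
# Hypothetical sample data, created only for this sketch.
dir=$(mktemp -d)
printf 'fish\nbear\nwolf\n' > "$dir/list.txt"
printf 'bear fish bear\nfish fish\n' > "$dir/file1.txt"

result=$(awk '
    # First file: remember each listed word and the position it was seen in.
    NR == FNR {
        if (!($1 in count)) { order[++n] = $1; count[$1] = 0 }
        next
    }
    # Second file: tally every field that is a listed word.
    {
        for (i = 1; i <= NF; i++)
            if ($i in count)
                count[$i]++
    }
    # Print in list order rather than in the unspecified for-in order.
    END {
        for (j = 1; j <= n; j++)
            print order[j], count[order[j]]
    }
' "$dir/list.txt" "$dir/file1.txt")

printf '%s\n' "$result"
rm -rf "$dir"
```

For this sample input the sketch prints `fish 3`, `bear 2` and `wolf 0`, one per line, in the same order as the word list.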

+4

glenn jackman May 19 '12 at 9:44

This may work for you (GNU sed):

 tr -s ' ' '\n' < file1.txt | sort | uniq -c |
     sed -e '1i\s|.*|& 0|' -e 's/\s*\(\S*\)\s\(\S*\)\s*/s|\\<\2\\>.*|\2 \1|/' |
     sed -f - list.txt

Explanation:

Split file1.txt into one word per line
Sort the words
Count the occurrences of each word
Generate a sed script that matches each word (after first appending a zero to every line)
Run the generated script against list.txt
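
To make the steps above more concrete, here is a small sketch that prints the intermediate sed script before applying it. It assumes GNU sed, as the answer does, and uses hypothetical sample files created in a temporary directory; `tr` reads file1.txt through a redirect because it only reads standard input.

```shell
# Hypothetical sample data, created only for this sketch.
dir=$(mktemp -d)
printf 'bear\nfish\nwolf\n' > "$dir/list.txt"
printf 'bear fish bear\nfish fish\n' > "$dir/file1.txt"

# Steps 1-4: split, sort, count, and turn each "count word" line into a
# sed substitution. The inserted first line appends " 0" to every list entry.
script=$(tr -s ' ' '\n' < "$dir/file1.txt" | sort | uniq -c |
    sed -e '1i\s|.*|& 0|' \
        -e 's/\s*\(\S*\)\s\(\S*\)\s*/s|\\<\2\\>.*|\2 \1|/')
printf 'generated script:\n%s\n' "$script"

# Step 5: run the generated script against the word list.
result=$(printf '%s\n' "$script" | sed -f - "$dir/list.txt")
printf 'result:\n%s\n' "$result"
rm -rf "$dir"
```

With this sample data the generated script should consist of `s|.*|& 0|` followed by one substitution per word found in file1.txt, such as `s|\<bear\>.*|bear 2|`, and the final result lists `bear 2`, `fish 3` and `wolf 0`. Words that never appear in file1.txt keep the initial zero.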

+3

potong May 19 '12 at 8:26

A single-line command:

 cat file1.txt | tr " " "\n" | sort | uniq -c | sort -n -r -k 1 | grep -w -f list.txt

The last part of the command tells grep to read the words to match from the list (-f) and to match whole words only (-w); that is, if car is in list.txt, grep should ignore carriage.

However, keep in mind that your idea of a whole word and grep's may differ. For example, although car will not match carriage, it will match car-wash, because a "-" is treated as a word boundary. grep treats anything other than letters, digits, and underscores as a word boundary. This should not be a problem, as it conforms to the accepted definition of a word in English.
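
The word-boundary behaviour described above can be checked with a quick experiment. This is only an illustrative sketch; the file names and sample words are made up for the example.

```shell
# Hypothetical sample data, created only for this sketch.
dir=$(mktemp -d)
printf 'car\n' > "$dir/list.txt"
printf 'car\ncarriage\ncar-wash\ncar_wash\n' > "$dir/words.txt"

# -f reads the patterns from list.txt, -w restricts matches to whole words.
matches=$(grep -w -f "$dir/list.txt" "$dir/words.txt")
printf '%s\n' "$matches"
rm -rf "$dir"
```

Only `car` and `car-wash` are printed: the hyphen counts as a boundary, while the underscore in `car_wash` is treated as part of the word.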

+1

Sahil singh Sep 11 '14 at 14:21

Todd A. Jacobs · Accepted Answer · 2012-05-19T06:01:04+0000

You can do this in a loop that reads one word at a time from the word list file and then counts the occurrences in the data file. For instance:

 while read; do
     echo -n "$REPLY "
     fgrep -ow "$REPLY" data.txt | wc -l
 done < <(sort -u word_list.txt)

The secret sauce consists of:

using the implicit variable REPLY;
using process substitution to collect the words from the word list file; and
ensuring that you are grepping for whole words in the data file.
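
It may also help to see why the loop pipes `-o` output into `wc -l` instead of using `grep -c`, which the question tried: `-c` counts matching lines, whereas `-o` prints every match on its own line so that repeated words on one line are all counted. The sketch below uses a made-up data file, and `grep -F` as the equivalent of `fgrep`.

```shell
# Hypothetical sample data, created only for this sketch.
dir=$(mktemp -d)
printf 'bear fish bear\nbear\n' > "$dir/data.txt"

# -c counts lines that contain the word at least once.
lines=$(grep -F -c -w 'bear' "$dir/data.txt")
# -o prints each match separately, so wc -l counts every occurrence.
hits=$(grep -F -o -w 'bear' "$dir/data.txt" | wc -l | tr -d ' ')

printf 'matching lines: %s, occurrences: %s\n' "$lines" "$hits"
rm -rf "$dir"
```

Here two lines contain `bear`, but the word occurs three times, so the two numbers differ.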
