How can I count how many times each word from a word list appears in a file?

I have a file, list.txt, that contains a list of words. I want to check how many times each word appears in another file, file1.txt, and then output the results. A plain listing of the counts is sufficient, since I can add them to list.txt by hand using a spreadsheet program, but it would be even better if the script appended the count to the end of each line in list.txt, for example:

 bear 3
 fish 15

I tried this, but it does not work:

 cat list.txt | grep -c file1.txt 
4 answers

You can do this in a loop that reads one word at a time from the word list file, and then counts the instances in the data file. For instance:

 while read; do
     echo -n "$REPLY "
     fgrep -ow "$REPLY" data.txt | wc -l
 done < <(sort -u word_list.txt)

The secret sauce consists of:

  • using the implicit variable REPLY;
  • using process substitution to collect the words from the word list file; and
  • ensuring that you are grepping for whole words in the data file.
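
The three points above can be tried on a small, self-contained run. This is only a sketch: the sample file contents are made up for illustration, the script needs bash (both the implicit REPLY and process substitution are bash features), and grep -F is used in place of fgrep, which is the same fixed-string matching under a different name.

```shell
# Hypothetical sample data in a scratch directory (not from the question).
tmp=$(mktemp -d)
printf '%s\n' bear fish bear > "$tmp/word_list.txt"
printf '%s\n' 'a bear ate a fish' 'bears like fish' 'the bear slept' > "$tmp/data.txt"

result=$(
    # With no variable name, read stores each input line in REPLY.
    while read -r; do
        # -F fixed string, -o one output line per match, -w whole words only;
        # tr strips the padding that some wc implementations add.
        n=$(grep -Fow -- "$REPLY" "$tmp/data.txt" | wc -l | tr -d ' ')
        printf '%s %s\n' "$REPLY" "$n"
    # Process substitution feeds the de-duplicated word list to the loop.
    done < <(sort -u "$tmp/word_list.txt")
)
printf '%s\n' "$result"
rm -rf "$tmp"
```

In this sample, "bears" does not add to the count for "bear" because of -w, and the duplicate "bear" in the word list is collapsed by sort -u before the loop sees it.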

This awk method only has to pass through each file once:

 awk '
     # read the words in list.txt
     NR == FNR { count[$1] = 0; next }
     # process file1.txt (fields start at 1; $0 is the whole line)
     { for (i = 1; i <= NF; i++) if ($i in count) count[$i]++ }
     # output the results
     END { for (word in count) print word, count[word] }
 ' list.txt file1.txt
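
One thing to be aware of is that awk visits the keys of an array in an unspecified order in a "for (word in count)" loop, so the output may not follow the order of list.txt. If the counts are meant to line up with list.txt, as the question asks, one possible variation is to remember the order while reading the list. The array name "order", the counter "n", and the sample data below are assumptions made for this sketch.

```shell
# Hypothetical sample data in a scratch directory (not from the question).
tmp=$(mktemp -d)
printf '%s\n' fish bear owl > "$tmp/list.txt"
printf '%s\n' 'bear fish fish' 'fish bear' > "$tmp/file1.txt"

result=$(awk '
    # first file: remember each word and the position it appeared at
    NR == FNR { count[$1] = 0; order[++n] = $1; next }
    # second file: count whitespace-separated fields that are in the list
    { for (i = 1; i <= NF; i++) if ($i in count) count[$i]++ }
    # print in list order instead of the arbitrary "for (word in count)" order
    END { for (j = 1; j <= n; j++) print order[j], count[order[j]] }
' "$tmp/list.txt" "$tmp/file1.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Words that never occur in file1.txt, such as "owl" in this sample, are still printed, with a count of 0.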

This may work for you (GNU sed):

 tr -s ' ' '\n' <file1.txt | sort | uniq -c | sed -e '1i\s|.*|& 0|' -e 's/\s*\(\S*\)\s\(\S*\)\s*/s|\\<\2\\>.*|\2 \1|/' | sed -f - list.txt

Explanation:

  • Split file1.txt into one word per line
  • Sort the words
  • Count the occurrences of each word
  • Build a sed script that matches each word (its first command appends a count of zero to every line)
  • Run that script against list.txt
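
To make the steps above easier to follow, the pipeline can be split so that the generated sed script is visible before it is applied. This is a sketch that assumes GNU sed (for \s, \S, \< and \>, the one-line form of the i command, and reading a script from standard input with -f -); the sample data and the variable names are made up for illustration.

```shell
# Hypothetical sample data in a scratch directory (not from the question).
tmp=$(mktemp -d)
printf '%s\n' fish bear owl > "$tmp/list.txt"
printf '%s\n' 'bear fish fish' 'fish bear' > "$tmp/file1.txt"

# Steps 1-4: split into words, sort, count, and turn each "count word"
# line from uniq -c into a substitution command for the second sed.
script=$(tr -s ' ' '\n' < "$tmp/file1.txt" | sort | uniq -c |
    sed -e '1i\s|.*|& 0|' \
        -e 's/\s*\(\S*\)\s\(\S*\)\s*/s|\\<\2\\>.*|\2 \1|/')
printf '%s\n' "$script"

# Step 5: run the generated script against list.txt.
result=$(printf '%s\n' "$script" | sed -f - "$tmp/list.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```

The generated script should start with a command that appends " 0" to every line of list.txt, followed by one substitution per distinct word in file1.txt that overwrites the line with the real count. Words that are in list.txt but not in file1.txt keep the 0.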

A single-line command:

 cat file1.txt |tr " " "\n"|sort|uniq -c |sort -n -r -k 1 |grep -w -f list.txt 

The last part of the command tells grep to read the patterns to match from list.txt (-f) and to match only whole words (-w). That is, if "car" is in list.txt, grep should ignore "carriage".

However, keep in mind that your idea of a whole word and grep's may differ. For example, although "car" will not match "carriage", it will match "car-wash", because "-" is treated as a word boundary. grep treats anything other than letters, digits, and underscores as a word boundary. This should not be a problem, as it agrees with the accepted definition of a word in English.
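
The word-boundary behaviour described above can be checked directly. The four sample lines below are made up for illustration; "car_wash" is included to show that an underscore counts as part of a word.

```shell
# Feed four hypothetical lines to grep and keep those where "car"
# appears as a whole word.
matches=$(printf '%s\n' car carriage car-wash car_wash | grep -w car)
printf '%s\n' "$matches"
```

Only "car" and "car-wash" are printed: in "carriage" and "car_wash" the match is followed by a letter or an underscore, so grep does not see a word boundary there.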


Source: https://habr.com/ru/post/916145/

