"while read LINE; do" and grep problems

I have two files.

file1.txt: Afghans Africans Alaskans ... 

file2.txt contains the output from wget on a web page, so it is a big mess, but it contains many of the words from the first list.

Bash script:

 cat file1.txt | while read LINE; do grep $LINE file2.txt; done 

This did not work as expected. I wondered why, so I echoed the $LINE variable inside the loop and added sleep 1 so I could see what was happening:

 cat file1.txt | while read LINE; do echo $LINE; sleep 1; grep $LINE file2.txt; done 

The result looks in the terminal like this:

afghans
Africans
Alaskans
Albanians
the Americans
grep: Chinese: No such file or directory
: No such file or directory
Arabs
Arabs
Arab / East Indians
: No such file or directory
Argentinans
Armenians
Asian
Asian Indians
: No such file or directory
file2.txt: Asian Naruto
...

So you can see that it finally found the word "Asian". But why does it say:

No such file or directory

?

Is something strange going on, or am I missing something?

+4
5 answers

@OP, use dos2unix first as recommended. Then use awk

 awk 'FNR==NR{a[$1];next}{ for(i=1;i<=NF;i++){ if($i in a) {print $i} } } ' file1 file2_wget 

Note: a while loop with grep inside is inefficient, since every iteration calls grep on file2 again.

@OP, rough explanation: for the meaning of FNR and NR, refer to the gawk manual. FNR==NR{a[$1];next} loads the contents of file1 into array a. When FNR is no longer equal to NR (which means the second file is now being read), it checks whether each word of the file is in array a; if so, it prints it. (The for loop is used to iterate over each word of the line.)
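To make the explanation concrete, here is an illustrative run of that same awk one-liner with tiny made-up sample files (the file names and contents are stand-ins for file1 and file2_wget):

```shell
# Hypothetical sample data: a word list and a page dump containing some of the words
printf 'Afghans\nAsian\n' > /tmp/file1
printf 'the Asian Naruto page mentions Afghans too\n' > /tmp/file2_wget

# First pass (FNR==NR): store each word of file1 as a key of array a.
# Second pass: for every word of file2_wget, print it if it is a key of a.
awk 'FNR==NR{a[$1];next}{for(i=1;i<=NF;i++){if($i in a){print $i}}}' \
    /tmp/file1 /tmp/file2_wget
# prints: Asian
#         Afghans
```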

+3

What about

 grep -f file1.txt file2.txt 
+5

Use more quotes and use less cat

 while IFS= read -r LINE; do grep "$LINE" file2.txt; done < file1.txt 
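The quotes are what prevent the "No such file or directory" errors in the question: an unquoted $LINE that holds more than one word is split by the shell, so grep takes the first word as the pattern and the rest as file names. A small demo (the file name and contents are made up for illustration):

```shell
# A multi-word pattern and a sample data file
printf 'some Asian Naruto text\n' > /tmp/file2.txt
LINE='Asian Indians'

# Unquoted: expands to `grep Asian Indians /tmp/file2.txt`.
# stderr typically shows: grep: Indians: No such file or directory
# and the match is prefixed with the file name, as in the question's output.
grep $LINE /tmp/file2.txt

# Quoted: searches for the whole phrase "Asian Indians" in one file.
grep "$LINE" /tmp/file2.txt
```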
+2

Besides the quoting issue, the downloaded file contains CRLF line endings that throw read off. Use dos2unix to convert file1.txt before iterating over it.
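If dos2unix is not installed, the same conversion can be sketched with tr (the sample file contents below are hypothetical):

```shell
# Hypothetical file1.txt with Windows (CRLF) line endings
printf 'Afghans\r\nAsian\r\n' > file1.txt

# Strip the carriage returns (CRLF -> LF); a common stand-in for dos2unix
tr -d '\r' < file1.txt > file1.unix.txt && mv file1.unix.txt file1.txt
```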

+1

Although using awk is faster, grep produces much more detail with less effort. So, after running dos2unix, use:

grep -F -i -n -f <file_containing_pattern> <file_containing_data_blob>

You will get all matches plus their line numbers (case-insensitive).

At a minimum, this will be enough to find all the words from the pattern file:

 grep -F -f <file_containing_pattern> <file_containing_data_blob> 
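For illustration, here is the more detailed variant run against tiny made-up pattern and data files:

```shell
# Hypothetical pattern file and data blob
printf 'Afghans\nAsian\n' > /tmp/patterns
printf 'line one\nthe Asian Naruto\n' > /tmp/blob

# -F fixed strings, -i case-insensitive, -n line numbers, -f patterns from file
grep -F -i -n -f /tmp/patterns /tmp/blob
# prints: 2:the Asian Naruto
```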
+1

Source: https://habr.com/ru/post/1347605/
