"while read LINE; do" and grep problems

I have two files.

file1.txt: Afghans Africans Alaskans ... 

file2.txt contains the output from wget on a web page, so it is a big mess, but it contains many of the words from the first list.

Bash script:

 cat file1.txt | while read LINE; do grep $LINE file2.txt; done 

This did not work as expected. I wondered why, so I echoed the $LINE variable inside the loop and added sleep 1 so I could see what was happening:

 cat file1.txt | while read LINE; do echo $LINE; sleep 1; grep $LINE file2.txt; done 

The result looks in the terminal like this:

afghans
Africans
Alaskans
Albanians
the Americans
grep: Chinese: No such file or directory
: No such file or directory
Arabs
Arabs
Arab / East Indians
: No such file or directory
Argentinans
Armenians
Asian
Asian Indians
: No such file or directory
file2.txt: Asian Naruto
...

So you can see that it finally found the word "Asian". But why does it say:

No such file or directory

?

Is something strange going on, or am I missing something?

+4
5 answers

@OP, use dos2unix first as recommended. Then use awk

 awk 'FNR==NR{a[$1];next}{ for(i=1;i<=NF;i++){ if($i in a) {print $i} } } ' file1 file2_wget 

Note: a while loop with grep inside is inefficient, since every iteration calls grep on file2 again.

@OP, rough explanation: for the meaning of FNR and NR, refer to the gawk manual. FNR==NR{a[$1];next} loads the contents of file1 into array a. When FNR is no longer equal to NR (which means the second file is now being read), it checks whether each word of the file is in array a; if so, it prints it. (The for loop is used to iterate over each word of the line.)
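To make the explanation concrete, here is an illustrative run of that same awk one-liner with tiny made-up sample files (the file names and contents are stand-ins for file1 and file2_wget):

```shell
# Hypothetical sample data: a word list and a page dump containing some of the words
printf 'Afghans\nAsian\n' > /tmp/file1
printf 'the Asian Naruto page mentions Afghans too\n' > /tmp/file2_wget

# First pass (FNR==NR): store each word of file1 as a key of array a.
# Second pass: for every word of file2_wget, print it if it is a key of a.
awk 'FNR==NR{a[$1];next}{for(i=1;i<=NF;i++){if($i in a){print $i}}}' \
    /tmp/file1 /tmp/file2_wget
# prints: Asian
#         Afghans
```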

+3

What about

 grep -f file1.txt file2.txt 
+5

Use more quotes and use less cat

 while IFS= read -r LINE; do grep "$LINE" file2.txt; done < file1.txt 
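The quotes are what prevent the "No such file or directory" errors in the question: an unquoted $LINE that holds more than one word is split by the shell, so grep takes the first word as the pattern and the rest as file names. A small demo (the file name and contents are made up for illustration):

```shell
# A multi-word pattern and a sample data file
printf 'some Asian Naruto text\n' > /tmp/file2.txt
LINE='Asian Indians'

# Unquoted: expands to `grep Asian Indians /tmp/file2.txt`.
# stderr typically shows: grep: Indians: No such file or directory
# and the match is prefixed with the file name, as in the question's output.
grep $LINE /tmp/file2.txt

# Quoted: searches for the whole phrase "Asian Indians" in one file.
grep "$LINE" /tmp/file2.txt
```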
+2

Besides the quoting issue, the downloaded file contains CRLF line endings that throw read off. Use dos2unix to convert file1.txt before iterating over it.
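If dos2unix is not installed, the same conversion can be sketched with tr (the sample file contents below are hypothetical):

```shell
# Hypothetical file1.txt with Windows (CRLF) line endings
printf 'Afghans\r\nAsian\r\n' > file1.txt

# Strip the carriage returns (CRLF -> LF); a common stand-in for dos2unix
tr -d '\r' < file1.txt > file1.unix.txt && mv file1.unix.txt file1.txt
```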

+1

Although using awk is faster, grep produces much more detail with less effort. So, after running dos2unix, use:

grep -F -i -n -f <file_containing_pattern> <file_containing_data_blob>

You will get all matches plus their line numbers (case-insensitive).

At a minimum, this will be enough to find all the words from the pattern file:

 grep -F -f <file_containing_pattern> <file_containing_data_blob> 
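For illustration, here is the more detailed variant run against tiny made-up pattern and data files:

```shell
# Hypothetical pattern file and data blob
printf 'Afghans\nAsian\n' > /tmp/patterns
printf 'line one\nthe Asian Naruto\n' > /tmp/blob

# -F fixed strings, -i case-insensitive, -n line numbers, -f patterns from file
grep -F -i -n -f /tmp/patterns /tmp/blob
# prints: 2:the Asian Naruto
```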
+1

Source: https://habr.com/ru/post/1347605/
