Using grep in Python

There is a file (query.txt) containing several keywords and phrases that need to be matched against other files using grep. The last three lines of the following code work fine, but when the same command is used inside the while loop, the script seems to go into an infinite loop (i.e. it does not respond).

    import os
    f=open('query.txt','r')
    b=f.readline()
    while b:
        cmd='grep %s my2.txt'%b  #my2 is the file in which we are looking for b
        os.system(cmd)
        b=f.readline()
    f.close()
    a='He is'
    cmd='grep %s my2.txt'%a
    os.system(cmd)
+4
4 answers

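First of all, you are not iterating over the file idiomatically. A file object is iterable, so you can simply write for b in f: and drop the .readline() calls.

Then your code will hit you in the face as soon as a query contains any characters that have special meaning in the shell. In fact it already does: each line read from the file still ends with '\n', so for a single-word query the shell runs grep with a pattern but no file name (grep then waits for input on stdin, which is why the script appears to hang) and treats my2.txt as a separate command. Use subprocess.call instead of os.system() and pass a list of arguments.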

Here's the fixed version:

    import subprocess

    with open('query.txt', 'r') as f:
        for line in f:
            line = line.rstrip()  # remove trailing whitespace such as '\n'
            # each list element is passed to grep as exactly one argument
            subprocess.call(['/bin/grep', line, 'my2.txt'])
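To see why the list form matters, here is a small sketch that uses shlex.split from the standard library to approximate how a POSIX shell splits an interpolated command string into words. The query 'He is' comes from the question; nothing is actually executed, and the variable names are only illustrative.

```python
import shlex

query = "He is"

# os.system() hands one string to the shell, which splits it on whitespace.
# shlex.split() approximates that word splitting (it does not run anything).
shell_words = shlex.split("grep %s my2.txt" % query)
print(shell_words)

# With a list, each element reaches grep as exactly one argument,
# so the whole query stays a single pattern.
argv = ["grep", query, "my2.txt"]
print(argv)
```

In the first case grep would look for He in two files named is and my2.txt; in the second it looks for He is in my2.txt.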

However, you can improve your code even further by not calling grep at all. Read my2.txt into a string, and then use the re module to do the search. If you don't need a regex at all, you can simply use if line in my2_content.
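A minimal sketch of that idea, assuming the whole of my2.txt fits in memory. The helper name find_matches and the sample text are made up for illustration; in practice my2_content would come from reading the file.

```python
import re


def find_matches(queries, text, use_regex=False):
    """Return the queries that occur somewhere in text."""
    found = []
    for query in queries:
        if use_regex:
            hit = re.search(query, text) is not None  # query is a regex
        else:
            hit = query in text                       # plain substring test
        if hit:
            found.append(query)
    return found


# Illustrative content; normally: my2_content = open('my2.txt').read()
my2_content = "One two\ntwo two\ntwo three\n"

print(find_matches(["two two", "four"], my2_content))
print(find_matches([r"t\w+e", "^four"], my2_content, use_regex=True))
```

Unlike grep, this reports which queries matched rather than which lines; looping over the lines of the file instead would give grep-like output.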

+5

Your code scans the entire my2.txt file once for each query in query.txt.

What you want to do instead is:

  • read all queries into a list
  • iterate over the lines of the text file and check each line against all queries.

Try this code:

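    # read the queries once
    with open('query.txt', 'r') as f:
        queries = [l.strip() for l in f]

    # scan my2.txt a single time, testing every query against each line
    with open('my2.txt', 'r') as f:
        for line in f:
            for query in queries:
                if query in line:
                    print(query, line.rstrip())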
+3

This is actually not a good way to use Python, but if you need to do something like this, do it right:

    from __future__ import with_statement
    import subprocess

    def grep_lines(filename, query_filename):
        with open(query_filename, "rb") as myfile:
            for line in myfile:
                subprocess.call(["/bin/grep", line.strip(), filename])

    grep_lines("my2.txt", "query.txt")

And I hope that your file does not contain characters that have special meanings in regular expressions =)
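If the queries are meant as literal text, one way to sidestep that problem when searching from Python is re.escape, which backslash-escapes regex metacharacters (grep has the -F option for the same purpose). A minimal sketch with made-up sample strings:

```python
import re

query = "1+1"   # '+' is a regex quantifier
line = "1+1=2"

# Unescaped: the pattern means "one or more '1', then '1'", so no match here.
print(re.search(query, line))

# Escaped: the pattern matches the literal characters 1, +, 1.
print(re.search(re.escape(query), line).group(0))
```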

Alternatively, you can do this with grep alone:

 grep -f query.txt my2.txt 

It works as follows:

    ~ $ cat my2.txt
    One two
    two two
    two three
    ~ $ cat query.txt
    two two
    three
    ~ $ python bar.py
    two two
    two three
0
 $ grep -wFf query.txt my2.txt > out.txt 

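This matches every keyword in query.txt against my2.txt and saves the matching lines in out.txt. The -f option reads the patterns from a file, -F treats them as fixed strings rather than regular expressions, and -w matches whole words only.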

Read man grep for a description of all possible arguments.
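For comparison, a rough Python sketch of what -wFf does: literal patterns, whole-word matches, any pattern may hit. The function name grep_wF and the sample data are hypothetical. It is only an approximation of grep's behaviour; for instance, it skips empty patterns, whereas grep treats an empty pattern as matching every line.

```python
import re


def grep_wF(queries, lines):
    """Yield lines containing any query as a literal whole word."""
    # (?<!\w) and (?!\w): the match must not touch a word character on either side
    patterns = [re.compile(r"(?<!\w)" + re.escape(q) + r"(?!\w)")
                for q in queries if q]
    for line in lines:
        if any(p.search(line) for p in patterns):
            yield line


queries = ["two", "He is"]
lines = ["One two", "twofold", "He is here", "She is"]
print(list(grep_wF(queries, lines)))
```

Here twofold is rejected because two is followed by a word character, which is the effect of -w.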

0

Source: https://habr.com/ru/post/1393159/

