I have the contents of a huge text file in a string buffer, and I need to search it for specific words and phrases. What is an efficient way to do this?
I tried using the `re` module, but because the text I have to search is so large, it takes a lot of time.
I have a dictionary whose keys are the words and phrases to look for.
I iterate over each file, read its whole contents into one string, search for every word and phrase in the dictionary, and increment the key's count in the dictionary whenever it is found.
One small optimization we came up with was to sort the phrases/words in the dictionary from the most words to the fewest. Then, at each word's starting position in the string buffer, we compare against the phrases in that order. Once one phrase matches, we stop looking for other phrases at that position, since it is the longest phrase we want to match.
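The longest-match-first scan described above could be sketched like this. It is a minimal sketch under a few assumptions: matching is case-insensitive, words are split on whitespace (so punctuation stays attached to words), and `count_longest_matches` is a hypothetical helper name. Instead of comparing against a sorted list, each phrase is stored as a tuple of words in a dict, so each word position costs at most one lookup per distinct phrase length, longest first.

```python
from collections import Counter

def count_longest_matches(text, phrases):
    """Count phrases, preferring the longest phrase at each word position."""
    # Index phrases by their tuple of lowercased words for O(1) lookup.
    index = {}
    for p in phrases:
        key = tuple(p.lower().split())
        if key:  # ignore empty phrases
            index[key] = p
    lengths = sorted({len(k) for k in index}, reverse=True)  # longest first
    words = text.lower().split()
    counts = Counter()
    i = 0
    while i < len(words):
        for n in lengths:
            key = tuple(words[i:i + n])
            if len(key) == n and key in index:
                counts[index[key]] += 1
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no phrase starts at this word
    return counts
```

This version skips past a match, so a shorter phrase contained inside a longer match (e.g. "new york" inside "new york state") is not counted again. If you want those counted too, advance `i` by one instead of `n`.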
How can I step through a lowercased buffer word by word?
Also, are there any other optimizations I could make?
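For stepping through the buffer word by word, `buffer.lower().split()` is the simplest option if a full list of words fits in memory. A lazier alternative, sketched below under the assumption that a "word" is any run of non-whitespace characters, uses `re.finditer` so words are produced one at a time together with their offset in the original buffer (`iter_words` is a hypothetical name).

```python
import re

def iter_words(buffer):
    """Yield (start_offset, lowercased_word) for each whitespace-separated word."""
    # finditer scans lazily, so no list of all words is built up front.
    for match in re.finditer(r"\S+", buffer):
        # Lowercasing each word (rather than the whole buffer) keeps the
        # offsets valid for the original, un-lowercased buffer.
        yield match.start(), match.group().lower()
```

For example, `for pos, word in iter_words(data): ...` visits every word in order without copying the whole buffer.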
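```python
data = str(file_content)
for j in dictionary_entity.keys():
    # Substring count: "cat " also matches inside "concat ", and a key at the
    # very end of the text or before punctuation is missed.
    cnt = data.count(j + " ")
    if cnt:  # str.count returns 0, never -1, when there is no match
        dictionary_entity[j] += cnt
f.close()
```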
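One possible alternative to calling `data.count` once per key (a full pass over the text for every key) is to compile all phrases into a single regular expression and scan the text once. This is a sketch, not a guaranteed speedup: it assumes the dictionary keys are lowercase, single-spaced phrases that start and end with word characters (so `\b` word boundaries apply), and `build_pattern` / `count_phrases` are hypothetical names. Alternatives are sorted longest first because Python's `re` tries alternatives left to right, so at a given position the longest phrase wins, mirroring the optimization described in the question.

```python
import re
from collections import Counter

def build_pattern(phrases):
    """Compile one regex matching any phrase as whole words, longest first."""
    ordered = sorted((p for p in phrases if p.split()), key=len, reverse=True)
    if not ordered:
        return None
    # Escape each word and allow any run of whitespace between words.
    alternatives = [r"\s+".join(map(re.escape, p.split())) for p in ordered]
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)

def count_phrases(text, phrases):
    """Count non-overlapping, whole-word phrase matches in a single scan."""
    counts = Counter()
    pattern = build_pattern(phrases)
    if pattern is None:
        return counts
    for m in pattern.finditer(text):
        # Normalize case and whitespace so the match maps back to a key.
        counts[" ".join(m.group().lower().split())] += 1
    return counts
```

The results can then be merged back with `for k, v in count_phrases(data, dictionary_entity).items(): dictionary_entity[k] += v`. Note that the regex engine still tries the alternatives one by one at each position, so with a very large dictionary it is worth timing this against your current approach on real data.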