Extract words from a file

I am opening a file in Python to check whether a predefined set of words is present in it. I have put the predefined words in a list and opened the file to be tested. Is there a way to extract words in Python, rather than lines? That would make my job much easier.

3 answers
import re

def get_words_from_string(s):
    # Lowercase the text and collect every run of word characters once
    return set(re.findall(r'\w+', s.lower()))

def get_words_from_file(fname):
    # Open in text mode so read() returns a string the regex can search
    with open(fname, 'r') as inf:
        return get_words_from_string(inf.read())

def all_words(needle, haystack):
    # True only if every word in needle also occurs in haystack
    return set(needle).issubset(set(haystack))

def any_words(needle, haystack):
    # The words common to both; an empty set means nothing matched
    return set(needle).intersection(set(haystack))

search_words = get_words_from_string("This is my test")
find_in = get_words_from_string("If this were my test, I is passing")

print(any_words(search_words, find_in))

print(all_words(search_words, find_in))

which prints (the order of the set's elements may vary):

{'this', 'test', 'is', 'my'}
True
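
The example above only exercises the string helper. A minimal sketch of the same check against an actual file might look like the following, assuming Python 3 and UTF-8 text. The temporary file and the names sample_path, found and missing are hypothetical and exist only so the snippet runs on its own.

```python
import os
import re
import tempfile

def get_words_from_string(s):
    # Lowercase the text and collect every run of word characters once
    return set(re.findall(r'\w+', s.lower()))

def get_words_from_file(fname):
    # Text mode so read() returns str; the UTF-8 encoding is an assumption
    with open(fname, 'r', encoding='utf-8') as inf:
        return get_words_from_string(inf.read())

# Hypothetical sample file, written only to make the sketch self-contained
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write("If this were my test,\nI is passing\n")
    sample_path = tmp.name

search_words = {'this', 'is', 'my', 'test'}
file_words = get_words_from_file(sample_path)
os.remove(sample_path)

found = search_words & file_words      # words present in the file
missing = search_words - file_words    # words the file lacks

print(sorted(found))
print(sorted(missing))
```

Keeping both the intersection and the difference makes it easy to report which of the predefined words were not found, rather than only whether all of them were.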

You can do a couple of things:

  • Call .readlines() on the file and split the text on the desired separator, if your text is small.
  • Call read() with a size argument and process the file a few bytes at a time.

Check the Python docs for file objects: http://docs.python.org/release/2.5.2/lib/bltin-file-objects.html
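
The two options above could be sketched roughly as follows, assuming Python 3 and whitespace as the separator. The function names words_by_line and words_by_chunk, the tiny chunk size, and the in-memory io.StringIO stand-in for a real file are all illustrative assumptions. The chunked version has to carry a trailing partial word over to the next read, otherwise a word that straddles a chunk boundary would be split in two.

```python
import io

def words_by_line(fileobj):
    # Small files: read all lines at once and split each on whitespace
    found = set()
    for line in fileobj.readlines():
        found.update(line.lower().split())
    return found

def words_by_chunk(fileobj, chunk_size=8):
    # Larger files: read a fixed number of characters at a time
    found = set()
    tail = ''  # partial word left over from the previous chunk
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        text = tail + chunk.lower()
        parts = text.split()
        if text[-1].isspace():
            tail = ''  # the chunk ended cleanly between words
        else:
            tail = parts.pop()  # last piece may continue in the next chunk
        found.update(parts)
    if tail:
        found.add(tail)
    return found

sample = "hello world\ntesting one two\n"
wanted = {'hello', 'world', 'testing'}

line_words = words_by_line(io.StringIO(sample))
chunk_words = words_by_chunk(io.StringIO(sample), chunk_size=4)

print(sorted(wanted & line_words))
print(line_words == chunk_words)
```

With a real file, the same functions would be passed the object returned by open() in text mode instead of io.StringIO.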



words = set(['hello', 'world', 'testing'])
with open('testfile.txt', 'r') as f:
    # split() with no arguments breaks the text on any run of whitespace
    data = set(f.read().split())
print(words.intersection(data))
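
One caveat with splitting on whitespace alone is that punctuation and capitalisation stay attached to the tokens, so "Hello," would not match 'hello'. A possible workaround, sketched here under the assumption of Python 3 and an inline sample string in place of the file contents, is to strip punctuation and lowercase each token before comparing. The names raw_tokens and clean_tokens are illustrative.

```python
import string

words = {'hello', 'world', 'testing'}
text = "Hello, world! This is only a test."  # stand-in for f.read()

# Plain whitespace split: tokens keep their punctuation and case
raw_tokens = set(text.split())

# Strip leading/trailing punctuation and lowercase before comparing
clean_tokens = {tok.strip(string.punctuation).lower() for tok in text.split()}
clean_tokens.discard('')  # drop tokens that were punctuation only

print(sorted(words & raw_tokens))
print(sorted(words & clean_tokens))
```

Here the first intersection is empty, while the second finds 'hello' and 'world'.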

Source: https://habr.com/ru/post/1791343/
