How to match a string in a sentence

I want to check whether a particular string is present in a sentence. I use this simple code:

    subStr = 'joker'
    Sent = 'Hello World I am Joker'
    if subStr.lower() in Sent.lower():
        print('found')

This is a simple approach, but it fails when the sentence appears as

    hello world I am Jo ker

    hello world I am J oker

When I parse a sentence from a PDF file, unnecessary spaces appear here and there.

A simple way to solve this would be to remove all spaces from the sentence and then search for the substring. I would like to hear other people's thoughts on this: should I take this approach, or look for alternatives?

+5
4 answers

This is more efficient than replace for small strings, but more expensive for large strings. It will not handle ambiguous cases, for example 'to day' versus 'today'.

 subStr in ''.join(Sent.split()).lower() # True 
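The ambiguity mentioned above can be seen in a small sketch. The helper name contains_ignoring_whitespace and the sample sentences are made up for illustration; this is only one way to wrap the split/join idea, not a definitive implementation.

```python
def contains_ignoring_whitespace(needle, haystack):
    """Return True if needle occurs in haystack once all whitespace is removed.

    Hypothetical helper for illustration; the comparison is case-insensitive.
    """
    squeezed = ''.join(haystack.split()).lower()
    return ''.join(needle.split()).lower() in squeezed


# A stray space inside the word no longer hides it.
print(contains_ignoring_whitespace('joker', 'Hello World I am Jo ker'))  # True

# But genuine word boundaries are lost too: 'to day care' contains 'today'.
print(contains_ignoring_whitespace('today', 'I went to day care'))  # True (false positive)
```

If false positives like the second one matter, the whitespace-stripping approach alone is probably not enough.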
+2

You can use a regex:

    import re

    word_pattern = re.compile(r'j\s*o\s*k\s*e\s*r', re.I)
    sent = 'Hello World I am Joker'
    if word_pattern.search(sent):
        print('found')

I hope this works
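Writing the pattern by hand gets tedious for longer words, so it could be generated instead. A minimal sketch, assuming a hypothetical helper called spaced_pattern; re.escape is used so that characters such as '.' in the search word stay literal.

```python
import re


def spaced_pattern(word):
    """Compile a regex matching word with optional whitespace between its characters.

    Hypothetical helper for illustration; matching is case-insensitive.
    """
    return re.compile(r'\s*'.join(re.escape(ch) for ch in word), re.IGNORECASE)


pattern = spaced_pattern('joker')
match = pattern.search('Hello World I am J o ker')
if match:
    # match.group() is the text as it appears in the sentence, spaces included
    print('found:', repr(match.group()))
```

Unlike stripping the spaces first, this keeps the original text of the match, which is useful if you want to inspect where the spaces fell.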

+2

Try this. It may break in unexpected cases, but for your use case it may work.

    In [1]: Sent = 'Hello World I am Joker'

    In [3]: subStr = 'Joker'

    In [4]: if subStr in Sent.replace(' ', ''):
       ...:     print("Do something")
       ...:
    Do something
0

Your proposed approach - removing spaces - seems simple and effective (two to ten times faster than other suggestions in some simple tests). If you need to minimize false positives, however, you might be better off with a regex approach. You can add word boundaries to avoid partial word matches, and examine the matching substring to see if any spaces can be real spaces, possibly by matching with a canonical list of words.

    >>> import re
    >>> sentence = 'Were the fields ever green? - they were never green.'
    >>> target = 'evergreen'
    >>> pattern = re.compile(r'\b' + r'\s*'.join(target) + r'\b')
    >>> pattern.findall(sentence)  # only one match because of \b
    ['ever green']
    >>> matching_words = pattern.findall(sentence)[0].split()
    >>> # dictionary is assumed to be a collection of known words, defined elsewhere
    >>> all(word in dictionary for word in matching_words)
    True
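The word-list check could be wrapped up along these lines. This is only a sketch: KNOWN_WORDS is a tiny stand-in for a real word list, and find_with_gap_check and the sample sentence are hypothetical names and data chosen for illustration.

```python
import re

# Assumption: a tiny stand-in word list; a real one would be loaded from a file.
KNOWN_WORDS = {'ever', 'green', 'evergreen'}


def find_with_gap_check(target, sentence, known_words=KNOWN_WORDS):
    """Yield (matched_text, gaps_look_real) for each spaced match of target.

    gaps_look_real is True when the match contains whitespace and every
    whitespace-separated piece is itself a known word, so the spaces may be
    genuine rather than extraction noise.
    """
    pattern = re.compile(
        r'\b' + r'\s*'.join(map(re.escape, target)) + r'\b',
        re.IGNORECASE,
    )
    for match in pattern.finditer(sentence):
        pieces = match.group().split()
        gaps_look_real = len(pieces) > 1 and all(
            piece.lower() in known_words for piece in pieces
        )
        yield match.group(), gaps_look_real


sentence = 'Were the fields ever green? I planted an everg reen.'
for text, gaps_look_real in find_with_gap_check('evergreen', sentence):
    print(repr(text), 'spaces may be real' if gaps_look_real else 'likely a broken word')
```

Here 'ever green' is flagged as possibly two real words, while 'everg reen' is treated as a broken 'evergreen' because neither piece is in the word list.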
0

Source: https://habr.com/ru/post/1275197/

