Matching a target word and the words that follow it in Python

For each occurrence of a target word, I want to check whether the words that follow it, up to the next occurrence of the target word, contain the phrase I specified. If they do, I want true written to a txt file; otherwise, false.

I tried a regex, but an iterative approach would also be fine:

```python
import re
# Captures the word immediately before each occurrence of tofind
re.findall("([a-zA-Z]+) " + tofind, txt)
```

Target and following words:

target word: document
next words: set is complete

Doc example:

Document that I set is complete now. Document is great set. Is document is great complete document set is complete. Document is complete document is good but not complete.

The word document appears 6 times in this passage, and I want the following output written to the txt file:

first document -> true
second document -> false
third document -> false
fourth document -> true
fifth document -> false
sixth document -> false
3 answers

You don't need regular expressions for this task; splitting the string is enough. An example of a simple method:

```python
sampleDoc = "Document that I set is complete now. Document is great set. Is document is great complete document set is complete. Document is complete document is good but not complete.".lower()
findWord = "document".lower()
wordToFind = "set is complete".lower()

# Split on the target word: each piece is the text between two occurrences
splitList = sampleDoc.split(findWord)
# Drop the text that precedes the first occurrence
splitList.pop(0)

for position, phrase in enumerate(splitList):
    if wordToFind in phrase:
        print("Document Number", str(position + 1), "-> true")
    else:
        print("Document Number", str(position + 1), "-> false")
```

We split the text on the word we are looking for, which gives a list of the chunks between consecutive occurrences. We then iterate over this list and print true if the phrase is found in a chunk, or false if it is not.
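The question asks for the results to be written to a txt file rather than printed. A minimal sketch of the same split-based check that writes one line per occurrence, assuming a hypothetical helper `check_occurrences`, a hypothetical output file name `result.txt`, and a `document N -> true/false` line format:

```python
def check_occurrences(text, target, phrase):
    """Return one bool per occurrence of target: does phrase appear before the next one?"""
    chunks = text.lower().split(target.lower())
    # chunks[0] is the text before the first occurrence, so skip it
    return [phrase.lower() in chunk for chunk in chunks[1:]]


sample = ("Document that I set is complete now. Document is great set. "
          "Is document is great complete document set is complete. "
          "Document is complete document is good but not complete.")
flags = check_occurrences(sample, "document", "set is complete")

# "result.txt" is a hypothetical file name; mode "w" overwrites an existing file
with open("result.txt", "w") as out:
    for n, flag in enumerate(flags, start=1):
        out.write(f"document {n} -> {str(flag).lower()}\n")
```

Like the answer above, this is a plain substring test, so a target such as `document` also splits inside `documents`.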


A regex solution that uses word boundaries to make sure the words are not matched as part of other words (preset, nowadays, documents):

```python
import re

text = 'Document that I set is complete now. Document is great set. Is document is great complete document set is complete. Document is complete document is good but not complete.'
target = 'document'
nextwords = 'set is complete'

# \b on both sides prevents matches inside longer words
spat = re.compile(r'\b{}\b'.format(re.escape(target)), re.I)
mpat = re.compile(r'\b{}\b'.format(re.escape(nextwords)), re.I)

# Split on the target and test every chunk after the first occurrence
result = [bool(mpat.search(x)) for x in spat.split(text)[1:]]
print(result)
```

If target or nextwords start or end with a non-word character, the \b word boundaries will not work as intended, and you need to replace them with lookarounds such as (?<!\w) and (?!\w).
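A minimal sketch of that substitution, using a hypothetical helper `match_after_target` and a made-up sample text in which the target is `C++` (it ends in non-word characters, so `\b` cannot follow it when a space comes next):

```python
import re


def match_after_target(text, target, nextwords):
    # (?<!\w) and (?!\w) behave like \b for words, but also work when the
    # term starts or ends with a non-word character such as '+' or '.'
    spat = re.compile(r'(?<!\w){}(?!\w)'.format(re.escape(target)), re.I)
    mpat = re.compile(r'(?<!\w){}(?!\w)'.format(re.escape(nextwords)), re.I)
    # Skip the text before the first target, then test each chunk
    return [bool(mpat.search(chunk)) for chunk in spat.split(text)[1:]]


text = "C++ is fast. C++ is verbose. C++ is fast and C++11 is newer."
print(match_after_target(text, "C++", "is fast"))
```

Here `(?!\w)` also rejects the `C++` inside `C++11`. With `\b` in its place the pattern does the opposite: there is no boundary between `+` and a space, so it skips the three standalone occurrences and matches only the `C++` at the start of `C++11`.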


You can collect the start index of every occurrence of document and the end index of every occurrence of set is complete, using the start() and end() methods of the match objects. Then you get the expected matches by checking whether any end index of the next words falls between a pair of consecutive document start indices.

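```python
>>> import re
>>> s = 'Document that I set is complete now. Document is great set. Is document is great complete document set is complete. Document is complete document is good but not complete.'
>>> all_targets_start = [g.start() for g in re.finditer(r'document', s, re.I)]
>>> all_nextw_end = [g.end() for g in re.finditer(r'set is complete', s, re.I)]
>>> # len(s) closes the last segment, so the sixth document is checked too
>>> bounds = all_targets_start[1:] + [len(s)]
>>> [any(i < k < j for k in all_nextw_end) for i, j in zip(all_targets_start, bounds)]
[True, False, False, True, False, False]
```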
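The `any()` check compares every next-word match with every pair of target positions. A possible variation, not part of the answer above, uses the standard-library `bisect` module to assign each next-word match directly to the last target occurrence that starts before the match ends; the function name `flags_by_position` is hypothetical:

```python
import bisect
import re


def flags_by_position(s, target, nextwords):
    """For each target occurrence, report whether a nextwords match ends between it and the next one."""
    # Start index of every target occurrence, in ascending order
    starts = [m.start() for m in re.finditer(re.escape(target), s, re.I)]
    flags = [False] * len(starts)
    for m in re.finditer(re.escape(nextwords), s, re.I):
        # bisect_left counts the targets that start before m.end();
        # subtracting 1 gives the index of the last of them
        idx = bisect.bisect_left(starts, m.end()) - 1
        if idx >= 0:  # ignore matches that end before the first target
            flags[idx] = True
    return flags


s = ("Document that I set is complete now. Document is great set. "
     "Is document is great complete document set is complete. "
     "Document is complete document is good but not complete.")
print(flags_by_position(s, "document", "set is complete"))
```

Each match is placed with one binary search, so the work grows with the number of next-word matches times the logarithm of the number of targets, instead of their product.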

Source: https://habr.com/ru/post/1243253/

