How to find a sentence containing a phrase in text using python re?

I have a text made up of sentences, some of which are questions. I am trying to write a regular expression that retrieves only the questions that contain a specific phrase, namely "NSF":

import re
s = "This is a string. Is this a question? This isn't a question about NSF. Is this one about NSF? This one is a question about NSF but is it longer?"

Ideally, re.findall will return:

['Is this one about NSF?','This one is a question about NSF but is it longer?']

but my best attempt:

re.findall('([\.\?].*?NSF.*\?)+?',s)
[". Is this a question? This isn't a question about NSF. Is this one about NSF? This one is a question about NSF but is it longer?"]

I know I need to do something with non-greedy matching, but I'm not sure where I went wrong.

1 answer

: , , , OP, . nltk (. this ).

, , , , , , , , , , , . , .

\s*([^!.?]*?NSF[^!.?]*?[?])

Breakdown of the regex:

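  • \s* - 0+ whitespace characters
  • ([^!.?]*?NSF[^!.?]*?[?]) - capturing group 1, which matches:
    • [^!.?]*? - 0+ characters other than ., ! and ?, as few as possible
    • NSF - the literal substring NSF
    • [^!.?]*? - the same as above
    • [?] - a literal ? (can be replaced with \?)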
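
As a quick check, the pattern above can be run against the string from the question. This is a minimal sketch: the helper find_questions_with is a hypothetical name introduced here for illustration, not part of the original answer, and it assumes sentences end only in ., ! or ?.

```python
import re

s = ("This is a string. Is this a question? This isn't a question about NSF. "
     "Is this one about NSF? This one is a question about NSF but is it longer?")

# Pattern from the answer: skip leading whitespace, then capture a run of
# non-terminator characters that contains NSF and ends with a question mark.
pattern = re.compile(r"\s*([^!.?]*?NSF[^!.?]*?[?])")

# With a single capturing group, findall returns the group contents only,
# so the leading whitespace matched by \s* is not included.
questions = pattern.findall(s)
print(questions)


def find_questions_with(phrase, text):
    """Hypothetical helper: the same pattern, generalized to any literal phrase."""
    # re.escape keeps characters such as '.' or '+' in the phrase literal.
    rx = re.compile(r"\s*([^!.?]*?" + re.escape(phrase) + r"[^!.?]*?[?])")
    return rx.findall(text)


print(find_questions_with("question", s))
```

Because the negated character classes cannot cross ., ! or ?, a match can never span more than one sentence, which is what the .*? in the original attempt failed to guarantee. The same property means text containing abbreviations with periods (for example "e.g.") will be split at those periods.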

Source: https://habr.com/ru/post/1658022/

