Using a regexp, extract quoted strings that may contain nested quotes

I have the following line:

'Well, I've tried to say "How Doth the Little Busy Bee," but it all came different!' Alice replied in a very melancholy voice. She continued, 'I'll try again.'

Now I want to extract the following quotes:

1. Well, I've tried to say "How Doth the Little Busy Bee," but it all came different!
2. How Doth the Little Busy Bee,
3. I'll try again.

I tried the following code, but I am not getting what I want. Is [^\1]* not working as expected, or is the problem elsewhere?

import re

s = "'Well, I've tried to say \"How Doth the Little Busy Bee,\" but it all came different!' Alice replied in a very melancholy voice. She continued, 'I'll try again.'"

for i, m in enumerate(re.finditer(r'([\'"])(?!(?:ve|m|re|s|t|d|ll))(?=([^\1]*)\1)', s)):
    print("\nGroup {:d}: ".format(i+1))
    for g in m.groups():
        print('  '+g)
4 answers

If you really need to return all the results from a single regular expression that is applied only once, you will need a lookahead, (?=findme), so that the search position returns to where the match started after each match - see this answer for a more detailed explanation.
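A minimal sketch of why the lookahead matters, using a toy string rather than the text from the question. The names text, consuming and overlapping are illustrative only.

```python
import re

text = "abcd"

# A plain pattern consumes what it matches, so matches cannot overlap.
consuming = re.findall(r"\w\w", text)

# Inside a lookahead the match is zero-width: nothing is consumed, so the
# next attempt starts one character later. The capture group still records
# the text that the lookahead saw.
overlapping = re.findall(r"(?=(\w\w))", text)

print(consuming)    # ['ab', 'cd']
print(overlapping)  # ['ab', 'bc', 'cd']
```

A nested quote overlaps the quote that contains it, which is why a consuming pattern cannot return both in one pass.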

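The hard part is deciding where a quote starts and ends, because the apostrophe in I've is the same character as a single quotation mark. The pattern below relies on these rules:

  • Double quotes: the opening " is followed by a word character, and the closing " is not followed by a word character.
  • Single quotes: the opening ' is not preceded by a word character and is followed by one, and the closing ' is not followed by a word character. This rules out the apostrophe in I've.

The regex: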

(?=(?:(?<!\w)'(\w.*?)'(?!\w)|"(\w.*?)"(?!\w)))

Demo: https://regex101.com/r/vX4cL9/1
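For reference, here is a sketch of how the lookahead pattern above could be run from Python's re module against the line from the question. The variable names pattern and quotes are assumptions for illustration. The text lands in group 1 for single-quoted matches and in group 2 for double-quoted ones.

```python
import re

s = ("'Well, I've tried to say \"How Doth the Little Busy Bee,\" but it all "
     "came different!' Alice replied in a very melancholy voice. "
     "She continued, 'I'll try again.'")

# The whole pattern is a lookahead, so every match is zero-width and
# a nested quote can still be found inside an enclosing one.
pattern = re.compile(r"""(?=(?:(?<!\w)'(\w.*?)'(?!\w)|"(\w.*?)"(?!\w)))""")

# Exactly one of the two groups takes part in each match.
quotes = [m.group(1) or m.group(2) for m in pattern.finditer(s)]

for i, q in enumerate(quotes, 1):
    print("{}. {}".format(i, q))
```

On this input the loop prints the three quotes listed in the question, in that order.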

+3

Here is another pattern:

(?=(?<!\w|[!?.])('|\")(?!\s)(?P<content>(?:.(?!(?<=(?=\1).)(?!\w)))*)\1(?!\w))

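The quoted text is captured in the named group content.

  • (?=(?<!\w|[!?.])('|\")(?!\s) - matches ' or " when it is not preceded by a word character or by one of !, ?, . ((?<!\w|[!?.])) and is not followed by whitespace ((?!\s)); the ' or " is captured in group 1,
  • (?P<content>(?:.(?!(?<=(?=\1).)(?!\w)))*)\1(?!\w)) - captures into content a run of characters in which no char is the opening mark (' or ", group 1) followed by a non-word character, then matches the closing mark, which must not be followed by a word character.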

, , lookaround, - .

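Why [^\1]* fails:

[^\1]* does not mean "any char except the one captured in group 1". Inside a character class, \1 is read as a character escape (the char with octal code 1), not as a backreference, so the class also matches quotation marks. A character class cannot hold a backreference in this regex flavor.

To exclude a captured char you need a lookaround, for example: (')((?:.(?!\1))*.) - the opening char, then every char that is not followed by that char, then one more char, which is the last one before the closing char.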
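The behaviour of \1 inside a character class can be checked directly. The sketch below uses Python's re module. The pattern stored in negated is an illustrative variant of the lookaround idea, not the exact regex from this answer, and it makes no attempt to handle apostrophes.

```python
import re

# Inside [...] the escape \1 is the character with code 1, so [^\1]
# means "anything except chr(1)" and matches the quote itself.
class_matches_quote = re.fullmatch(r"(')[^\1]*", "'abc'def") is not None
class_rejects_chr1 = re.fullmatch(r"(')[^\1]*", "'abc\x01") is None

# Negating a backreference with a lookahead: consume a character only
# when the text at that position is not the captured quotation mark.
negated = re.compile(r"""(['"])((?:(?!\1).)*)\1""")
sample = "'abc' and \"x'y\""
found = [m.group(2) for m in negated.finditer(sample)]

print(class_matches_quote, class_rejects_chr1)  # True True
print(found)                                    # ['abc', "x'y"]
```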

+2

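In Python, the built-in re module is rather limited for this kind of task. Python also has Matthew Barnett's stellar regex module, which supports features found in Perl, PCRE and .NET.

The code below therefore uses regex rather than re. It handles one level of nesting: a quote that may contain one other quote.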

import regex

quote = regex.compile(r'''(?x)
(?(DEFINE)
(?<qmark>["']) # what we'll consider a quotation mark
(?<not_qmark>[^'"]+) # chunk without quotes
(?<a_quote>(?P<qopen>(?&qmark))(?&not_qmark)(?P=qopen)) # a non-nested quote
) # End DEFINE block

# Start Match block
(?&a_quote)
|
(?P<open>(?&qmark))
  (?&not_qmark)?
  (?P<quote>(?&a_quote))
  (?&not_qmark)?
(?P=open)
''')

text = """'Well, I have tried to say "How Doth the Little Busy Bee," but it all came different!' Alice replied in a very melancholy voice. She continued, 'I will try again.'"""

for match in quote.finditer(text):
    # The whole quotation, including its quotation marks
    print(match.group())
    # The nested quotation, if this match contains one
    if match.group('quote'):
        print(match.group('quote'))

'Well, I have tried to say "How Doth the Little Busy Bee," but it all came different!'
"How Doth the Little Busy Bee,"
'I will try again.'

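First, note that I changed I've to I have and I'll to I will, to keep things simple. With I'll, the apostrophe would be taken for a quotation mark.

The (?(DEFINE)...) block defines the subpatterns qmark, not_qmark and a_quote without matching anything, so that they can be called later.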

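The match block reads as follows:

  • (?&a_quote) matches a quote that contains no other quote,
  • | or ...
  • (?P<open>(?&qmark)) captures the opening quotation mark in the group open,
  • (?&not_qmark)? matches optional text without quotation marks,
  • (?P<quote>(?&a_quote)) captures a nested quote in the group quote,
  • (?&not_qmark)? matches optional text without quotation marks,
  • (?P=open) matches the closing mark, which must be the same as the opening one.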

The Python code then prints each match and, when the group quote took part in the match, the nested quote as well.

? . (?(DEFINE)...) , , .

, .

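A subroutine call runs a group's pattern again at the current position. For group 1, the call is written (?1). For a group named something, it is written (?&something).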

+1

This does not use just one regex pass; it applies the same regex recursively:

import re

REGEX = re.compile(r"(['\"])(.*?[!.,])\1", re.S)

S = """'Well, I've tried to say "How Doth the Little Busy Bee," but it all came different!' Alice replied in a very melancholy voice. She continued, 'I'll try again.' 'And we may now add "some more 'random test text'.":' "Yes it seems to be a good idea!" 'ok, let go.'"""


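def extract_quotes(string, quotes_list=None):
    # Start from the caller's list, if any, and append this level's quotes
    quotes = quotes_list or []
    quotes += [found[1] for found in REGEX.findall(string)]
    index = 0
    # Iterate over a copy, because quotes is rebuilt inside the loop
    for quote in quotes[:]:
        index += 1
        # Find quotes nested in this one and insert them right after it
        sub_list = extract_quotes(quote)
        quotes = quotes[:index] + sub_list + quotes[index:]
        index += len(sub_list)
    return quotes


print(extract_quotes(S))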

Output:

['Well, I\'ve tried to say "How Doth the Little Busy Bee," but it all came different!', 'How Doth the Little Busy Bee,', "I'll try again.", 'And we may now add "some more \'random test text\'.":\' "Yes it seems to be a good idea!" \'ok, let go.', "some more 'random test text'.", 'Yes it seems to be a good idea!']

, , , " ". , , . " " , " ". .

, , . Thue extract_quotes quotes_list. , ...

0

Source: https://habr.com/ru/post/1655468/

