Remove all nested blocks, leaving only non-nested blocks, in Python

Source:

[This] is some text with [some [blocks that are nested [in a [variety] of ways]]]

The resulting text:

[This] is some text with

From looking at other threads on Stack Overflow, I don't think a regex can do this.

Is there an easy way to do this, or do I need to reach for pyparsing (or another parsing library)?

+3
4 answers

Taking the OP's example as normative (any block containing further nested blocks must be removed), what about...:

import itertools

x = '''[This] is some text with [some [blocks that are nested [in a [variety]
of ways]]] and some [which are not], and [any [with nesting] must go] away.'''

def nonest(txt):
  pieces = []
  d = 0
  level = []
  for c in txt:
    if c == '[': d += 1
    level.append(d)
    if c == ']': d -= 1
  for k, g in itertools.groupby(zip(txt, level), lambda x: x[1]>0):
    block = list(g)
    if max(d for c, d in block) > 1: continue
    pieces.append(''.join(c for c, d in block))
  print(''.join(pieces))

nonest(x)

It emits

[This] is some text with  and some [which are not], and  away.

which, under the "normative" hypothesis, would seem to be the desired result.

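The idea is to compute, in level, a parallel list of "how deeply nested are we at this point" counts (i.e. how many brackets have been opened and not yet closed); the zip of the text with level is then segmented, with groupby, into alternating runs with zero nesting and with nesting > 0. For each run, the maximum nesting is computed (it stays at zero for the zero-nesting runs), and if it is <= 1, the corresponding text is kept. Note that the group g has to be turned into a list, block, because it is iterated twice (once to get the maximum nesting, once to rejoin the characters into text); doing it in a single pass would need some auxiliary state in the loop, which is less convenient here.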
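
To make the role of level concrete, here is a small sketch that computes the same per-character nesting list for a short string. The helper name nesting_levels and the sample input are illustrative assumptions, not part of the answer above.

```python
def nesting_levels(txt):
    """Return, for each character, the bracket depth recorded by the first loop of nonest."""
    depth = 0
    levels = []
    for c in txt:
        if c == '[':
            depth += 1          # an opening bracket counts as part of the block it opens
        levels.append(depth)
        if c == ']':
            depth -= 1          # a closing bracket still counts as part of the block it closes
    return levels

sample = 'a[b[c]]d'
print(list(zip(sample, nesting_levels(sample))))
# [('a', 0), ('[', 1), ('b', 1), ('[', 2), ('c', 2), (']', 2), (']', 1), ('d', 0)]
```

Characters at depth 0 lie outside any block. The single block here reaches depth 2, so nonest would drop it and keep only "ad".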

+4

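Yes, there is an easy way: a very small parser. Scan the text one character at a time and keep a nesting counter. Increment the counter when you see "["; decrement it when you see "]".

  • While the counter is 0, you are outside any block, so the text is kept.
  • While the counter is 1, you are inside a top-level block.
  • If the counter goes above 1, the enclosing top-level block contains nesting and is dropped; at the end the counter must be back at 0. (Otherwise there are more [ s than ] s, or more ] s than [ s.)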
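
A minimal sketch of such a counter-based scan, in one pass over the text. The function name remove_nested_blocks and the choice to raise ValueError on unbalanced brackets are assumptions made for illustration; the answer itself does not specify them.

```python
def remove_nested_blocks(text):
    """Drop every top-level [...] block that contains another [...] block."""
    out = []          # finished output pieces
    block = []        # characters of the top-level block currently being read
    depth = 0         # current bracket nesting depth
    nested = False    # True once the current block has reached depth 2
    for c in text:
        if c == '[':
            depth += 1
            if depth > 1:
                nested = True
            block.append(c)
        elif c == ']':
            if depth == 0:
                raise ValueError("unbalanced ']'")   # assumption: treat as an error
            block.append(c)
            depth -= 1
            if depth == 0:
                # The top-level block just closed: keep it only if it was flat.
                if not nested:
                    out.append(''.join(block))
                block = []
                nested = False
        elif depth == 0:
            out.append(c)       # plain text outside any block
        else:
            block.append(c)     # text inside the current top-level block
    if depth != 0:
        raise ValueError("unbalanced '['")           # assumption: treat as an error
    return ''.join(out)

src = "[This] is some text with [some [blocks that are nested [in a [variety] of ways]]]"
print(remove_nested_blocks(src))
```

For the source string from the question this prints "[This] is some text with " (with a trailing space), matching the requested result.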
+5

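You will be better off writing a parser, especially if you use a parser generator such as pyparsing. It will be more maintainable and extensible.

In fact, pyparsing already implements the parser for you; you only need to write the function that filters the parser output.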

+3

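I made a couple of attempts at a parser expression that could be used with expression.transformString(), but I had trouble telling nested and unnested [] apart at parse time. In the end I had to open up the loop inside transformString and iterate over the scanString generator explicitly.

To decide whether the text of a partially nested [] block should be kept, I added some more "unnested" text at the end of the nested block, giving this source string: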

src = """[This] is some text with [some [blocks that are 
    nested [in a [variety] of ways]] in various places]"""

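The first version follows the original question and simply drops any block that contains nesting. The second and third versions go further: from a nested block they keep the top-level text, "some" and "in various places", and discard only the inner blocks. All three scan the source with nestedExpr. Here is the code: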

from pyparsing import nestedExpr, ParseResults, CharsNotIn

# 1. scan the source string for nested [] exprs, and take only those that
# do not themselves contain [] exprs
out = []
last = 0
for tokens,start,end in nestedExpr("[","]").scanString(src):
    out.append(src[last:start])
    if not any(isinstance(tok,ParseResults) for tok in tokens[0]):
        out.append(src[start:end])
    last = end
out.append(src[last:])
print("".join(out))


# 2. scan the source string for nested [] exprs, and take only the toplevel 
# tokens from each
out = []
last = 0
for t,s,e in nestedExpr("[","]").scanString(src):
    out.append(src[last:s])
    topLevel = [tok for tok in t[0] if not isinstance(tok,ParseResults)]
    out.append('['+" ".join(topLevel)+']')
    last = e
out.append(src[last:])
print("".join(out))


# 3. scan the source string for nested [] exprs, and take only the toplevel 
# tokens from each, keeping each group separate
out = []
last = 0
for t,s,e in nestedExpr("[","]", CharsNotIn('[]')).scanString(src):
    out.append(src[last:s])
    for tok in t[0]:
        if isinstance(tok,ParseResults): continue
        out.append('['+tok.strip()+']')
    last = e
out.append(src[last:])
print("".join(out))

Prints:

[This] is some text with 
[This] is some text with [some in various places]
[This] is some text with [some][in various places]

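I hope one of these comes close to what the OP asked for. If nothing else, I got to explore the behavior of nestedExpr a little further.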

+3

Source: https://habr.com/ru/post/1726651/

