Extract string inside nested brackets and its prefix/postfix in Python

I am trying to extract a string inside nested brackets and process them.

Say I have a line

string = "(A((B|C)D|E|F))"

Using the answer to the question "Extract string inside nested brackets", I can extract the strings inside the enclosing brackets. My case is different, though, because there is a "D" after the inner brackets. This is what that code returns, which is far from the result I want:

['B|C', 'D|E|F', 'A']

This is my desired result.

[[['A'],['B|C'],['D']], [['A'],['E|F']]]     # '|' means OR

Would you recommend implementing this with a regular expression, or by scanning through the entire string?

This should then lead to my final result:

"ABD"
"ACD"
"AE"
"AF"

At that point I plan to use itertools.product.
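A minimal sketch of that last step, assuming the intermediate result has exactly the shape shown above (one list per position, with '|' separating alternatives inside a string). The helper name expand_branch is hypothetical:

```python
from itertools import product

def expand_branch(branch):
    """Expand one branch such as [['A'], ['B|C'], ['D']] into all strings.

    Each inner list is assumed to hold strings whose '|' separates
    alternatives, so ['B|C'] becomes the choices ['B', 'C'].
    """
    groups = [[alt for s in group for alt in s.split('|')] for group in branch]
    # itertools.product picks one alternative from every group.
    return [''.join(choice) for choice in product(*groups)]

# The desired intermediate result from the question.
branches = [[['A'], ['B|C'], ['D']], [['A'], ['E|F']]]

results = [s for branch in branches for s in expand_branch(branch)]
print(results)  # ['ABD', 'ACD', 'AE', 'AF']
```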


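Python's built-in re module cannot match arbitrarily nested brackets, so a regular expression is a poor fit here. A small recursive-descent parser handles this kind of input naturally.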

First, describe the input with a grammar:

EXPR -> A EXPR (an expression is an expression preceded by an alphabetic character)
EXPR -> (LIST) EXPR (an expression is a list followed by an expression)
EXPR -> "" (an expression can be an empty string)

LIST -> EXPR | LIST (a list is an expression followed by "|" followed by a list)
LIST -> EXPR (or just one expression)
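Reading each EXPR and LIST as the set of strings it produces (an interpretation of what the parser code computes, not wording from the original answer), "|" becomes a union and writing two parts next to each other becomes the pairwise concatenation X · Y. For the example string:

$$
\begin{aligned}
X \cdot Y &= \{\, xy \mid x \in X,\ y \in Y \,\} \\
\texttt{(A((B|C)D|E|F))} &\mapsto \{A\} \cdot \big( (\{B, C\} \cdot \{D\}) \cup \{E\} \cup \{F\} \big) \\
&= \{A\} \cdot \{BD, CD, E, F\} \\
&= \{ABD, ACD, AE, AF\}
\end{aligned}
$$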

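Each rule then becomes a branch in one of two mutually recursive methods, which gives a recursive-descent parser: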

class Parser:

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def get_cur_char(self):
        """
        Returns the current character or None if the input is over
        """
        return None if self.pos == len(self.data) else self.data[self.pos]

    def advance(self):
        """
        Moves to the next character of the input if the input is not over.
        """
        if self.pos < len(self.data):
            self.pos += 1

    def get_and_advance(self):
        """
        Returns the current character and moves to the next one.
        """
        res = self.get_cur_char()
        self.advance()
        return res

    def parse_expr(self):
        """
        Parses an EXPR according to the specified grammar.
        """
        cur_char = self.get_cur_char()
        if cur_char == '(':
            # EXPR -> (LIST) EXPR rule
            self.advance()
            # Parses the list and the rest of the expression and combines
            # the results.
            prefixes = self.parse_list()
            suffixes = self.parse_expr()
            return [p + s for p in prefixes for s in suffixes]
        elif not cur_char or cur_char == ')' or cur_char == '|':
            # EXPR -> Empty rule. Returns a list with an empty string without
            # consuming the input.
            return ['']
        else:
            # EXPR -> A EXPR rule.
            # Parses the rest of the expression and prepends the current
            # character.
            self.advance()
            return [cur_char + s for s in self.parse_expr()]

    def parse_list(self):
        """
        Parses a LIST according to the specified grammar.
        """
        first_expr = self.parse_expr()
        # Uses the LIST -> EXPR | LIST rule if the next character is | and
        # LIST -> EXPR otherwise (for well-formed input, the character
        # consumed here is then the closing ')').
        return first_expr + (self.parse_list() if self.get_and_advance() == '|' else [])


if __name__ == '__main__':
    string = "(A((B|C)D|E|F))"
    parser = Parser(string)
    print('\n'.join(parser.parse_expr()))
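If you would rather avoid recursion (very deeply nested input could hit Python's recursion limit), the same expansion can be sketched with an explicit stack. This is a different technique from the recursive-descent parser above, and the function name expand is hypothetical; it treats every character other than '(', ')' and '|' as literal text:

```python
def expand(text):
    """Return every string described by text, using '(' ')' for grouping
    and '|' for alternatives, with an explicit stack instead of recursion."""
    stack = []            # saved (done, current) of each enclosing group
    done = []             # finished alternatives of the innermost group
    current = ['']        # partial strings of the alternative being read
    for ch in text:
        if ch == '(':
            # Open a group: save the enclosing state and start fresh.
            stack.append((done, current))
            done, current = [], ['']
        elif ch == '|':
            # One alternative of this group is complete.
            done = done + current
            current = ['']
        elif ch == ')':
            if not stack:
                raise ValueError('unmatched )')
            group = done + current
            done, current = stack.pop()
            # Append every alternative of the group to every prefix.
            current = [p + g for p in current for g in group]
        else:
            # Ordinary character: append it to every partial string.
            current = [p + ch for p in current]
    if stack:
        raise ValueError('unmatched (')
    return done + current

print(expand("(A((B|C)D|E|F))"))  # ['ABD', 'ACD', 'AE', 'AF']
```

Unlike the parser above, this sketch also rejects unbalanced brackets instead of silently accepting them.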



Here is another approach. For your example it gives:

input: "(A((B|C)D|E|F))"
output: ['ABD', 'ACD', 'AE', 'AF']

The code:

import re

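def tokenize(text):
    # Split the input into brackets, '|' and runs of word characters.
    return re.findall(r'[()|]|\w+', text)

def product(a, b):
    # Concatenate every string of a with every string of b; if one list is
    # empty, return the other one unchanged.
    return [x+y for x in a for y in b] if a and b else a or b

def parse(text):
    tokens = tokenize(text)

    def recurse(tokens, i):
        # factor: strings for the alternative currently being read
        # term: strings from alternatives already finished by '|'
        factor = []
        term = []
        while i < len(tokens) and tokens[i] != ')':
            token = tokens[i]
            i += 1
            if token == '|':
                term.extend(factor)
                factor = []
            else:
                if token == '(':
                    # Parse the bracketed group; i then points past its ')'.
                    expr, i = recurse(tokens, i)
                else:
                    expr = [token]
                factor = product(factor, expr)
        # Skip the closing ')' and return all alternatives of this level.
        return term+factor, i+1

    return recurse(tokens, 0)[0]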

string = "(A((B|C)D|E|F))"

print(parse(string))
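Since the question mentions itertools.product, the product helper in this answer can also be written with it. A minimal sketch; the name product_itertools is hypothetical, and the empty-list fallback mirrors the original helper's `a or b`:

```python
import itertools

def product_itertools(a, b):
    """Concatenate every x in a with every y in b, like product() above.

    As in the original helper, if either list is empty the other one is
    returned unchanged (this is what lets parse() start from factor = []).
    """
    if not a or not b:
        return a or b
    return [x + y for x, y in itertools.product(a, b)]

print(product_itertools(['A'], ['B', 'C']))  # ['AB', 'AC']
```

Because it behaves the same as the original product(), swapping it into parse() should not change the output.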

You can try it on repl.it.


Source: https://habr.com/ru/post/1673880/

