Use regex to handle nested brackets in math equation?

If I have:

statement = "(2*(3+1))*2" 

I want to be able to handle multiple levels of parentheses inside parentheses for the math reader I'm writing. Perhaps I'm going about it wrong, but my goal was to recursively go deeper into the parentheses until there are none left, and then do the math. So, I would like to first focus on

 "(2*(3+1))" 

then focus on

 "(3+1)" 

I was hoping to do this by assigning the focus value to the start index and end index of the regex match. I have yet to figure out how to find the end index, but I'm more interested in why my first attempt at a regex,

 r"\(.+\)"

failed to match. I wanted it to read as "any one or more characters contained in a set of parentheses." Can someone explain why the above expression does not match the above statement in Python?

+6
3 answers

I love regular expressions. I use them all the time.

Do not use regular expressions for this.

You need an actual parser that will actually parse your math expressions. You can read this:

http://effbot.org/zone/simple-top-down-parsing.htm

After you have actually parsed the expression, it is trivial to walk the parse tree and compute the result.
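As a rough illustration of the idea (not the code from the linked article), here is a minimal recursive-descent sketch. It assumes only numbers, the operators + - * /, and parentheses, with no unary minus; the names tokenize, parse and evaluate are made up for this example.

```python
import operator
import re

# One token is a number or a single operator/parenthesis character.
TOKEN = re.compile(r"\s*(\d+\.?\d*|[-+*/()])")
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def tokenize(text):
    """Split the input into number and operator tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse(tokens):
    """Build a parse tree: either a float or a tuple (op, left, right)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():      # expr := term (('+' | '-') term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():      # term := factor (('*' | '/') factor)*
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():    # factor := NUMBER | '(' expr ')'
        tok = peek()
        if tok == "(":
            advance()
            node = expr()    # the recursion handles any nesting depth
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok is None or tok in {"+", "-", "*", "/", ")"}:
            raise SyntaxError(f"unexpected token {tok!r}")
        advance()
        return float(tok)

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

def evaluate(node):
    """Walk the parse tree and compute the result."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left), evaluate(right))
    return node

tree = parse(tokenize("(2*(3+1))*2"))
print(evaluate(tree))  # 16.0
```

The nesting is handled by factor() calling expr() again whenever it sees an opening parenthesis, so no index bookkeeping is needed.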

EDIT: @Lattyware suggested pyparsing, which should also work well and may be simpler than the effbot approach linked above.

http://pyparsing.wikispaces.com

Here is a direct link to the pyparsing example code for a four-function algebraic expression evaluator:

http://pyparsing.wikispaces.com/file/view/fourFn.py

+12

For what it's worth, here is a bit more context: regular expressions are called “regular” because they are associated with regular grammars, and regular grammars cannot describe (unlimited) nested parentheses. They can describe a bunch of random parentheses, but they cannot make them match up in neat pairs.

One way to understand this is to realize that regular expressions can (modulo some details, which I will explain at the end) be converted into deterministic finite state machines. That sounds intimidating, but it really just means that they can be converted into lists of "rules", where each rule depends on what you have matched so far and describes what you can match next.

For example, the regular expression ab*c can be converted to:

  1. At the start, you can only match a; then go to 2.

  2. Now you can match b and go back to 2, or match c and go to 3.

  3. All done! The match was successful!

And this is a “deterministic finite state machine”.
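The rules for ab*c can be written down almost literally as a lookup table. This is only a sketch: states 1, 2 and 3 correspond to the three rules above, in order, and the names TRANSITIONS, ACCEPTING and dfa_match are invented for the example.

```python
# Each entry reads: "in this state, if you match this character, go to that state".
TRANSITIONS = {
    (1, "a"): 2,
    (2, "b"): 2,
    (2, "c"): 3,
}
ACCEPTING = {3}  # state 3 means the match was successful

def dfa_match(text):
    """Return True if the whole of text matches ab*c."""
    state = 1
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no rule applies, so the match fails
            return False
    return state in ACCEPTING

print(dfa_match("abbbc"))  # True
print(dfa_match("ab"))     # False
```

Note that the machine remembers nothing except which state it is in, which is exactly why it cannot keep track of how many parentheses are still open.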

In any case, the interesting part is that if you sit down and try to do something similar to match pairs of parentheses, you cannot! Give it a try. You can match any finite number of them by creating more and more rules, but you cannot write a general set of rules that matches an unlimited number of parentheses (I should add that the rules must have the form "if you match X, go to Y").

Now, obviously you can change this in various ways. You can allow more complex rules (for example, extend them so that you can count the brackets), and then you could get something that works as you expect. But it would no longer be a regular grammar.
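To make "counting the brackets" concrete, here is a small sketch of a checker that carries a counter alongside the scan. The function name balanced is made up for illustration; the point is that the counter can grow without bound, which is the extra memory a finite state machine does not have.

```python
def balanced(text):
    """Return True if every '(' in text is closed by a later ')'."""
    depth = 0  # number of currently open parentheses
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared with nothing open
                return False
    return depth == 0

print(balanced("(2*(3+1))*2"))  # True
print(balanced("(2*(3+1)*2"))   # False
```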

Given that regular expressions are limited in this way, why are they used instead of something more powerful? It turns out that they hit a sweet spot: they can do a lot while remaining quite simple and efficient. More complex grammars (kinds of rules) can be more powerful, but they are also harder to implement and have more problems with efficiency.

Finally, the promised extra details: in practice, many regular expression implementations these days are actually more powerful than this (and strictly speaking should not be called "regular expressions"). But the above is still the basic explanation of why you should not use a regexp for this.

P.S. Jesse suggested a way to get around this by applying a regexp several times; the argument here is about a single use of a regex.

+2

I mostly agree with steveha and do not recommend regex for this, but to answer your specific question: you need unescaped parentheses to pull out result groups (your pattern only has escaped parens):

 >>> import re
 >>> re.match(r"\((.+)\)", "(2*(3+1))*2").group(1)
 '2*(3+1)'

If you follow this route, you can keep matching against the result of the previous match until nothing matches any more, and then reverse the list of results to work from the inside out.
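A minimal sketch of that loop might look like the following. The helper name nested_groups is invented here, and the approach only suits a single chain of nesting: because .+ is greedy, the pattern spans from the first "(" to the last ")", so sibling groups such as "(1+2)*(3+4)" are not separated correctly.

```python
import re

PAREN = re.compile(r"\((.+)\)")

def nested_groups(text):
    """Collect the contents of each parenthesis level, innermost first."""
    levels = []
    match = PAREN.search(text)
    while match:
        levels.append(match.group(1))
        # Search again inside what was just captured.
        match = PAREN.search(match.group(1))
    levels.reverse()  # work from the inside out
    return levels

print(nested_groups("(2*(3+1))*2"))  # ['3+1', '2*(3+1)']
```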

+1

Source: https://habr.com/ru/post/913669/

