Context in pyparsing parse actions other than globals

I would like to parse two (or any number of) expressions, each with its own set of variable definitions or other context.

There seems to be no obvious way to associate a context with a specific call to pyparsing.ParseExpression.parseString(). The most natural way is to use instance methods of some class as the parse actions. The problem with this approach is that the grammar has to be redefined for each parsing context (for example, in the class's __init__), which seems terribly inefficient.

Using pyparsing.ParseExpression.copy() on the rules does not help; the individual expressions get cloned all right, but the subexpressions they are composed of are not updated in any obvious way, so none of the parse actions of any nested expression is triggered.

The only other way I can think of to get this effect is to define a grammar that returns a context-free abstract parse tree and then process it in a second step. This seems awkward even for simple grammars: it would be nice to just raise an exception the moment an unrecognized name is used, and it still won’t parse languages like C, which actually require context about what came before to determine which rule matches.

Is there any other way to inject context (without using a global variable, of course) into the parse actions of pyparsing expressions?
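For reference, the two-step approach described above might look roughly like the sketch below. The node classes Num, Var and BinOp and their evaluate() method are hypothetical names chosen for illustration; the grammar itself never sees a context, which is only supplied when the tree is evaluated.

```python
import operator
from pyparsing import Word, alphas, alphanums, nums, oneOf

# Hypothetical AST node classes; pyparsing calls each one with the matched tokens.
class Num(object):
    def __init__(self, tokens):
        self.value = int(tokens[0])
    def evaluate(self, context):
        return self.value

class Var(object):
    def __init__(self, tokens):
        self.name = tokens[0]
    def evaluate(self, context):
        # The unknown-name error only surfaces in the second step, not while parsing.
        if self.name not in context:
            raise NameError("invalid variable %r" % self.name)
        return context[self.name]

class BinOp(object):
    ops = {"+": operator.add, "-": operator.sub,
           "*": operator.mul, "/": operator.truediv}
    def __init__(self, tokens):
        self.lhs, self.op, self.rhs = tokens
    def evaluate(self, context):
        return self.ops[self.op](self.lhs.evaluate(context),
                                 self.rhs.evaluate(context))

# Context-free grammar: the parse actions only build nodes.
var = Word(alphas + "_", alphanums + "_").setParseAction(Var)
integer = Word(nums).setParseAction(Num)
operand = integer | var
binop = (operand + oneOf("+ - * /") + operand).setParseAction(BinOp)

# Step 1: parse once. Step 2: evaluate against as many contexts as needed.
tree = binop.parseString("m*x")[0]
print(tree.evaluate({"m": 100, "x": 12}))  # 1200
print(tree.evaluate({"m": 2, "x": 3}))     # 6
```

This keeps the grammar fully reusable, but as noted above it cannot let the context influence which rule matches.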

4 answers

I do not know if this will necessarily answer your question, but here is one approach to customizing a parser to a context:

    from pyparsing import Word, alphas, alphanums, nums, oneOf, ParseFatalException

    var = Word(alphas+'_', alphanums+'_').setName("identifier")
    integer = Word(nums).setName("integer").setParseAction(lambda t:int(t[0]))
    operand = integer | var

    operator = oneOf("+ - * /")
    ops = {'+' : lambda a,b:a+b,
           '-' : lambda a,b:a-b,
           '*' : lambda a,b:a*b,
           '/' : lambda a,b:a/b if b else "inf",
          }

    binop = operand + operator + operand

    # add parse action that evaluates the binary operator by passing
    # the two operands to the appropriate binary function defined in ops
    binop.setParseAction(lambda t: ops[t[1]](t[0],t[2]))

    # closure to return a context-specific parse action
    def make_var_parseAction(context):
        def pa(s,l,t):
            varname = t[0]
            try:
                return context[varname]
            except KeyError:
                raise ParseFatalException("invalid variable '%s'" % varname)
        return pa

    def eval_binop(e, **kwargs):
        # rebind the parse action on the shared var expression for this call
        var.setParseAction(make_var_parseAction(kwargs))
        try:
            print(binop.parseString(e)[0])
        except Exception as pe:
            print(pe)

    eval_binop("m*x", m=100, x=12, b=5)
    eval_binop("z*x", m=100, x=12, b=5)

Prints:

    1200
    invalid variable 'z' (at char 0), (line:1, col:1)

A bit late, but googling "pyparsing reentrancy" turns up this thread, so here is my answer.
I solved the problem of parser instance reuse and reentrancy by attaching the context to the string being parsed. You subclass str, put your context in an attribute of the new str class, pass an instance of it to pyparsing, and get the context back inside the parse action.

Python 2.7:

    from pyparsing import LineStart, LineEnd, Word, alphas, Optional, Regex, Keyword, OneOrMore

    # subclass str; note that unicode is not handled
    class SpecStr(str):
        context = None  # will be set in spec_string() below
        # override as pyparsing calls str.expandtabs by default
        def expandtabs(self, tabs=8):
            ret = type(self)(super(SpecStr, self).expandtabs(tabs))
            ret.context = self.context
            return ret

    # set context here rather than in the constructor
    # to avoid messing with str.__new__ and super()
    def spec_string(s, context):
        ret = SpecStr(s)
        ret.context = context
        return ret

    class Actor(object):
        def __init__(self):
            self.namespace = {}

        def pair_parsed(self, instring, loc, tok):
            self.namespace[tok.key] = tok.value

        def include_parsed(self, instring, loc, tok):
            # doc = open(tok.filename.strip()).read()  # would use this line in real life
            doc = included_doc  # included_doc is defined below
            parse(doc, self)  # <<<<< recursion

    def make_parser(actor_type):
        def make_action(fun):  # expects fun to be an unbound method of Actor
            def action(instring, loc, tok):
                if isinstance(instring, SpecStr):
                    return fun(instring.context, instring, loc, tok)
                return None  # None as a result of parse actions means
                             # the tokens have not been changed
            return action

        # Sample grammar: a sequence of lines,
        # each line is either 'key=value' pair or '#include filename'
        Ident = Word(alphas)
        RestOfLine = Regex('.*')
        Pair = (Ident('key') + '=' +
                RestOfLine('value')).setParseAction(make_action(actor_type.pair_parsed))
        Include = (Keyword('#include') +
                   RestOfLine('filename')).setParseAction(make_action(actor_type.include_parsed))
        Line = (LineStart() + Optional(Pair | Include) + LineEnd())
        Document = OneOrMore(Line)
        return Document

    Parser = make_parser(Actor)

    def parse(instring, actor=None):
        if actor is not None:
            instring = spec_string(instring, actor)
        return Parser.parseString(instring)

    included_doc = 'parrot=dead'
    main_doc = """\
    #include included_doc
    ham = None
    spam = ham"""

    # parsing without context is ok
    print 'parsed data:', parse(main_doc)

    actor = Actor()
    parse(main_doc, actor)
    print 'resulting namespace:', actor.namespace

gives

    ['#include', 'included_doc', '\n', 'ham', '=', 'None', '\n', 'spam', '=', 'ham']
    {'ham': 'None', 'parrot': 'dead', 'spam': 'ham'}

This approach makes the Parser itself completely reusable and reentrant. pyparsing internals are generally reentrant as well, as long as you do not touch the static fields of ParserElement. The only drawback is that pyparsing resets its packrat cache on each call to parseString, but this can be worked around by overriding SpecStr.__hash__ (to make it hash like object rather than str) and some monkeypatching. On my dataset this is not an issue at all, since the performance hit is negligible, and it even helps memory usage.


How about letting the parse actions be instance methods, as you say, but just not re-instantiating the class? Instead, when you want to parse another translation unit, reset the context in the same parser object.

Something like this:

    from pyparsing import Keyword, Word, OneOrMore, alphas, nums

    class Parser:
        def __init__(self):
            ident = Word(alphas)
            identval = Word(alphas).setParseAction(self.identval_act)
            numlit = Word(nums).setParseAction(self.numlit_act)
            expr = identval | numlit
            letstmt = (Keyword("let") + ident + expr).setParseAction(self.letstmt_act)
            printstmt = (Keyword("print") + expr).setParseAction(self.printstmt_act)
            program = OneOrMore(letstmt | printstmt)

            self.symtab = {}
            self.grammar = program

        # each action receives the matched tokens; tuple parameters such as
        # "def f(self, (ident,))" are Python 2 only, so unpack inside instead
        def identval_act(self, toks):
            return self.symtab[toks[0]]

        def numlit_act(self, toks):
            return int(toks[0])

        def letstmt_act(self, toks):
            _, ident, val = toks
            self.symtab[ident] = val

        def printstmt_act(self, toks):
            print(toks[1])

        def reset(self):
            self.symtab = {}

        def parse(self, s):
            self.grammar.parseString(s)

    P = Parser()
    P.parse("""let foo 10
    print foo
    let bar foo
    print bar
    """)

    print(P.symtab)
    P.parse("print foo")  # context is kept.

    P.reset()
    P.parse("print foo")  # but here it is reset and this fails

In this example, "symtab" is your context.

Of course, this does not work well if you are trying to parse in parallel in different threads, but I do not see how that could work sanely with shared parse actions anyway.
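If a single shared parser object does have to be called from several threads, one conservative option is to serialize access with a lock, so that resetting the context and parsing happen as one unit. The sketch below assumes a hypothetical LockedParser class and a cut-down grammar with only let statements; neither comes from the answer above.

```python
import threading
from pyparsing import Keyword, Word, OneOrMore, alphas, nums

class LockedParser(object):
    """Hypothetical wrapper: one grammar, one context, one caller at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.symtab = {}
        ident = Word(alphas)
        number = Word(nums).setParseAction(lambda t: int(t[0]))
        letstmt = (Keyword("let") + ident + number).setParseAction(self._let_act)
        self.grammar = OneOrMore(letstmt)

    def _let_act(self, toks):
        # toks is ['let', name, value]
        self.symtab[toks[1]] = toks[2]

    def parse(self, s):
        # Reset, parse and snapshot under the lock so that concurrent
        # callers never see each other's symbol table.
        with self._lock:
            self.symtab = {}
            self.grammar.parseString(s, parseAll=True)
            return dict(self.symtab)

parser = LockedParser()
first = parser.parse("let a 1 let b 2")
second = parser.parse("let c 3")
```

The price is that parses are serialized, so this gives safety rather than parallelism.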


I ran into this exact limitation and used threading.local() to attach the parser's context information as thread-local storage. In my case, I keep a stack of parsed terms that is pushed to and popped from inside the parse action functions, but obviously you can also use it to store a reference to a class instance or anything else.

It looks something like this:

    import threading

    __tls = threading.local()

    def parse_term(t):
        __tls.stack.append(convert_term(t))

    def parse_concatenation(t):
        rhs = __tls.stack.pop()
        lhs = __tls.stack.pop()
        __tls.stack.append(convert_concatenation(t, lhs, rhs))

    # parse a string s using grammar EXPR, that has parse actions parse_term and
    # parse_concatenation for the rules that parse expression terms and concatenations
    def parse(s):
        __tls.stack = []
        parse_result = EXPR.parseString(s)
        return __tls.stack.pop()

In my case, all of the thread-local storage, the stack setup, the parse actions and the grammar itself are kept out of the public API, so nobody outside can see what is going on or mess with it. There is just a parse method somewhere in the API that takes a string and returns a parsed, transformed representation of the request; it is thread safe and does not require re-creating the grammar for each parsing session.
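To make the idea concrete, here is a self-contained sketch of the same pattern. The toy grammar (words joined by "+"), the names TERM, EXPR, push_term and push_concatenation, and the use of plain string concatenation in place of convert_term and convert_concatenation are all assumptions made for illustration.

```python
import threading
from pyparsing import Word, alphas, ZeroOrMore, Suppress

_tls = threading.local()  # each thread sees its own .stack attribute

def push_term(tokens):
    _tls.stack.append(tokens[0])

def push_concatenation(tokens):
    # The two most recent results are the operands of this "+".
    rhs = _tls.stack.pop()
    lhs = _tls.stack.pop()
    _tls.stack.append(lhs + rhs)

# The grammar is built once at import time and shared by all threads.
TERM = Word(alphas).setParseAction(push_term)
EXPR = TERM + ZeroOrMore((Suppress("+") + TERM).setParseAction(push_concatenation))

def parse(s):
    _tls.stack = []  # fresh context for this call, in this thread only
    EXPR.parseString(s, parseAll=True)
    return _tls.stack.pop()

# Parse once in the main thread first, then from several worker threads.
print(parse("ab + cd + ef"))  # abcdef

results = {}

def worker(text):
    results[text] = parse(text)

threads = [threading.Thread(target=worker, args=(text,))
           for text in ("a + b", "x + y + z")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread fills in only its own stack, so the shared grammar and parse actions never mix results from different calls.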


Source: https://habr.com/ru/post/1388828/

