Nested dictionary data from pyparsing

Question

I use pyparsing to parse an expression of the form:

"and(or(eq(x,1), eq(x,2)), eq(y,3))"

My test code is as follows:

    from pyparsing import (Word, alphanums, Literal, Forward, Suppress,
                           ZeroOrMore, CaselessLiteral, Group)

    field = Word(alphanums)
    value = Word(alphanums)
    eq_ = CaselessLiteral('eq') + Group(Suppress('(') + field + Literal(',').suppress() + value + Suppress(')'))
    ne_ = CaselessLiteral('ne') + Group(Suppress('(') + field + Literal(',').suppress() + value + Suppress(')'))
    function = (eq_ | ne_)

    arg = Forward()
    and_ = Forward()
    or_ = Forward()

    arg << (and_ | or_ | function) + Suppress(",") + (and_ | or_ | function) + ZeroOrMore(Suppress(",") + (and_ | function))
    and_ << Literal("and") + Suppress("(") + Group(arg) + Suppress(")")
    or_ << Literal("or") + Suppress("(") + Group(arg) + Suppress(")")

    exp = (and_ | or_ | function)

    print(exp.parseString("and(or(eq(x,1), eq(x,2)), eq(y,3))"))

I have an output in the form:

 ['and', ['or', ['eq', ['x', '1'], 'eq', ['x', '2']], 'eq', ['y', '3']]]

The list output looks fine, but for further processing I would like the output as a nested dictionary:

    {
        name: 'and',
        args: [
            {
                name: 'or',
                args: [
                    { name: 'eq', args: ['x','1'] },
                    { name: 'eq', args: ['x','2'] }
                ]
            },
            { name: 'eq', args: ['y','3'] }
        ]
    }

I tried the Dict class, but without success.

Can this be done in pyparsing? Or should I manually format the list output?

python s-expression pyparsing

Horned owl Aug 11 '14 at 8:21

2 answers

I don't think pyparsing has anything built in for this, but you can build the data structure recursively:

    def toDict(lst):
        if not isinstance(lst[1], list):
            return lst
        return [{'name': name, 'args': toDict(args)}
                for name, args in zip(lst[::2], lst[1::2])]

Your desired output is inconsistent in how it represents the children of args: with a single child you use a plain dict, otherwise a list of dicts. That makes the structure awkward to consume. It is better to always use a list of dicts, even when there is only one child. That way you always know how to iterate over the children without type checking.
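To illustrate why the uniform list-of-dicts shape is convenient, here is a minimal sketch of a traversal over the output of toDict. The helper names_in_order is hypothetical (it is not part of the original answer); it only filters on dict to skip the string values found in leaf argument lists such as ['x', '1'].

```python
def toDict(lst):
    # A leaf argument list such as ['x', '1'] is returned unchanged.
    if not isinstance(lst[1], list):
        return lst
    # Otherwise the list alternates name, args, name, args, ...
    return [{'name': name, 'args': toDict(args)}
            for name, args in zip(lst[::2], lst[1::2])]


def names_in_order(nodes):
    """Collect node names depth-first from a list of node dicts (hypothetical helper)."""
    names = []
    for node in nodes:
        names.append(node['name'])
        # Children are always a list; leaf values are plain strings and are skipped.
        children = [a for a in node['args'] if isinstance(a, dict)]
        names.extend(names_in_order(children))
    return names


parsed = ['and', ['or', ['eq', ['x', '1'], 'eq', ['x', '2']], 'eq', ['y', '3']]]
tree = toDict(parsed)
print(names_in_order(tree))  # ['and', 'or', 'eq', 'eq', 'eq']
```

Because every level is a list, the same loop handles the root, the inner nodes, and single-child cases alike.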

Example

We can use json.dumps to print the output (note that here we print parsedict[0] because we know the root has a single child, but toDict always returns a list, as explained above):

    import json

    parsed = ['and', ['or', ['eq', ['x', '1'], 'eq', ['x', '2']], 'eq', ['y', '3']]]
    parsedict = toDict(parsed)
    print(json.dumps(parsedict[0], indent=4, separators=(',', ': ')))

Output

    {
        "name": "and",
        "args": [
            {
                "name": "or",
                "args": [
                    {
                        "name": "eq",
                        "args": [
                            "x",
                            "1"
                        ]
                    },
                    {
                        "name": "eq",
                        "args": [
                            "x",
                            "2"
                        ]
                    }
                ]
            },
            {
                "name": "eq",
                "args": [
                    "y",
                    "3"
                ]
            }
        ]
    }

To get this output, I replaced the dict with collections.OrderedDict in the toDict function, just to keep name ahead of args.
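A sketch of that OrderedDict variant might look as follows. The function name to_ordered is an assumption made for this illustration; the logic is otherwise the same as toDict. On Python 3.7 and later a plain dict already preserves insertion order, so OrderedDict mainly matters on older interpreters.

```python
import json
from collections import OrderedDict


def to_ordered(lst):
    # Same recursion as toDict, but each node keeps 'name' before 'args'.
    if not isinstance(lst[1], list):
        return lst
    return [OrderedDict([('name', name), ('args', to_ordered(args))])
            for name, args in zip(lst[::2], lst[1::2])]


parsed = ['and', ['or', ['eq', ['x', '1'], 'eq', ['x', '2']], 'eq', ['y', '3']]]
root = to_ordered(parsed)[0]
text = json.dumps(root, indent=4)
print(text)
```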

enrico.bacis Aug 11 '14 at 8:42

Paulmcg · Accepted Answer · 2014-08-11T10:55:06+0000

The feature you are looking for is an important one in pyparsing: setting results names. Using results names is recommended for most pyparsing applications. This feature has been available since version 0.9, as:

 expr.setResultsName("abc")

This lets you access that particular field of the overall parsed results as res["abc"] or res.abc (where res is the value returned from parser.parseString). You can also call res.dump() to see a nested view of your results.
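As a minimal sketch of how named results are accessed, consider a toy grammar for inputs like "x = 42". The grammar and the names "field" and "number" are made up for this illustration and are not part of the parser in the question.

```python
from pyparsing import Suppress, Word, alphas, nums

# Attach a results name to each sub-expression of interest.
field = Word(alphas).setResultsName("field")
number = Word(nums).setResultsName("number")
assignment = field + Suppress("=") + number

res = assignment.parseString("x = 42")

print(res["field"])   # dict-style access
print(res.number)     # attribute-style access
print(res.dump())     # list of tokens followed by the named fields
```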

To keep parsers easy to follow at a glance, I also added support for this shorter form of setResultsName in 1.4.6:

 expr("abc")
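The two spellings should be interchangeable. A small check, using a made-up results name "qty" purely for illustration:

```python
from pyparsing import Word, nums

# Explicit method call versus the call-style shorthand.
long_form = Word(nums).setResultsName("qty")
short_form = Word(nums)("qty")

long_result = long_form.parseString("7").asDict()
short_result = short_form.parseString("7").asDict()

print(long_result)
print(short_result)
```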

Here is your parser with a little cleanup, and the result names are added:

    from pyparsing import (Word, alphanums, Literal, CaselessLiteral, Forward,
                           Suppress, Group, delimitedList)

    COMMA, LPAR, RPAR = map(Suppress, ",()")
    field = Word(alphanums)
    value = Word(alphanums)
    eq_ = CaselessLiteral('eq')("name") + Group(LPAR + field + COMMA + value + RPAR)("args")
    ne_ = CaselessLiteral('ne')("name") + Group(LPAR + field + COMMA + value + RPAR)("args")
    function = (eq_ | ne_)

    arg = Forward()
    and_ = Forward()
    or_ = Forward()
    exp = Group(and_ | or_ | function)

    arg << delimitedList(exp)
    and_ << Literal("and")("name") + LPAR + Group(arg)("args") + RPAR
    or_ << Literal("or")("name") + LPAR + Group(arg)("args") + RPAR

Unfortunately, dump() only handles nesting of results, not lists of values, so it is not quite as nice as json.dumps (maybe this would be a good enhancement to dump?). So here is a custom method to dump out your nested name/args results:

    ob = exp.parseString("and(or(eq(x,1), eq(x,2)), eq(y,3))")[0]

    INDENT_SPACES = '    '

    def dumpExpr(ob, level=0):
        indent = level * INDENT_SPACES
        print(indent + '{')
        print("%s%s: %r," % (indent + INDENT_SPACES, 'name', ob['name']))
        if ob.name in ('eq', 'ne'):
            # leaf function: args is a flat list of strings
            print("%s%s: %s" % (indent + INDENT_SPACES, 'args', ob.args.asList()))
        else:
            # and/or: args is a list of nested expressions
            print("%s%s: [" % (indent + INDENT_SPACES, 'args'))
            for arg in ob.args:
                dumpExpr(arg, level + 2)
            print("%s]" % (indent + INDENT_SPACES))
        print(indent + '}' + (',' if level > 0 else ''))

    dumpExpr(ob)

Giving:

    {
        name: 'and',
        args: [
            {
                name: 'or',
                args: [
                    {
                        name: 'eq',
                        args: ['x', '1']
                    },
                    {
                        name: 'eq',
                        args: ['x', '2']
                    },
                ]
            },
            {
                name: 'eq',
                args: ['y', '3']
            },
        ]
    }
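If a real nested dict is wanted rather than printed text, the same named results can be walked recursively. The following is a sketch under stated assumptions: to_plain_dict and LEAF_NAMES are hypothetical names introduced here, and the grammar is repeated (using `<<=` instead of `<<`) so the block runs on its own.

```python
import json
from pyparsing import (CaselessLiteral, Forward, Group, Literal, Suppress,
                       Word, alphanums, delimitedList)

COMMA, LPAR, RPAR = map(Suppress, ",()")
field = Word(alphanums)
value = Word(alphanums)
eq_ = CaselessLiteral("eq")("name") + Group(LPAR + field + COMMA + value + RPAR)("args")
ne_ = CaselessLiteral("ne")("name") + Group(LPAR + field + COMMA + value + RPAR)("args")
function = eq_ | ne_

arg = Forward()
and_ = Forward()
or_ = Forward()
exp = Group(and_ | or_ | function)
arg <<= delimitedList(exp)
and_ <<= Literal("and")("name") + LPAR + Group(arg)("args") + RPAR
or_ <<= Literal("or")("name") + LPAR + Group(arg)("args") + RPAR

LEAF_NAMES = ("eq", "ne")  # functions whose args are plain strings


def to_plain_dict(node):
    """Convert one parsed expression into {'name': ..., 'args': [...]} (hypothetical helper)."""
    name = node["name"]
    if name in LEAF_NAMES:
        args = node["args"].asList()
    else:
        # and/or: each child is itself a grouped expression
        args = [to_plain_dict(child) for child in node["args"]]
    return {"name": name, "args": args}


parsed = exp.parseString("and(or(eq(x,1), eq(x,2)), eq(y,3))")[0]
tree = to_plain_dict(parsed)
print(json.dumps(tree, indent=4))
```

The result contains only dicts, lists, and strings, so it can be passed straight to json.dumps or any other code that expects ordinary Python containers.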
