Parsing TCL-like text

I have configuration text that looks like this:

```python
text = """
key1 value1
key2 { value1 value2 }
key3 subkey1 {
    key1 1
    key2 2
    key3 { value1 }
}
BLOBKEY name { dont {
 # comment
 parse { me }
 }
}
key3 subkey2 {
    key1 value1
}
"""
```

Values are simple strings or quoted strings. Keys are just alphanumeric strings. I know in advance that key2 and key3.subkey1.key3 will contain sets, so I can interpret these paths differently. Similarly, I know that BLOBKEY will contain an "escaped" section, which should be kept as raw text instead of being parsed.
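One way to keep the "escaped" BLOBKEY section verbatim is to not parse it at all and instead match its braces by counting nesting depth from the opening `{`. This is a minimal plain-Python sketch, not pyparsing. `read_raw_block` is a hypothetical helper, and it assumes every brace inside the blob belongs to a balanced pair, as in the sample:

```python
def read_raw_block(text, open_pos):
    """Return (inner_text, next_pos) for the block whose '{' is at text[open_pos].

    Braces are matched by counting nesting depth, so everything between the
    outer braces (comments, newlines, nested braces) comes back verbatim.
    Assumes braces inside the block are balanced (hypothetical helper).
    """
    if text[open_pos] != "{":
        raise ValueError("expected '{' at position %d" % open_pos)
    depth = 0
    for pos in range(open_pos, len(text)):
        if text[pos] == "{":
            depth += 1
        elif text[pos] == "}":
            depth -= 1
            if depth == 0:
                # The slice excludes the outer braces themselves.
                return text[open_pos + 1:pos], pos + 1
    raise ValueError("unbalanced braces starting at position %d" % open_pos)


sample = "BLOBKEY name { dont {\n # comment\n parse { me }\n }\n} key3 subkey2 { key1 value1 }"
blob, end = read_raw_block(sample, sample.index("{"))
print(repr(blob))          # ' dont {\n # comment\n parse { me }\n }\n'
print(repr(sample[end:]))  # ' key3 subkey2 { key1 value1 }'
```

The slice keeps the blob's leading space and newlines, which matches the 'name' value in the target dictionary, and `next_pos` tells the caller where normal parsing resumes.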

The goal is to convert it to a dictionary that looks like this:

```python
{
    'key1': 'value1',
    'key2': set(['value1', 'value2']),
    'key3': {
        'subkey1': {
            'key1': 1,
            'key2': 2,
            'key3': set(['value1']),
        },
        'subkey2': {
            'key1': 'value1',
        },
    },
    'BLOBKEY': {
        'name': " dont {\n # comment\n parse { me }\n }\n",
    },
}
```
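The target dictionary differs from a plain parse in two ways: the known set paths hold sets, and 1 and 2 are integers. If a parser first produces only strings and lists, both conversions can be applied in a separate pass over the nested dict. This is a sketch under that assumption. `SET_PATHS` and `coerce_values` are hypothetical names, and turning all-digit strings into ints is inferred from the target dictionary rather than stated in the question:

```python
# Hypothetical: the paths the question says hold sets, as tuples of keys.
SET_PATHS = {("key2",), ("key3", "subkey1", "key3")}


def coerce_values(tree, set_paths=SET_PATHS, prefix=()):
    """Return a copy of tree where lists at set_paths become sets
    and all-digit strings become ints (assumed from the target dict)."""
    result = {}
    for key, value in tree.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            result[key] = coerce_values(value, set_paths, path)
        elif isinstance(value, list) and path in set_paths:
            result[key] = set(value)
        elif isinstance(value, str) and value.isdecimal():
            result[key] = int(value)
        else:
            result[key] = value
    return result


raw = {
    "key1": "value1",
    "key2": ["value1", "value2"],
    "key3": {
        "subkey1": {"key1": "1", "key2": "2", "key3": ["value1"]},
        "subkey2": {"key1": "value1"},
    },
    "BLOBKEY": {"name": " dont {\n # comment\n parse { me }\n }\n"},
}
print(coerce_values(raw)["key3"]["subkey1"])  # {'key1': 1, 'key2': 2, 'key3': {'value1'}}
```

Keeping the special paths in data rather than in the grammar means the grammar itself can stay generic.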

The code below does a pretty good job of breaking it into nested lists.

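```python
import pyparsing

# A bare word: any run of characters other than braces and whitespace.
string = pyparsing.CharsNotIn("{} \t\r\n")
# A group is either a brace-delimited sequence of groups (returned as a nested list) or a word.
group = pyparsing.Forward()
group << (
    pyparsing.Group(
        pyparsing.Literal("{").suppress()
        + pyparsing.ZeroOrMore(group)
        + pyparsing.Literal("}").suppress()
    )
    | string
)
toplevel = pyparsing.OneOrMore(group)
```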
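To see the shape this grammar produces without installing pyparsing, the same nesting can be sketched with a regular expression and a stack. The word pattern mirrors `CharsNotIn("{} \t\r\n")`, but unlike `parseString` without `parseAll=True`, which stops quietly at the first brace it cannot match, this sketch raises on unbalanced braces. `nest` and `TOKEN` are hypothetical names:

```python
import re

# Same token classes as the pyparsing grammar: a brace, or a run of characters
# that are neither braces nor whitespace (like CharsNotIn("{} \t\r\n")).
TOKEN = re.compile(r"[{}]|[^{} \t\r\n]+")


def nest(text):
    """Split text into words and turn every {...} into a nested list."""
    stack = [[]]  # stack[-1] is the list currently being filled
    for token in TOKEN.findall(text):
        if token == "{":
            stack.append([])
        elif token == "}":
            if len(stack) == 1:
                raise ValueError("unexpected '}'")
            inner = stack.pop()
            stack[-1].append(inner)
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("missing '}'")
    return stack[0]


print(nest("key1 value1 key2 { value1 value2 } key3 subkey1 { key3 { value1 } }"))
# ['key1', 'value1', 'key2', ['value1', 'value2'], 'key3', 'subkey1', ['key3', ['value1']]]
```

The output shows why nested lists alone are hard to turn into the target dict: `key1 value1 key2 [...]` and `key3 subkey1 [...]` are both runs of words followed by a list, so without the line breaks a later pass cannot reliably tell a value from a subkey.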

What is the best way to get the result I want in Python using pyparsing?
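One alternative to pyparsing (a swapped-in technique, not what the question asks for) is a small tokenizer plus a recursive-descent parser that treats newlines as statement separators, which seems consistent with the `\n` characters in the expected blob. This is a sketch under these assumptions: a statement is either `key [subkey ...] value` on one line or `key [subkey ...] {` opening a block on the same line; `SET_PATHS`, `BLOB_KEYS`, `TOKEN` and `parse_config` are hypothetical names; all-digit values become ints, as in the target dictionary:

```python
import re

# Hypothetical settings: the paths that hold sets and the keys whose blocks stay raw.
SET_PATHS = {("key2",), ("key3", "subkey1", "key3")}
BLOB_KEYS = {"BLOBKEY"}

# A token is a double-quoted string, a brace, a newline, or a bare word.
TOKEN = re.compile(r'"(?:[^"\\\n]|\\.)*"|[{}]|\n|[^{}\s"]+')


def parse_config(text, set_paths=SET_PATHS, blob_keys=BLOB_KEYS):
    """Parse line-oriented, brace-nested config text into nested dicts."""
    tokens = [(m.group(), m.start()) for m in TOKEN.finditer(text)]
    pos = 0  # index of the next unread token

    def value_of(tok):
        if tok.startswith('"'):
            return tok[1:-1]  # drop the quotes (escapes are left as written)
        return int(tok) if tok.isdecimal() else tok

    def read_raw_block():
        # Count brace depth from the '{' at tokens[pos]; return the inner text verbatim.
        nonlocal pos
        depth = 0
        for i in range(pos, len(tokens)):
            if tokens[i][0] == "{":
                depth += 1
            elif tokens[i][0] == "}":
                depth -= 1
                if depth == 0:
                    inner = text[tokens[pos][1] + 1:tokens[i][1]]
                    pos = i + 1
                    return inner
        raise ValueError("unbalanced braces in raw block")

    def read_set():
        # Collect the words between '{' and '}' (sets are assumed not to nest).
        nonlocal pos
        pos += 1  # skip '{'
        items = set()
        while pos < len(tokens) and tokens[pos][0] != "}":
            if tokens[pos][0] != "\n":
                items.add(value_of(tokens[pos][0]))
            pos += 1
        if pos == len(tokens):
            raise ValueError("missing '}' after set")
        pos += 1  # skip '}'
        return items

    def parse_block(prefix):
        # Parse statements until the closing '}' (or the end of text at top level).
        nonlocal pos
        result = {}
        while pos < len(tokens):
            tok = tokens[pos][0]
            if tok == "\n":
                pos += 1
                continue
            if tok == "}":
                if not prefix:
                    raise ValueError("unexpected '}'")
                pos += 1
                return result
            words = []
            while pos < len(tokens) and tokens[pos][0] not in ("\n", "{", "}"):
                words.append(tokens[pos][0])
                pos += 1
            if pos < len(tokens) and tokens[pos][0] == "{" and words:
                # "key [subkey ...] { ... }": the words form the path to the block.
                keys, path = words, prefix + tuple(words)
                if not prefix and words[0] in blob_keys:
                    value = read_raw_block()
                elif path in set_paths:
                    value = read_set()
                else:
                    pos += 1  # skip '{'
                    value = parse_block(path)
            elif len(words) >= 2:
                # "key [subkey ...] value" on one line.
                keys, value = words[:-1], value_of(words[-1])
            else:
                raise ValueError("cannot parse statement near token %d" % pos)
            node = result
            for key in keys[:-1]:
                node = node.setdefault(key, {})
            if isinstance(value, dict) and isinstance(node.get(keys[-1]), dict):
                node[keys[-1]].update(value)  # merge repeated blocks such as key3 ...
            else:
                node[keys[-1]] = value
        if prefix:
            raise ValueError("missing '}'")
        return result

    return parse_block(())
```

Because statements end at newlines, this sketch rejects or misreads the configuration if it is collapsed onto a single line. The special paths live in plain data rather than in the grammar, so adding another set path or blob key does not require touching the parser.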

1 answer

Here is my progress so far. It does not parse the raw blobs yet, but everything else seems to work correctly.

```python
from pprint import pprint

from pyparsing import (Combine, Forward, Group, Literal, OneOrMore, Optional,
                       Word, ZeroOrMore, alphanums, dblQuotedString, lineEnd,
                       removeQuotes)

LBRA = Literal("{").suppress()
RBRA = Literal("}").suppress()
EOL = lineEnd.suppress()
# Raw string so "\]" keeps its backslash without an invalid-escape warning.
tmshString = Word(alphanums + r'!#$%&()*+,-./:;<=>?@[\]^_`|~')
tmshValue = Combine(tmshString | dblQuotedString.copy().setParseAction(removeQuotes))
tmshKey = tmshString

def toSet(s, loc, t):
    return set(t[0])

# Note: setWhitespaceChars(' ') changes tmshValue itself (it returns self, not a copy).
tmshSet = LBRA + Group(ZeroOrMore(tmshValue.setWhitespaceChars(' '))).setParseAction(toSet) + RBRA

def toDict(d, l):
    # Merge one parsed group [key, child, ...] into d; nested groups recurse.
    if l[0] not in d:
        d[l[0]] = {}
    for v in l[1:]:
        if isinstance(v, list):
            toDict(d[l[0]], v)
        else:
            d[l[0]] = v

def trueDefault(s, loc, t):
    # A key without a value becomes True.
    return len(t) and t or True

singleKeyValue = Forward()
singleKeyValue << (
    Group(
        tmshKey + (
            # A toggle value (i.e. a key without a value).
            EOL.setParseAction(trueDefault) |
            # A set of values on a single line.
            tmshSet |
            # A normal value or another singleKeyValue group.
            Optional(tmshValue | LBRA + ZeroOrMore(singleKeyValue) + RBRA).setParseAction(trueDefault)
        )
    )
)

multiKeysOneValue = Forward()
multiKeysOneValue << (
    Group(
        tmshKey + (
            multiKeysOneValue |
            tmshSet |
            LBRA + ZeroOrMore(singleKeyValue) + RBRA
        )
    )
)

toplevel = OneOrMore(multiKeysOneValue)

# Now parse the data and print the results (testData holds the configuration text).
data = toplevel.parseString(testData)

h = {}
# map() is lazy in Python 3, so call toDict in an explicit loop.
for group in data.asList():
    toDict(h, group)
pprint(h)
```
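Since `toDict` only walks plain lists, its merging can be checked without pyparsing by feeding it hand-written groups. This is a sketch assuming `data.asList()` returns groups shaped like `[key, [subkey, [key, value], ...]]` (the exact shape depends on the grammar); the two `key3` groups end up under a single `key3` dict, which is what the question's target dictionary needs:

```python
from pprint import pprint


def toDict(d, l):
    """Merge one parsed group [key, child, ...] into d (toDict from the answer)."""
    if l[0] not in d:
        d[l[0]] = {}
    for v in l[1:]:
        if isinstance(v, list):
            toDict(d[l[0]], v)  # nested group: recurse into this key's dict
        else:
            d[l[0]] = v         # leaf value (string, set or True) replaces the dict


# Hand-written groups in the assumed data.asList() shape, for illustration only.
groups = [
    ["key3", ["subkey1", ["key1", "1"], ["key3", {"value1"}]]],
    ["key3", ["subkey2", ["key1", "value1"]]],
]
h = {}
for group in groups:
    toDict(h, group)
pprint(h)
```

Values stay strings here ('1' rather than 1), so an integer conversion would still be needed to match the target dictionary exactly.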

Source: https://habr.com/ru/post/1346800/

