Parsing an Existing Configuration File

I have a configuration file that is in the following form:

    protocol sample_thread {
        { AUTOSTART 0 }
        { BITMAP thread.gif }
        { COORDS {0 0} }
        { DATAFORMAT {
            { TYPE hl7 }
            { PREPROCS {
                { ARGS {{}} }
                { PROCS sample_proc }
            } }
        } }
    }

A real file may not have these exact fields, and I would prefer not to have to describe the data structure to the parser before it parses.

I searched for other configuration file parsers, but none of the ones I found could accept a file with this syntax.

I'm looking for a module that can parse such a file. Any suggestions?

If anyone is interested, the file in question was created by Quovadx Cloverleaf.

+4
8 answers

pyparsing is pretty handy for quick and easy parsing. A bare minimum would be something like:

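    import pyparsing

    # Note: CharsNotIn does not skip leading whitespace by itself;
    # the next answer swaps in Word for that reason.
    string = pyparsing.CharsNotIn("{} \t\r\n")
    group = pyparsing.Forward()
    # The parentheses matter: "<<" binds tighter than "|", so without
    # them the "| string" alternative would not become part of group.
    group << (pyparsing.Group(pyparsing.Literal("{").suppress() +
                              pyparsing.ZeroOrMore(group) +
                              pyparsing.Literal("}").suppress()) | string)

    toplevel = pyparsing.OneOrMore(group)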

Use it as:

    >>> toplevel.parseString(text)
    ['protocol', 'sample_thread', [['AUTOSTART', '0'], ['BITMAP', 'thread.gif'], ['COORDS', ['0', '0']], ['DATAFORMAT', [['TYPE', 'hl7'], ['PREPROCS', [['ARGS', [[]]], ['PROCS', 'sample_proc']]]]]]]

From there you can make it as complex as you want (parse numbers separately from strings, look for specific field names, etc.). The above is pretty general: it just looks for strings (defined as any run of non-whitespace characters other than "{" and "}") and {}-delimited lists of strings.

+11

Taking Brian's pyparsing solution a step further, you can create a quasi-deserializer for this format using the Dict class:

    import pyparsing

    # use Word instead of CharsNotIn("{} \t\r\n"), to do whitespace skipping
    stringchars = pyparsing.printables.replace("{", "").replace("}", "")
    string = pyparsing.Word(stringchars)

    # define a simple integer, plus auto-converting parse action
    integer = pyparsing.Word("0123456789").setParseAction(lambda t: int(t[0]))

    group = pyparsing.Forward()
    group << (pyparsing.Group(pyparsing.Literal("{").suppress() +
                              pyparsing.ZeroOrMore(group) +
                              pyparsing.Literal("}").suppress()) |
              integer | string)

    toplevel = pyparsing.OneOrMore(group)

    sample = """
    protocol sample_thread {
        { AUTOSTART 0 }
        { BITMAP thread.gif }
        { COORDS {0 0} }
        { DATAFORMAT {
            { TYPE hl7 }
            { PREPROCS {
                { ARGS {{}} }
                { PROCS sample_proc }
            } }
        } }
    }
    """

    print(toplevel.parseString(sample).asList())

    # Now define something a little more meaningful for a protocol structure,
    # and use Dict to auto-assign results names
    LBRACE, RBRACE = map(pyparsing.Suppress, "{}")
    protocol = (pyparsing.Keyword("protocol") +
                string("name") +
                LBRACE +
                pyparsing.Dict(pyparsing.OneOrMore(
                    pyparsing.Group(LBRACE + string + group + RBRACE)
                ))("parameters") +
                RBRACE)

    results = protocol.parseString(sample)
    print(results.name)
    print(results.parameters.BITMAP)
    print(list(results.parameters.keys()))
    print(results.dump())

Prints (key order and the exact layout of dump() may differ between Python and pyparsing versions):

    ['protocol', 'sample_thread', [['AUTOSTART', 0], ['BITMAP', 'thread.gif'], ['COORDS', [0, 0]], ['DATAFORMAT', [['TYPE', 'hl7'], ['PREPROCS', [['ARGS', [[]]], ['PROCS', 'sample_proc']]]]]]]
    sample_thread
    thread.gif
    ['DATAFORMAT', 'COORDS', 'AUTOSTART', 'BITMAP']
    ['protocol', 'sample_thread', [['AUTOSTART', 0], ['BITMAP', 'thread.gif'], ['COORDS', [0, 0]], ['DATAFORMAT', [['TYPE', 'hl7'], ['PREPROCS', [['ARGS', [[]]], ['PROCS', 'sample_proc']]]]]]]
    - name: sample_thread
    - parameters: [['AUTOSTART', 0], ['BITMAP', 'thread.gif'], ['COORDS', [0, 0]], ['DATAFORMAT', [['TYPE', 'hl7'], ['PREPROCS', [['ARGS', [[]]], ['PROCS', 'sample_proc']]]]]]
      - AUTOSTART: 0
      - BITMAP: thread.gif
      - COORDS: [0, 0]
      - DATAFORMAT: [['TYPE', 'hl7'], ['PREPROCS', [['ARGS', [[]]], ['PROCS', 'sample_proc']]]]

I think you will get further, faster, with pyparsing.

- Paul

+2

I will try to answer what, in my opinion, is the missing question(s)...

Configuration files come in different formats. Well-known formats exist, such as *.ini or Apache config, and these usually have many parsers available.

Then there are custom formats. That is what yours appears to be (it may be some clearly defined format that you and I have never seen before), but until you know what it is, that does not really matter.

I would start with the software the file came from and see whether it has a programming API that can load or produce these files. If nothing is obvious, give Quovadx a call. Most likely, someone has already solved this problem.

Otherwise, you will probably have to write your own parser.

Writing a parser for this format would not be terribly difficult, assuming your sample is representative of a complete file. It is a hierarchy of values, where each node can contain either a value or a child hierarchy of values. Once you have identified the basic types the values can have, this is a very simple structure.

You can write it reasonably quickly using something like Lex/Flex, or just as a straightforward hand-written parser in the language of your choice.
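As a rough sketch of the hand-written route, here is a small recursive parser in plain Python. It assumes the format consists only of whitespace-separated bare words and brace-delimited groups, as in the sample; quoting or escape rules, if the real format has any, are not handled. The names `tokenize` and `parse` are illustrative only.

```python
import re

# Assumption: a token is "{", "}", or a run of characters that are
# neither whitespace nor braces.
TOKEN_RE = re.compile(r"[{}]|[^\s{}]+")


def tokenize(text):
    """Split the text into braces and bare words."""
    return TOKEN_RE.findall(text)


def parse(text):
    """Return the content as nested lists; each {...} group becomes a list."""
    tokens = tokenize(text)
    pos = 0

    def parse_items(depth):
        nonlocal pos
        items = []
        while pos < len(tokens):
            tok = tokens[pos]
            pos += 1
            if tok == "{":
                items.append(parse_items(depth + 1))
            elif tok == "}":
                if depth == 0:
                    raise ValueError("unexpected '}'")
                return items
            else:
                items.append(tok)
        if depth > 0:
            raise ValueError("missing '}'")
        return items

    return parse_items(0)


sample = "protocol sample_thread { { AUTOSTART 0 } { COORDS {0 0} } { ARGS {{}} } }"
result = parse(sample)
print(result)
```

All values come back as strings; converting numbers or looking up particular field names can be layered on top once the nesting has been recovered.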

+1

You can easily write a Python script that converts it to a Python dict. The format looks almost like hierarchical name/value pairs; the only problem seems to be COORDS {0 0}, where {0 0} is not a name/value pair, and who knows what other such cases the format contains. I think your best bet is to get a specification for the format and write a simple Python script to read it.
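To illustrate the COORDS problem, here is a hedged sketch that turns already-parsed nested lists into a dict. It relies on a guess, not on any specification: a list is treated as a set of name/value pairs only if every item is a two-element list whose first element is a string. The helper names are hypothetical, and the heuristic can misfire (a value such as { {0 0} } would be read as a pair), which is exactly why a specification would help.

```python
def is_entry_list(value):
    """True if value looks like a non-empty list of [name, value] pairs."""
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(
            isinstance(item, list) and len(item) == 2 and isinstance(item[0], str)
            for item in value
        )
    )


def convert(value):
    """Turn lists of name/value pairs into dicts; keep everything else as-is."""
    if is_entry_list(value):
        return {name: convert(child) for name, child in value}
    # Plain strings and lists such as COORDS ['0', '0'] are left untouched.
    return value


# Nested lists as a parser for this format might produce them (assumed input).
parsed = [
    ["AUTOSTART", "0"],
    ["BITMAP", "thread.gif"],
    ["COORDS", ["0", "0"]],
    ["DATAFORMAT", [
        ["TYPE", "hl7"],
        ["PREPROCS", [["ARGS", [[]]], ["PROCS", "sample_proc"]]],
    ]],
]

config = convert(parsed)
print(config["DATAFORMAT"]["PREPROCS"]["PROCS"])
```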

+1

Your configuration file is very similar to JSON (pretty much replace all your "{" and "}" with "[" and "]"). Most languages have a built-in JSON parser (PHP, Ruby, Python, etc.), and if not, there are libraries available for processing it.

If you cannot change the format of the configuration file, you can read the entire contents of the file as a string and replace all the "{" and "}" characters using whatever method is convenient for you. Then you can parse the string as JSON and you are set.
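One caveat when trying this: swapping the braces alone does not produce valid JSON, because bare words must be quoted and items must be separated by commas. The sketch below handles both, under the assumption that tokens are whitespace-separated and contain no braces; `braces_to_json` is a made-up helper name.

```python
import json
import re


def braces_to_json(text):
    """Rewrite brace-delimited text as the text of a JSON array."""
    tokens = re.findall(r"[{}]|[^\s{}]+", text)
    parts = []
    for tok in tokens:
        if tok == "{":
            parts.append("[")
        elif tok == "}":
            parts.append("]")
        else:
            parts.append(json.dumps(tok))  # quote and escape bare words
    out = []
    for i, part in enumerate(parts):
        # Insert a comma between items, but not right after an opening
        # bracket or right before a closing one.
        if i > 0 and parts[i - 1] != "[" and part != "]":
            out.append(",")
        out.append(part)
    # Wrap everything in one outer array so the top level is a single value.
    return "[" + "".join(out) + "]"


sample = "protocol sample_thread { { AUTOSTART 0 } { COORDS {0 0} } { ARGS {{}} } }"
data = json.loads(braces_to_json(sample))
print(data)
```

Every value is loaded as a string here; numbers stay quoted unless you decide to treat numeric-looking tokens specially.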

+1

I searched the Cheese Shop a bit, but I did not find anything useful for your example. Check the Examples page and this particular parser (its syntax is a bit like yours). I think this should help you write your own.

0

Take a look at LEX and YACC. There is a bit of a learning curve, but they can generate parsers for any language.

0

Perhaps you could write a simple script that converts your config to an XML file and then just read it with lxml, Beautiful Soup or something else? Your converter could use PyParsing or regular expressions, for example.
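A minimal sketch of the conversion step, using the standard library's xml.etree.ElementTree in place of lxml so that it runs without extra packages. The input is assumed to be nested lists already produced by some parser, and the element names "config", "group" and "value" are arbitrary choices, not part of any Cloverleaf convention.

```python
import xml.etree.ElementTree as ET


def to_xml(items, tag="config"):
    """Build an element whose children mirror the nested list structure."""
    element = ET.Element(tag)
    for item in items:
        if isinstance(item, list):
            # Each brace-delimited group becomes a nested <group> element.
            element.append(to_xml(item, tag="group"))
        else:
            child = ET.SubElement(element, "value")
            child.text = item
    return element


parsed = ["protocol", "sample_thread", [["AUTOSTART", "0"], ["COORDS", ["0", "0"]]]]
root = to_xml(parsed)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Field names are stored as text rather than used as tag names, since nothing guarantees that every field name is a valid XML element name.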

-2

Source: https://habr.com/ru/post/1286229/

