I use PyParsing to parse fairly large text files with a C-like format (brackets, semicolons and all that).
PyParsing works just fine, but it is slow and consumes a very large amount of memory due to the size of my files.
Because of this, I wanted to try an incremental approach in which I parse the top-level elements of the source file one at a time. The pyparsing scanString method seems like the obvious way to do this. However, I want to make sure that there is no invalid or unparsed text between the sections matched by scanString, and I cannot find a good way to do this.
Here is a simplified example that shows the problem I am facing:
    sample = """f1(1,2,3);
    f2_no_args( );
    # comment out: foo(4,5,6);
    bar(7,8);
    this should be an error;
    baz(9,10);
    """

    from pyparsing import *

    # '#' starts a comment that runs to the end of the line
    COMMENT = Suppress('#' + restOfLine())
    SEMI, COMMA, LPAREN, RPAREN = map(Suppress, ';,()')
    ident = Word(alphas, alphanums + "_")
    integer = Word(nums + "+-", nums)

    # a statement is a function call: name(arg, arg, ...);
    statement = (ident("fn") + LPAREN
                 + Group(Optional(delimitedList(integer)))("arguments")
                 + RPAREN + SEMI)
    p = statement.ignore(COMMENT)

    # scanString yields (tokens, start, end) for every match it finds
    for res, start, end in p.scanString(sample):
        print("***** (%d,%d)" % (start, end))
        print(res.dump())
Output:
    ***** (0,10)
    ['f1', ['1', '2', '3']]
    - arguments: ['1', '2', '3']
    - fn: f1
    ***** (11,25)
    ['f2_no_args', []]
    - arguments: []
    - fn: f2_no_args
    ***** (53,62)
    ['bar', ['7', '8']]
    - arguments: ['7', '8']
    - fn: bar
    ***** (88,98)
    ['baz', ['9', '10']]
    - arguments: ['9', '10']
    - fn: baz
The ranges returned by scanString have gaps because of the unparsed text between them ((0,10), (11,25), (53,62), (88,98)). Two of these gaps contain only whitespace or comments, which should not cause an error, but one of them (this should be an error;) contains text that the grammar cannot parse, and that is what I want to catch.
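To make the requirement concrete, here is a minimal sketch of the check described above, written with the standard library only. It takes the (start, end) pairs from the output as given instead of re-running pyparsing, and the names find_bad_gaps and HARMLESS are hypothetical helpers, not part of pyparsing. The regular expression is an assumption that approximates the whitespace skipping and the COMMENT definition from the example.

```python
import re

sample = """f1(1,2,3);
f2_no_args( );
# comment out: foo(4,5,6);
bar(7,8);
this should be an error;
baz(9,10);
"""

# (start, end) pairs copied from the scanString output shown above.
spans = [(0, 10), (11, 25), (53, 62), (88, 98)]

# Assumption: a gap is harmless if it holds only whitespace and
# '#' comments that run to the end of the line.
HARMLESS = re.compile(r"(?:\s|#[^\n]*)*")


def find_bad_gaps(text, spans):
    """Return (gap_start, gap_end, content) for every gap between
    matched spans that contains something other than whitespace
    or comments. Hypothetical helper, for illustration only."""
    bad = []
    pos = 0
    # A zero-width sentinel span at the end covers trailing text too.
    for start, end in spans + [(len(text), len(text))]:
        gap = text[pos:start]
        if not HARMLESS.fullmatch(gap):
            bad.append((pos, start, gap.strip()))
        pos = end
    return bad


bad_gaps = find_bad_gaps(sample, spans)
for gap_start, gap_end, content in bad_gaps:
    print("unparsed text at (%d,%d): %r" % (gap_start, gap_end, content))
```

For the sample, this reports only the gap at (62,88). The drawback is that the comment and whitespace rules are duplicated by hand outside the grammar, which is exactly what I would like to avoid.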
Is there a way to use pyparsing to parse a file incrementally while still ensuring that the entire input can be parsed with the specified grammar?