I can't fix a pyparsing error...

Question


Overview

So, I'm in the middle of refactoring a project, and I'm separating out a bunch of parsing code. The code I'm concerned with is pyparsing.

I have a very poor understanding of pyparsing, even after spending a lot of time reading the official documentation. I'm having trouble because (1) pyparsing takes an (intentionally) unorthodox approach to parsing, and (2) I'm working on code that I didn't write, with poor comments and an elementary set of existing grammars.

(I also can't get in touch with the original author.)

Failed test

I'm using PyVows to test my code. One of my tests is as follows (I think this is clear even if you're not familiar with PyVows; let me know if it isn't):

    def test_multiline_command_ends(self, topic):
        output = parsed_input('multiline command ends\n\n', topic)
        expect(output).to_equal(
    r'''['multiline', 'command ends', '\n', '\n']
    - args: command ends
    - multiline_command: multiline
    - statement: ['multiline', 'command ends', '\n', '\n']
     - args: command ends
     - multiline_command: multiline
     - terminator: ['\n', '\n']
    - terminator: ['\n', '\n']''')

But when I run the test, I get the following in the terminal:

Failed Test Results

 Expected topic("['multiline', 'command ends']\n- args: command ends\n- command: multiline\n- statement: ['multiline', 'command ends']\n - args: command ends\n - command: multiline") to equal "['multiline', 'command ends', '\\n', '\\n']\n- args: command ends\n- multiline_command: multiline\n- statement: ['multiline', 'command ends', '\\n', '\\n']\n - args: command ends\n - multiline_command: multiline\n - terminator: ['\\n', '\\n']\n- terminator: ['\\n', '\\n']"

Note:

Since the output goes to the terminal, the expected output (the second string) shows an extra backslash before each `n`. This is normal. The test passed without problems before this round of refactoring began.
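To illustrate the extra backslashes: the sketch below assumes the failure message displays strings with something like Python's `repr()`, which I have not verified for PyVows. The variable names are made up for the example.

```python
# The first line of the expected value, written as a raw string as in the
# test: each \n here is two characters (a backslash and an n), not a newline.
expected_line = r"['multiline', 'command ends', '\n', '\n']"

# str() shows the text unchanged; repr() escapes every backslash, which
# would explain why the report shows '\\n' where the test source says '\n'.
shown_plain = str(expected_line)
shown_repr = repr(expected_line)

print(shown_plain)
print(shown_repr)
```

The number of backslashes doubles between the two printed lines, but the underlying string is the same.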

Expected Behavior

The first output should match the second, but it doesn't. In particular, it doesn't include the two newlines in that first list object.

So, I get this:

 "['multiline', 'command ends']\n- args: command ends\n- command: multiline\n- statement: ['multiline', 'command ends']\n - args: command ends\n - command: multiline"

When I should be getting this:

 "['multiline', 'command ends', '\\n', '\\n']\n- args: command ends\n- multiline_command: multiline\n- statement: ['multiline', 'command ends', '\\n', '\\n']\n - args: command ends\n - multiline_command: multiline\n - terminator: ['\\n', '\\n']\n- terminator: ['\\n', '\\n']"
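One way to narrow a mismatch like this down is to diff the two dumps line by line. This is only a sketch using the standard library's `difflib`; the strings are copied from the failure message above, and the names `actual_lines` and `expected_lines` are mine.

```python
import difflib

# What the parser actually produced, one dump line per list item.
actual_lines = [
    "['multiline', 'command ends']",
    "- args: command ends",
    "- command: multiline",
    "- statement: ['multiline', 'command ends']",
    " - args: command ends",
    " - command: multiline",
]

# What the test expects (raw strings, so \n stays as two characters).
expected_lines = [
    r"['multiline', 'command ends', '\n', '\n']",
    r"- args: command ends",
    r"- multiline_command: multiline",
    r"- statement: ['multiline', 'command ends', '\n', '\n']",
    r" - args: command ends",
    r" - multiline_command: multiline",
    r" - terminator: ['\n', '\n']",
    r"- terminator: ['\n', '\n']",
]

diff = list(difflib.unified_diff(
    actual_lines, expected_lines, "actual", "expected", lineterm=""))
print("\n".join(diff))
```

Besides the missing newlines and terminator, the diff shows the result name changing from `command` to `multiline_command`, which suggests the input was not recognized as a multiline command at all.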

Earlier in the code there is also this statement:

 pyparsing.ParserElement.setDefaultWhitespaceChars(' \t')

...which, I think, should prevent exactly this kind of error. But I'm not sure.

Even if the problem can't be identified with certainty, simply narrowing down where the problem is would be a HUGE help.

Please let me know how I might take a step or two towards fixing this.

Edit: So, I should post the parser code for this, shouldn't I? (Thanks for the tip, @andrew cooke!)

Parser Code

Here's the __init__ for my parser object.

I know it's a nightmare. That's why I'm refactoring the project. ☺

    def __init__(self, Cmd_object=None, *args, **kwargs):
        #   @NOTE
        #   This is one of the biggest pain points of the existing code.
        #   To aid in readability, I CAPITALIZED all variables that are
        #   not set on `self`.
        #
        #   That means that CAPITALIZED variables aren't
        #   used outside of this method.
        #
        #   Doing this has allowed me to more easily read what
        #   variables become a part of other variables during the
        #   building-up of the various parsers.
        #
        #   I realize the capitalized variables is unorthodox
        #   and potentially anti-convention.  But after reaching out
        #   to the project creator several times over roughly 5
        #   months, I'm still working on this project alone...
        #   And without help, this is the only way I can move forward.
        #
        #   I have a very poor understanding of the parser's
        #   control flow when the user types a command and hits ENTER,
        #   and until the author (or another pyparsing expert)
        #   explains what happening to me, I have to do silly
        #   things like this. :-|
        #
        #   Of course, if the impossible happens and this code
        #   gets cleaned up, then the variables will be restored to
        #   proper capitalization.
        #
        #   -- Zearin
        #   http://github.com/zearin/
        #   2012 Mar 26

        if Cmd_object is not None:
            self.Cmd_object = Cmd_object
        else:
            raise Exception('Cmd_object be provided to Parser.__init__().')

        #   @FIXME
        #       Refactor methods into this class later
        preparse = self.Cmd_object.preparse
        postparse = self.Cmd_object.postparse

        self._allow_blank_lines = False

        self.abbrev = True              # Recognize abbreviated commands
        self.case_insensitive = True    # Commands recognized regardless of case
        # make sure your terminators are not in legal_chars!
        self.legal_chars = u'!#$%.: ?@ _' + PYP.alphanums + PYP.alphas8bit
        self.multiln_commands = [] if 'multiline_commands' not in kwargs else kwargs['multiln_commands']
        self.no_special_parse = {'ed','edit','exit','set'}
        self.redirector = '>'           # for sending output to file
        self.reserved_words = []
        self.shortcuts = { '?' : 'help' ,
                           '!' : 'shell',
                           '@' : 'load' ,
                           '@@': '_relative_load' }

        #   self._init_grammars()
        #
        # def _init_grammars(self):
        #   @FIXME
        #       Add Docstring

        #   ----------------------------
        #   Tell PYP how to parse
        #   file input from '< filename'
        #   ----------------------------
        FILENAME = PYP.Word(self.legal_chars + '/\\')
        INPUT_MARK = PYP.Literal('<')
        INPUT_MARK.setParseAction(lambda x: '')
        INPUT_FROM = FILENAME('INPUT_FROM')
        INPUT_FROM.setParseAction( self.Cmd_object.replace_with_file_contents )
        #   ----------------------------

        #OUTPUT_PARSER = (PYP.Literal('>>') | (PYP.WordStart() + '>') | PYP.Regex('[^=]>'))('output')
        OUTPUT_PARSER = (PYP.Literal( 2 * self.redirector) | \
                         (PYP.WordStart() + self.redirector) | \
                         PYP.Regex('[^=]' + self.redirector))('output')

        PIPE = PYP.Keyword('|', identChars='|')

        STRING_END = PYP.stringEnd ^ '\nEOF'

        TERMINATORS = [';']
        TERMINATOR_PARSER = PYP.Or([
            (hasattr(t, 'parseString') and t)
            or
            PYP.Literal(t) for t in TERMINATORS
        ])('terminator')

        self.comment_grammars = PYP.Or([ PYP.pythonStyleComment, PYP.cStyleComment ])
        self.comment_grammars.ignore(PYP.quotedString)
        self.comment_grammars.setParseAction(lambda x: '')
        self.comment_grammars.addParseAction(lambda x: '')

        self.comment_in_progress = '/*' + PYP.SkipTo(PYP.stringEnd ^ '*/')

        #   QuickRef: Pyparsing Operators
        #   ----------------------------
        #   ~   creates NotAny using the expression after the operator
        #
        #   +   creates And using the expressions before and after the operator
        #
        #   |   creates MatchFirst (first left-to-right match) using the
        #       expressions before and after the operator
        #
        #   ^   creates Or (longest match) using the expressions before and
        #       after the operator
        #
        #   &   creates Each using the expressions before and after the operator
        #
        #   *   creates And by multiplying the expression by the integer operand;
        #       if expression is multiplied by a 2-tuple, creates an And of
        #       (min,max) expressions (similar to "{min,max}" form in
        #       regular expressions); if min is None, intepret as (0,max);
        #       if max is None, interpret as expr*min + ZeroOrMore(expr)
        #
        #   -   like + but with no backup and retry of alternatives
        #
        #   *   repetition of expression
        #
        #   ==  matching expression to string; returns True if the string
        #       matches the given expression
        #
        #   <<  inserts the expression following the operator as the body of the
        #       Forward expression before the operator
        #   ----------------------------

        DO_NOT_PARSE = self.comment_grammars | \
                       self.comment_in_progress | \
                       PYP.quotedString

        # moved here from class-level variable
        self.URLRE = re.compile('(https?://[-\\w\\./]+)')

        self.keywords = self.reserved_words + [fname[3:] for fname in dir( self.Cmd_object ) if fname.startswith('do_')]

        # not to be confused with `multiln_parser` (below)
        self.multiln_command = PYP.Or([
            PYP.Keyword(c, caseless=self.case_insensitive)
            for c in self.multiln_commands
        ])('multiline_command')

        ONELN_COMMAND = (
            ~self.multiln_command + PYP.Word(self.legal_chars)
        )('command')

        #self.multiln_command.setDebug(True)

        # Configure according to `allow_blank_lines` setting
        if self._allow_blank_lines:
            self.blankln_termination_parser = PYP.NoMatch
        else:
            BLANKLN_TERMINATOR = (2 * PYP.lineEnd)('terminator')
            #BLANKLN_TERMINATOR('terminator')
            self.blankln_termination_parser = (
                (self.multiln_command ^ ONELN_COMMAND)
                + PYP.SkipTo(
                    BLANKLN_TERMINATOR,
                    ignore=DO_NOT_PARSE
                ).setParseAction(lambda x: x[0].strip())('args')
                + BLANKLN_TERMINATOR
            )('statement')

        # CASE SENSITIVITY for
        # ONELN_COMMAND and self.multiln_command
        if self.case_insensitive:
            # Set parsers to account for case insensitivity (if appropriate)
            self.multiln_command.setParseAction(lambda x: x[0].lower())
            ONELN_COMMAND.setParseAction(lambda x: x[0].lower())

        self.save_parser = ( PYP.Optional(PYP.Word(PYP.nums)^'*')('idx')
                             + PYP.Optional(PYP.Word(self.legal_chars + '/\\'))('fname')
                             + PYP.stringEnd)

        AFTER_ELEMENTS = PYP.Optional(PIPE +
                                      PYP.SkipTo(
                                          OUTPUT_PARSER ^ STRING_END,
                                          ignore=DO_NOT_PARSE
                                      )('pipeTo')
                         ) + \
                         PYP.Optional(OUTPUT_PARSER +
                                      PYP.SkipTo(
                                          STRING_END,
                                          ignore=DO_NOT_PARSE
                                      ).setParseAction(lambda x: x[0].strip())('outputTo')
                         )

        self.multiln_parser = (((self.multiln_command ^ ONELN_COMMAND)
                                + PYP.SkipTo(
                                    TERMINATOR_PARSER,
                                    ignore=DO_NOT_PARSE
                                ).setParseAction(lambda x: x[0].strip())('args')
                                + TERMINATOR_PARSER)('statement')
                               + PYP.SkipTo(
                                   OUTPUT_PARSER ^ PIPE ^ STRING_END,
                                   ignore=DO_NOT_PARSE
                               ).setParseAction(lambda x: x[0].strip())('suffix')
                               + AFTER_ELEMENTS
        )

        #self.multiln_parser.setDebug(True)

        self.multiln_parser.ignore(self.comment_in_progress)

        self.singleln_parser = (
            (   ONELN_COMMAND + PYP.SkipTo(
                    TERMINATOR_PARSER ^ STRING_END ^ PIPE ^ OUTPUT_PARSER,
                    ignore=DO_NOT_PARSE
                ).setParseAction(lambda x:x[0].strip())('args')
            )('statement')
            + PYP.Optional(TERMINATOR_PARSER)
            + AFTER_ELEMENTS)
        #self.multiln_parser = self.multiln_parser('multiln_parser')
        #self.singleln_parser = self.singleln_parser('singleln_parser')

        self.prefix_parser = PYP.Empty()

        self.parser = self.prefix_parser + (STRING_END |
                                            self.multiln_parser |
                                            self.singleln_parser |
                                            self.blankln_termination_parser |
                                            self.multiln_command +
                                            PYP.SkipTo(
                                                STRING_END,
                                                ignore=DO_NOT_PARSE)
        )

        self.parser.ignore(self.comment_grammars)

        # a not-entirely-satisfactory way of distinguishing
        # '<' as in "import from" from
        # '<' as in "lesser than"
        self.input_parser = INPUT_MARK + \
                            PYP.Optional(INPUT_FROM) + \
                            PYP.Optional('>') + \
                            PYP.Optional(FILENAME) + \
                            (PYP.stringEnd | '|')

        self.input_parser.ignore(self.comment_in_progress)


python parsing testing pyparsing

Zearin Apr 10 '12 at 19:40


2 answers

I suspect the problem is pyparsing's built-in whitespace skipping, which skips over newlines by default. Even though setDefaultWhitespaceChars is used to tell pyparsing that newlines are significant, this setting only affects expressions created after the call to setDefaultWhitespaceChars. The problem is that pyparsing tries to help out by defining a number of convenience expressions at import time, such as empty for Empty(), lineEnd for LineEnd(), and so on. But since these are all created at import time, they are defined with the original default whitespace characters, which include '\n'.

I should probably just do this in setDefaultWhitespaceChars, but you can also clean this up for yourself. Right after calling setDefaultWhitespaceChars, redefine these module-level expressions in pyparsing:

    PYP.ParserElement.setDefaultWhitespaceChars(' \t')
    # redefine module-level constants to use new default whitespace chars
    PYP.empty = PYP.Empty()
    PYP.lineEnd = PYP.LineEnd()
    PYP.stringEnd = PYP.StringEnd()

I think this will help restore the significance of your embedded newlines.

Some other bits on your parser code:
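The mechanism described here (a default that is copied when an expression is created, so later changes to the default don't reach it) can be sketched with a toy class. This is a simplified model in plain Python, not pyparsing's real code: `ToyElement` and its methods are hypothetical names, and actual behavior may differ between pyparsing versions.

```python
class ToyElement:
    """Toy stand-in for a parser element; NOT pyparsing's actual class."""

    DEFAULT_WHITE_CHARS = " \n\t\r"

    def __init__(self):
        # Each element snapshots the class-level default when it is created.
        self.white_chars = ToyElement.DEFAULT_WHITE_CHARS

    @staticmethod
    def set_default_whitespace_chars(chars):
        # Changes only what elements created from now on will copy.
        ToyElement.DEFAULT_WHITE_CHARS = chars

    def skips_newlines(self):
        return "\n" in self.white_chars


# Plays the role of a convenience expression defined at import time.
created_at_import = ToyElement()

ToyElement.set_default_whitespace_chars(" \t")

# Plays the role of an expression built after the call.
created_afterwards = ToyElement()

print(created_at_import.skips_newlines())    # still treats '\n' as whitespace
print(created_afterwards.skips_newlines())   # treats '\n' as significant
```

In this model, re-creating the early element after the call gives it the new whitespace set, which is the idea behind redefining the module-level expressions.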

  self.blankln_termination_parser = PYP.NoMatch

should be

  self.blankln_termination_parser = PYP.NoMatch()
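The difference between the two lines is class versus instance. As a generic Python illustration (the `ToyNoMatch` class below is made up and is not pyparsing's `NoMatch`), calling an instance method on the class itself goes wrong because no instance is bound to `self`:

```python
class ToyNoMatch:
    """Toy element that never matches; NOT pyparsing's real NoMatch."""

    def matches(self, text):
        return False


wrong = ToyNoMatch      # the class object itself
right = ToyNoMatch()    # an instance, which is what a grammar needs

try:
    wrong.matches("anything")   # "anything" is taken as self; text is missing
    class_call_error = None
except TypeError as exc:
    class_call_error = exc

print(right.matches("anything"))
print(type(class_call_error).__name__)
```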

Your original author was perhaps overly aggressive in using '^' over '|'. Only use '^' if there is some risk of accidentally parsing one expression when you really wanted to parse a longer one that comes later in the list of alternatives. For example, in:

  self.save_parser = ( PYP.Optional(PYP.Word(PYP.nums)^'*')('idx')

There is no possible confusion between a Word of numeric digits and a lone '*'. The Or operator ('^') tells pyparsing to try to evaluate all of the alternatives and then pick the longest match; in case of a tie, it picks the left-most alternative in the list. If you parse '*', there is no need to also check whether it might match a longer integer, and if you parse an integer, there is no need to check whether it might also pass as a lone '*'. So change this to:

  self.save_parser = ( PYP.Optional(PYP.Word(PYP.nums)|'*')('idx')
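To make the first-match versus longest-match distinction concrete, here is a small model using regular expressions as stand-ins for grammar alternatives. The helper names `match_first` and `match_longest` are mine, and this only imitates the selection rule described above; it is not how pyparsing is implemented.

```python
import re


def match_first(patterns, text):
    """Like '|': return the first alternative (left to right) that matches."""
    for pattern in patterns:
        m = re.match(pattern, text)
        if m:
            return m.group()
    return None


def match_longest(patterns, text):
    """Like '^': try every alternative, keep the longest (leftmost on ties)."""
    best = None
    for pattern in patterns:
        m = re.match(pattern, text)
        if m and (best is None or len(m.group()) > len(best)):
            best = m.group()
    return best


# Overlapping alternatives: the choice of operator matters.
redirectors = [r">", r">>"]
print(match_first(redirectors, ">> out.txt"))     # stops at the short one
print(match_longest(redirectors, ">> out.txt"))   # finds the longer one

# Disjoint alternatives (digits or '*'): both strategies always agree,
# so the extra work of trying every alternative buys nothing.
index_forms = [r"\d+", r"\*"]
print(match_first(index_forms, "42 notes.txt"), match_longest(index_forms, "42 notes.txt"))
print(match_first(index_forms, "* notes.txt"), match_longest(index_forms, "* notes.txt"))
```

With first-match, overlapping alternatives can also be handled by listing the longer one first.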

Using a parse action to replace a string with '' is more simply written using a PYP.Suppress wrapper or, if you prefer, by calling expr.suppress(), which returns Suppress(expr). Combined with the preference for '|' over '^', this:

    self.comment_grammars = PYP.Or([ PYP.pythonStyleComment, PYP.cStyleComment ])
    self.comment_grammars.ignore(PYP.quotedString)
    self.comment_grammars.setParseAction(lambda x: '')

becomes:

  self.comment_grammars = (PYP.pythonStyleComment | PYP.cStyleComment ).ignore(PYP.quotedString).suppress()

Keywords have built-in logic to automatically avoid ambiguity, so Or is completely unnecessary with them:

  self.multiln_command = PYP.Or([ PYP.Keyword(c, caseless=self.case_insensitive) for c in self.multiln_commands ])('multiline_command')

should be:

  self.multiln_command = PYP.MatchFirst([ PYP.Keyword(c, caseless=self.case_insensitive) for c in self.multiln_commands ])('multiline_command')

(In the next release, I will loosen these initializers to accept generator expressions, so that [] will become unnecessary.)

That is all I can see now. Hope this helps.


Paulmcg Apr 11 '12 at 3:53


Zearin · Accepted Answer · 2012-04-11T18:39:50+0000

I fixed it!

Pyparsing was not to blame!

I was. ☹

By separating the parsing code out into a different object, I created the problem. Originally, one attribute used to “update itself” based on the contents of a second attribute. Since this was all contained in one "god class", it worked fine.

Simply by moving the code into another object, the first attribute was set at instantiation, but it no longer "updated itself" when the second attribute it depended on changed.

Specifics

The multiln_command attribute (not to be confused with multiln_commands; aargh, what a confusing naming convention!) was a pyparsing grammar definition. The multiln_command attribute should have updated its grammar whenever multiln_commands changed.

Although I knew that these two attributes have similar names but very different purposes, the similarity definitely made it harder to track the problem down. I have not renamed multiln_command to multiln_grammar .
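For readers who hit the same kind of bug, here is a minimal sketch of a derived attribute going stale, plus one possible remedy (a property that rebuilds on access). This is not necessarily how the actual fix was done. `build_grammar`, `StaleParser` and `FreshParser` are hypothetical names, and a compiled regular expression stands in for the pyparsing grammar.

```python
import re


def build_grammar(commands):
    """Hypothetical stand-in for building the multiline-command grammar."""
    if not commands:
        return re.compile(r"(?!)")      # a pattern that never matches
    alternation = "|".join(re.escape(c) for c in commands)
    return re.compile(r"(?:%s)\b" % alternation, re.IGNORECASE)


class StaleParser:
    def __init__(self, multiln_commands):
        self.multiln_commands = multiln_commands
        # Built once here; later changes to the list are never picked up.
        self.multiln_grammar = build_grammar(multiln_commands)


class FreshParser:
    def __init__(self, multiln_commands):
        self.multiln_commands = multiln_commands

    @property
    def multiln_grammar(self):
        # Rebuilt on every access, so it always reflects the current list.
        return build_grammar(self.multiln_commands)


stale = StaleParser([])
fresh = FreshParser([])

# The command list is filled in only after the parser objects exist.
stale.multiln_commands.append("multiline")
fresh.multiln_commands.append("multiline")

stale_match = stale.multiln_grammar.match("multiline command ends")
fresh_match = fresh.multiln_grammar.match("multiline command ends")
print(stale_match, fresh_match)
```

Rebuilding on every access is the simplest option; if building the grammar is expensive, caching it and invalidating the cache when the list changes is another approach.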

However! ☺

I am grateful to @Paul McGuire for the excellent answer, and I hope it will save me (and others) some grief in the future. Although I feel a bit foolish that I caused the problem (and misdiagnosed it as a pyparsing issue), I'm happy that some good (in the form of Paul's advice) came out of asking this question.

Happy parsing, everyone. :)
