Multiple error reports with Menhir: which token?

Question


I am writing a small parser with Menhir and ocamllex, and I have two requirements that I cannot meet at the same time:

1) I would like to continue parsing after an error (to report more errors).
2) I would like to print the token that caused the error.

I can easily do 1) on its own using the error token. I can also easily do 2) on its own using the approach suggested for this question. However, I do not know how easy it is to achieve both at once.

The error handling method right now looks something like this:

    pair:
      | left = prodA SEPARATOR right = prodA
          { (* happy case *) }
      | error SEPARATOR right = prodA
          { print_error_report $startpos;
            (* would like to continue after the first error, just in case
               there is a second error, so I report both *) }

One thing that would help me is access to lexbuf itself, so that I could get the token directly: instead of passing $startpos, I would pass something like $lexbuf. But as far as I can tell, there is no official way to access lexbuf. Solution 1 works only at the caller level, where the caller itself passes lexbuf to the parser, but not in semantic actions.
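One caller-side workaround, sketched here under assumptions rather than as an official Menhir feature: wrap the lexer you hand to the parser so that it logs every token together with its start offset, then have the error action look up the token at `$startpos`. The `token` type below is a hypothetical stand-in for the one Menhir generates, and the body given for `print_error_report` assumes that function receives the `Lexing.position` from `$startpos`. Only standard `Lexing` fields (`lex_start_p`, `pos_cnum`) are used.

```ocaml
(* Hypothetical token type; in a real project this is the one Menhir generates. *)
type token = ID of string | SEPARATOR | EOF

let string_of_token = function
  | ID s -> "ID(" ^ s ^ ")"
  | SEPARATOR -> "SEPARATOR"
  | EOF -> "EOF"

(* Every token handed to the parser, most recent first, with its start offset. *)
let history : (int * token) list ref = ref []

(* Wrap a lexer so that each token it returns is logged with the start offset
   the lexer recorded in lexbuf.lex_start_p. *)
let recording (lexer : Lexing.lexbuf -> token) (lexbuf : Lexing.lexbuf) : token =
  let tok = lexer lexbuf in
  history := (lexbuf.Lexing.lex_start_p.Lexing.pos_cnum, tok) :: !history;
  tok

(* Earliest logged token that starts at or after [pos]. *)
let token_at (pos : Lexing.position) : token option =
  let later = List.filter (fun (ofs, _) -> ofs >= pos.Lexing.pos_cnum) !history in
  match List.rev later with
  | (_, tok) :: _ -> Some tok
  | [] -> None

(* Hypothetical body for the question's print_error_report, assuming it
   receives $startpos of the error production. *)
let print_error_report (pos : Lexing.position) : unit =
  match token_at pos with
  | Some tok ->
    Printf.eprintf "syntax error at offset %d: unexpected %s\n"
      pos.Lexing.pos_cnum (string_of_token tok)
  | None -> Printf.eprintf "syntax error at offset %d\n" pos.Lexing.pos_cnum
```

The parser would then be started with something like `Parser.main (recording Lexer.token) lexbuf` (entry point and lexer names hypothetical), while the error production keeps calling `print_error_report $startpos`. Whether `$startpos` of a production that begins with `error` lands on the offending token depends on how Menhir positions the `error` symbol, so check it on a test input; also note that the history grows with the input.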

Does anyone know whether it is really available, or of a workaround?


ocaml ocamlyacc ocamllex menhir

orm Dec 08 '14 at 3:19



gasche · Answer 1 · 2015-01-18T10:37:28+0000

Thanks to the collaboration of Frédéric Bour and François Pottier, a new version of Menhir has been released that supports incremental parsing. See the announcement email sent on December 17th.

The idea of this incremental API is to invert control: instead of the parser calling the lexer to process the input, you get a lower-level API in which you manipulate a parser state, and each step returns the updated state after one token has been consumed (it is actually slightly finer-grained than that, since you can also observe the internal reductions that do not require a new token). In particular, you can observe whether the resulting parser state is an error, and choose to go back to an earlier state and feed it a different input (depending on your error-recovery strategy) to get further in your input.
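To make the inversion of control concrete, here is a self-contained toy model in plain OCaml. It is not Menhir's incremental API (that API requires building the parser with `menhir --table`, and its real types and functions should be taken from the Menhir manual): the hand-written `feed` function, the `state` and `step` types, the `SEMI` synchronisation token and the skip-to-`SEMI` recovery strategy are all hypothetical. The point is only that a caller which supplies tokens one at a time always knows the offending token and can decide how to continue.

```ocaml
(* Toy tokens; SEMI terminates each pair and doubles as a recovery point. *)
type token = ID of string | SEPARATOR | SEMI | EOF

(* What the toy parser expects next. *)
type state =
  | Expect_left
  | Expect_sep of string
  | Expect_right of string
  | Expect_semi of string * string

(* Result of offering one token to the parser. *)
type step =
  | Continue of state
  | Pair_done of string * string
  | Finished
  | Syntax_error of token

(* One parser step: it never reads input itself; the caller supplies each token. *)
let feed (s : state) (tok : token) : step =
  match s, tok with
  | Expect_left, EOF -> Finished
  | Expect_left, ID a -> Continue (Expect_sep a)
  | Expect_sep a, SEPARATOR -> Continue (Expect_right a)
  | Expect_right a, ID b -> Continue (Expect_semi (a, b))
  | Expect_semi (a, b), SEMI -> Pair_done (a, b)
  | _, _ -> Syntax_error tok

let string_of_token = function
  | ID s -> "ID(" ^ s ^ ")"
  | SEPARATOR -> "SEPARATOR"
  | SEMI -> "SEMI"
  | EOF -> "EOF"

(* Panic-mode recovery: drop tokens up to and including the next SEMI,
   but keep EOF so that parsing can finish. *)
let rec skip_to_sync = function
  | SEMI :: rest -> rest
  | (EOF :: _) as rest -> rest
  | _ :: rest -> skip_to_sync rest
  | [] -> []

(* The driver owns the token stream, so on an error it knows the offending
   token, records it, recovers, and keeps going to find further errors. *)
let parse_all (tokens : token list) : (string * string) list * string list =
  let rec go s toks pairs errors =
    match toks with
    | [] -> (List.rev pairs, List.rev errors)
    | tok :: rest ->
      (match feed s tok with
       | Continue s' -> go s' rest pairs errors
       | Pair_done (a, b) -> go Expect_left rest ((a, b) :: pairs) errors
       | Finished -> (List.rev pairs, List.rev errors)
       | Syntax_error bad ->
         let msg = "unexpected " ^ string_of_token bad in
         go Expect_left (skip_to_sync toks) pairs (msg :: errors))
  in
  go Expect_left tokens [] []
```

For example, `parse_all [ID "a"; SEPARATOR; ID "b"; SEMI; ID "c"; ID "d"; SEMI; EOF]` returns the pair `("a", "b")` together with the single error `"unexpected ID(d)"`, and a second malformed pair later in the input would add a second message instead of stopping the parse.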

The general idea is that this will let you implement good error-recovery and error-reporting strategies on the user side of the parser, and gradually phase out the rather inflexible error-token mechanism.

This can already be used, but work on these features is still ongoing, and you should expect more robust support for them in upcoming releases over the coming months.

