How does a custom Stream type affect position information in Parsec?

I am using Parsec with a custom Stream type. This stream is essentially a String, but it sometimes expands input that it finds in the string into other strings (think alias expansion). For example, given "§4.1 ¶3", it can pass "Section 4.1, paragraph 3" to the parser.

Everything works. My types look like this:

    data DealiasingStream = ...

    instance (Monad m) => Stream DealiasingStream m Char where
      ...

    type ShellParser = Parsec DealiasingStream ()

Note that the token type of DealiasingStream is just Char. This allows my parsers (well, my ShellParsers) to use all the standard Char parsers.

My question is how to get Parsec to report positions in terms of the original input to my stream. The documentation for Stream states:

A Stream instance is responsible for maintaining the "position within the stream" in the stream state s. This is trivial unless you are using the monad in a non-trivial way.

In fact, my stream type knows what position it wants to report at any moment ... but I don't understand how to get Parsec to use it! Parsec seems to maintain its own SourcePos as part of its internal State, and that seems to be updated by the various token primitives, and therefore by the standard Char parsers, outside my control.

How can I do that?

+4
2 answers

I agree with your understanding: there is no easy way to control the position without rewriting functions like char.

What the documentation means is that a Stream instance is responsible for recording position information inside the tokens. This information can then be used by functions such as token or tokenPrim (by providing them with appropriate position calculation functions).

Thus, you have to wrap Char in a data type that includes position information and rewrite the basic functions using primitives such as token or tokenPrim, which are flexible in how they calculate the position.
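As a rough illustration of that idea, here is a minimal sketch. The names PosChar, PosParser, satisfyP, charP and runPosParser are made up for this example and are not part of Parsec; the sketch also assumes the stream is simply a list of position-tagged characters rather than the DealiasingStream from the question.

```haskell
import Text.Parsec
import Text.Parsec.Pos (newPos)

-- Hypothetical token type: a character together with the position
-- (in the original input) that it should be reported at.
data PosChar = PosChar SourcePos Char

-- Assumption: the stream is a plain list of tagged characters.
type PosParser = Parsec [PosChar] ()

-- Stand-in for Text.Parsec.Char.satisfy that takes positions from the tokens.
satisfyP :: (Char -> Bool) -> PosParser Char
satisfyP p = tokenPrim showTok nextPos test
  where
    showTok (PosChar _ c) = show c
    -- After consuming a token, move to the position carried by the next
    -- token; keep the current position if the input is exhausted.
    nextPos _   _ (PosChar next _ : _) = next
    nextPos pos _ []                   = pos
    test (PosChar _ c) = if p c then Just c else Nothing

-- Stand-in for Text.Parsec.Char.char.
charP :: Char -> PosParser Char
charP c = satisfyP (== c) <?> show [c]

-- Start from the position of the first token instead of line 1, column 1.
runPosParser :: PosParser a -> SourceName -> [PosChar] -> Either ParseError a
runPosParser p name toks = parse (start *> p) name toks
  where
    start = case toks of
      (PosChar pos _ : _) -> setPosition pos
      []                  -> return ()
```

With tokens built this way, every character of an expansion can carry the position of the alias it came from, so errors inside the expansion point at the alias in the original input.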

+1

You can create a new SourcePos with the functions in Text.Parsec.Pos and install it in the parser with setPosition from Text.Parsec.Prim.
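A minimal sketch of that, assuming a plain String parser; jumpTo is a hypothetical helper name, not something Parsec provides:

```haskell
import Text.Parsec
import Text.Parsec.Pos (newPos)
import Text.Parsec.String (Parser)

-- Hypothetical helper: overwrite the parser's current position with the
-- given line and column, keeping the current source name.
jumpTo :: Line -> Column -> Parser ()
jumpTo l c = do
  pos <- getPosition
  setPosition (newPos (sourceName pos) l c)
```

Any parser that runs after jumpTo continues counting from the new position, so the call has to be repeated wherever the reported position and the real input diverge.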

Edit:

I'm not sure why you need a custom stream, since you are not changing the token type. You should be able to use the standard Char parsers and do the expansion and the position updates in your whitespace rule. I did this with cpp to expand macros: my custom whitespace rule looks for the #line directives, which I use to update the position with setPosition. You could use the same approach to look for aliases and change the input stream, by prepending the expansion to the result of getInput and handing the result back to the parser with setInput. The documentation for setInput suggests using it to handle #include directives, which is essentially the same problem.
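A minimal sketch of the getInput/setInput part, assuming a plain String parser. The alias table and the name expandAlias are invented for this example, and the expansions are simplified compared to the ones in the question:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical alias table, for illustration only.
aliases :: [(Char, String)]
aliases = [('§', "Section "), ('¶', "paragraph ")]

-- If the next input character is an alias, replace it in the remaining
-- input with its expansion; otherwise leave the input untouched.
-- Consumes nothing and does not change the position.
expandAlias :: Parser ()
expandAlias = do
  input <- getInput
  case input of
    (c : rest)
      | Just expansion <- lookup c aliases -> setInput (expansion ++ rest)
    _ -> return ()
```

Called from the whitespace rule, this lets the standard Char parsers see the expanded text. Note that those parsers then advance the position once per character of the expansion, so the position still has to be corrected with setPosition afterwards if it should refer to the original input.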

-1

Source: https://habr.com/ru/post/1447530/

