Iterator Exchange

Question

I am writing a (simple) compiler in Scala. I have made the tokenizer iterable, and now I need to write a parser. The plan is to use a recursive descent strategy, so the parser will be split into a number of methods, each of which calls (some of) the others.

I suppose it will be necessary, or at least preferable, to maintain the state of the tokenizer's iterator and share it between the different methods. Is this true? If so, how should I do it? If not, what are the alternatives?

+4

iterator scala parsing share

Simon Morgan Feb 05 '12 at 14:33

2 answers

You could simply create an iterator of tokens and pass it into each recursive parsing function, so that every function reads its tokens from the same iterator:

    // Each parser consumes exactly the tokens it needs from the shared iterator.
    def parse2(tokens: Iterator[String]) = List(tokens.next(), tokens.next())
    def parse1(tokens: Iterator[String]) = List(parse2(tokens), parse2(tokens))

    val tokens = List("a", "b", "c", "d").iterator
    val parsed = parse1(tokens) // List(List(a, b), List(c, d))

+1

dhg Feb 05 '12 at 17:45

Rex Kerr · Accepted Answer · 2012-02-05T18:07:48+0000

If you need to maintain the state of an iterator, do not use an iterator! Iterators are meant for cases where you can destroy your state as you go.

You may be able to use a Stream. Streams have a habit of not giving up their memory when they should, because references persist where you do not want them (although you can tell that they exist if you think about it). So, if you started with an iterator, you could call .toStream on it, pass sub-streams on to the sub-methods, and then pass the remaining stream back for further processing. But you have to be very careful not to hold on to a reference to the head of the stream if you do not want to keep everything in memory.
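As a rough sketch of the stream approach, assuming plain String tokens and a hypothetical parsePair helper (neither is from the answer): each sub-parser takes a stream and returns its result together with the unconsumed rest, so the caller can continue from the remainder or start over from an earlier stream value. Stream is the type that existed in 2012; from Scala 2.13 on it is deprecated in favour of LazyList, which can be used the same way here.

```scala
// Hypothetical sub-parser: reads two tokens and returns them together with
// the unconsumed remainder of the stream.
def parsePair(tokens: Stream[String]): (List[String], Stream[String]) = {
  val first = tokens.head
  val rest = tokens.tail
  (List(first, rest.head), rest.tail)
}

// Note: keeping `start` in a val is exactly the "reference to the head"
// that prevents already-parsed tokens from being garbage collected.
val start = List("a", "b", "c", "d").iterator.toStream

val (pair1, afterPair1) = parsePair(start)
val (pair2, afterPair2) = parsePair(afterPair1)

// Backtracking is simply reusing an earlier stream value.
val (again, _) = parsePair(start)
```

The trade-off is the one described above: the ability to re-read from start exists only because start is still referenced, so every token after it stays in memory.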

Another way is to simply dump everything into a Vector or Array and hold the whole thing in memory; you can then drop the parts you no longer need (or advance an index).
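The "advance an index" variant might look something like the following. This is only a sketch under assumed names (number, sum and expr are hypothetical, and tokens are plain strings): each parser takes a position and returns its result with the next position, so backtracking means calling another parser with the old index.

```scala
// Hypothetical helper: parse a number at `pos`, returning the value and the
// position of the next unread token, or None if there is no number there.
def number(tokens: Vector[String], pos: Int): Option[(Int, Int)] =
  if (pos < tokens.length && tokens(pos).nonEmpty && tokens(pos).forall(_.isDigit))
    Some((tokens(pos).toInt, pos + 1))
  else None

// Hypothetical parser for `number + number`.
def sum(tokens: Vector[String], pos: Int): Option[(Int, Int)] =
  number(tokens, pos) match {
    case Some((a, p1)) if p1 < tokens.length && tokens(p1) == "+" =>
      number(tokens, p1 + 1).map { case (b, p2) => (a + b, p2) }
    case _ => None
  }

// Try `sum` first; if it fails, retry from the same index as a plain number.
// Nothing has to be undone, because the position is just an Int.
def expr(tokens: Vector[String], pos: Int): Option[(Int, Int)] =
  sum(tokens, pos).orElse(number(tokens, pos))

val r1 = expr(Vector("2", "+", "7"), 0) // Some((9,3))
val r2 = expr(Vector("2", ",", "7"), 0) // Some((2,1))
```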

Finally, if you are absolutely sure that you need no backtracking, you can just use the iterator as it is without worrying about “maintaining state”. That is, when you return from a sub-method you will have consumed exactly the right tokens and no more, and you are free to go on parsing. For this to work without at least a one-element “next token that I didn’t consume” return value, you must be able to predict where the last token is. For example, a list of unbounded length would have to end with a token that is part of the list: {1,2,3} can be a list (you enter list processing when you see { and drop out when you hit } ), but 1,2,3 + 7 cannot (you would consume + before realizing that the list is over).
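One way to get that one-token lookahead without threading an extra return value through every method is the standard library's BufferedIterator, obtained by calling .buffered on an iterator. The answer does not mention it, so take this as an illustrative alternative; parseList is a hypothetical name and the tokens are plain strings.

```scala
import scala.collection.BufferedIterator

// Hypothetical parser for the unbracketed form 1,2,3: it keeps reading while
// the *next* token is a comma, and peeks with `head` instead of consuming.
def parseList(tokens: BufferedIterator[String]): List[String] = {
  val items = List.newBuilder[String]
  items += tokens.next()
  while (tokens.hasNext && tokens.head == ",") {
    tokens.next()          // consume the comma
    items += tokens.next() // consume the element after it
  }
  items.result()
}

val input = List("1", ",", "2", ",", "3", "+", "7").iterator.buffered
val list = parseList(input)  // List(1, 2, 3)
val following = input.next() // "+" is still available to the caller
```

Because head only peeks, the + that ends the list is left in the iterator for whichever method parses the rest of the expression.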
