It seems the bounty was wasted, because the solution is so simple that I simply did not consider it. Let me explain. When scanning a simple nested comment
(* (*..*) *)
I need to keep track of how many opening comment tokens I have seen, so that at the last real closing comment I can return the entire comment as a single token.
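To make that bookkeeping concrete, here is a minimal Java sketch (plain string scanning, not JFlex) that counts the still-unclosed `(*` openers and reports where the outermost comment ends. The class and method names are made up for illustration.

```java
// Illustrative sketch, not JFlex: find where a nested (* ... *) comment ends
// by counting how many comment openers are still unclosed.
final class NestedCommentScan {

    /**
     * Returns the index just past the "*)" that balances the "(*" at start,
     * or -1 if the input ends before the comment is closed.
     */
    public static int commentEnd(String text, int start) {
        int depth = 0;
        int i = start;
        while (i < text.length() - 1) {
            if (text.startsWith("(*", i)) {
                depth++;        // one more opener to balance
                i += 2;
            } else if (text.startsWith("*)", i)) {
                depth--;        // one opener balanced
                i += 2;
                if (depth == 0) {
                    return i;   // the outermost comment just closed
                }
            } else {
                i++;
            }
        }
        return -1;              // unterminated comment
    }

    public static void main(String[] args) {
        System.out.println(commentEnd("(* (*..*) *) x", 0)); // 12
    }
}
```

A plain counter is enough for this standalone check; the JFlex rules in this answer carry the same information in a stack of lexer states.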
What I didn't realize is that JFlex does not need to be told to advance to the next part when it matches something. After a careful review, I saw that this is explained here, but somewhat hidden in a section I did not care about:
Since we do not return a value to the parser, our scanner proceeds immediately.
Hence, a rule like this in the flex file
[^\(\*\)]+ { }
reads all characters except those that might start or end a comment and does nothing with them, but the scanner still moves on to the next token.
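The control flow behind this can be sketched by hand. The Java below is not JFlex's generated code; it only mimics the idea that an action without a `return` lets `advance()` carry on with the next match. The whitespace and identifier "rules" and the string token names are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: mimics the control flow of a JFlex-generated
// advance() method. Every match runs an "action"; only an action that
// produces a value ends the call, and an action that produces nothing
// (null here) makes the loop move straight on to the next match.
final class SkipLoopDemo {
    private final String input;
    private int pos = 0;

    SkipLoopDemo(String input) {
        this.input = input;
    }

    /** Returns the next token type, or null once the input is exhausted. */
    String advance() {
        while (pos < input.length()) {
            char c = input.charAt(pos);
            String result;
            if (Character.isWhitespace(c)) {
                // Like a rule "[ \t\n]+ { }": consume the run, return nothing.
                while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
                    pos++;
                }
                result = null;
            } else if (Character.isLetter(c)) {
                // Like an identifier rule whose action returns IDENT.
                while (pos < input.length() && Character.isLetter(input.charAt(pos))) {
                    pos++;
                }
                result = "IDENT";
            } else {
                pos++;
                result = "BAD_CHARACTER";
            }
            if (result != null) {
                return result; // only a returning action ends advance()
            }
            // No value returned: the scanner proceeds immediately.
        }
        return null;
    }

    public static void main(String[] args) {
        SkipLoopDemo lexer = new SkipLoopDemo("  foo  bar ");
        List<String> tokens = new ArrayList<>();
        for (String t = lexer.advance(); t != null; t = lexer.advance()) {
            tokens.add(t);
        }
        System.out.println(tokens); // [IDENT, IDENT]
    }
}
```

The caller never sees the whitespace: each `advance()` call silently skips it and returns only when a rule produces a token.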
This means I can simply do the following. In the YYINITIAL state, I have a rule that matches the start of a comment, but it does nothing except switch the lexer to the IN_COMMENT state. In particular, it does not return any token:
{CommentStart} { yypushstate(IN_COMMENT);}
Now we are in the IN_COMMENT state, and there I do the same: I eat all characters but never return a token. When I find a new opening comment, I carefully push it onto the stack but do nothing else. Only when I hit the last closing comment do I know that I am leaving the IN_COMMENT state, and this is the only point where I finally return the token. Let's look at the rules:
<IN_COMMENT> {
    {CommentStart}  { yypushstate(IN_COMMENT); }
    [^\(\*\)]+      { }
    {CommentEnd}    { yypopstate();
                      if (yystate() != IN_COMMENT)
                          return MathematicaElementTypes.COMMENT_CONTENT; }
    [\*\)\(]        { }
    .               { return MathematicaElementTypes.BAD_CHARACTER; }
}
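`yypushstate`, `yypopstate` and `yyclearstack` are not part of JFlex itself, which only provides `yybegin(int)` and `yystate()` for lexical states. I'm assuming they are small helpers in the user-code section (`%{ ... %}`) of the .flex file. A hypothetical stack-based version could look like this, with `yybegin`/`yystate` stubbed so it runs on its own:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical state-stack helpers. JFlex provides yybegin(int) and
// yystate(); yypushstate, yypopstate and yyclearstack are assumed to be
// user code (e.g. in the %{ ... %} section of the .flex file). yybegin and
// yystate are stubbed here so the class runs on its own.
final class StateStackSketch {
    static final int YYINITIAL = 0;
    static final int IN_COMMENT = 2; // arbitrary value for this sketch

    private final Deque<Integer> stack = new ArrayDeque<>();
    private int state = YYINITIAL;

    void yybegin(int newState) { state = newState; }

    int yystate() { return state; }

    /** Save the current state, then switch to newState. */
    void yypushstate(int newState) {
        stack.push(state);
        yybegin(newState);
    }

    /** Go back to the state that was active before the matching push. */
    void yypopstate() {
        yybegin(stack.pop());
    }

    /** Drop all saved states, e.g. when EOF is hit inside a comment. */
    void yyclearstack() {
        stack.clear();
    }

    public static void main(String[] args) {
        StateStackSketch lexer = new StateStackSketch();
        lexer.yypushstate(IN_COMMENT); // outer "(*"
        lexer.yypushstate(IN_COMMENT); // inner "(*"
        lexer.yypopstate();            // inner "*)"
        System.out.println(lexer.yystate() == IN_COMMENT); // true: keep eating
        lexer.yypopstate();            // outer "*)"
        System.out.println(lexer.yystate() == IN_COMMENT); // false: return the token
    }
}
```

Pushing the previous state, rather than only counting depth, means the final pop lands back in whatever state the comment started from.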
That's it. Now, no matter how deeply your comment is nested, you always get one single token containing the entire comment.
Now I feel a bit silly, and I'm sorry for such a simple question.
Final note
If you do something like this, remember that you only return a token when you hit the correct closing symbol. Therefore, you should definitely add a rule that catches the end of the file. In IDEA, the default behavior here is simply to return the comment token, so you need one more rule (whether it is useful or not, I want to end gracefully):
<<EOF>> { yyclearstack(); yybegin(YYINITIAL); return MathematicaElementTypes.COMMENT;}
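Putting the pieces together, here is a rough hand-written emulation (again, not generated JFlex code) of the comment rules plus the `<<EOF>>` rule. Assumptions: a token's text runs from where the previous token ended to the current position, which is how the skipped matches end up inside the one comment token; the EOF rule only fires in IN_COMMENT; both comment paths return the same stand-in type `COMMENT`; and everything outside comments becomes a one-character `OTHER` token in place of the real YYINITIAL rules.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hand-written emulation of the <IN_COMMENT> rules plus the <<EOF>> rule.
// Token types are plain strings standing in for MathematicaElementTypes.
final class NestedCommentLexer {
    static final int YYINITIAL = 0;
    static final int IN_COMMENT = 2; // arbitrary value for this sketch

    private final String in;
    private final Deque<Integer> stack = new ArrayDeque<>();
    private int pos = 0;
    private int tokenStart = 0;
    private int state = YYINITIAL;

    NestedCommentLexer(String in) {
        this.in = in;
    }

    /** Returns "TYPE:text" for the next token, or null at end of input. */
    String advance() {
        tokenStart = pos;
        while (true) {
            if (pos >= in.length()) {
                if (state == IN_COMMENT) {
                    // <<EOF>> inside an unterminated comment: clear the stack,
                    // go back to YYINITIAL, and hand out what we have.
                    stack.clear();
                    state = YYINITIAL;
                    return token("COMMENT");
                }
                return null;
            }
            if (in.startsWith("(*", pos)) {
                // {CommentStart}: remember the current state, no token.
                pos += 2;
                stack.push(state);
                state = IN_COMMENT;
            } else if (state == IN_COMMENT && in.startsWith("*)", pos)) {
                // {CommentEnd}: pop; only the outermost close returns a token.
                pos += 2;
                state = stack.pop();
                if (state != IN_COMMENT) {
                    return token("COMMENT");
                }
            } else if (state == IN_COMMENT) {
                pos++; // comment body: eat it, no token
            } else {
                pos++;
                return token("OTHER");
            }
        }
    }

    private String token(String type) {
        return type + ":" + in.substring(tokenStart, pos);
    }

    public static void main(String[] args) {
        for (String src : new String[] {"(* (*..*) *)x", "a(* (* open"}) {
            NestedCommentLexer lexer = new NestedCommentLexer(src);
            List<String> tokens = new ArrayList<>();
            for (String t = lexer.advance(); t != null; t = lexer.advance()) {
                tokens.add(t);
            }
            System.out.println(tokens);
        }
        // [COMMENT:(* (*..*) *), OTHER:x]
        // [OTHER:a, COMMENT:(* (* open]
    }
}
```

Without the EOF branch, the unterminated comment in the second input would simply run into `return null` and its text would never be handed out as a token.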