I am in the middle of converting an ordinary tokenizer of a complete restart (ported from the original language parser, the language is irrelevant here) to a more advanced incremental one. This implies the following:
a) it must be fast, very fast;
b) each time the text is updated (whether it is insertion or deletion), it must find the damaged tokens and repair the token list accordingly.
The original version of the tokenizer only builds a list of tokens, passing through the buffer text using regular expressions; each token in the list is a vector of 4 elements ("TOKENTYPE" token-lexeme "linum charnum]). Linum / charnum is a prime number that indicates the position of the marker in the buffer at the time of the lexing. Easy pie.
Now, to the point. Whenever (well ... not every time, but often enough) a user adds or deletes a character (or string), a new tokenizer must find a marker constructed using text in the edit position and possibly dependent tokens for later deletions / updates.
There are two problems here:
a) marker positions should be dynamic (for example, if the user adds some text at the beginning of the buffer â we should not worry about fixing tokens at the end of the buffer);
b) a way to get a damaged token (or many tokens in general).
Now I'm trying to use overlays for the task, since overlays have a nice interface that suits my needs: the overlays-at / overlays-in functions help in the search; and overlay / end overlay accordingly.
I can happily do this for a smaller file. But it turns out (and I have to admit that I was warned by the documents) that the solution does not scale: even the average LOC LK file can have CONST * LOC overlays, which is too much for Emacs.
This was my first attempt, and it was not successful. I am considering alternatives such as:
1) management of a handwritten token search tree using primes;
2) the same tree, but using markers;
3) some mixed approach, including both prime numbers and markers.
Any alternatives to the mentioned approaches? Or maybe there is a better way to handle a lot of overlays?