The Best Way to Save Many Pairs of Positions in Emacs Lisp

I am in the middle of converting an ordinary tokenizer of a complete restart (ported from the original language parser, the language is irrelevant here) to a more advanced incremental one. This implies the following:

a) it must be fast, very fast;

b) each time the text is updated (whether it is insertion or deletion), it must find the damaged tokens and repair the token list accordingly.

The original version of the tokenizer only builds a list of tokens, passing through the buffer text using regular expressions; each token in the list is a vector of 4 elements ("TOKENTYPE" token-lexeme "linum charnum]). Linum / charnum is a prime number that indicates the position of the marker in the buffer at the time of the lexing. Easy pie.

Now, to the point. Whenever (well ... not every time, but often enough) a user adds or deletes a character (or string), a new tokenizer must find a marker constructed using text in the edit position and possibly dependent tokens for later deletions / updates.

There are two problems here:

a) marker positions should be dynamic (for example, if the user adds some text at the beginning of the buffer → we should not worry about fixing tokens at the end of the buffer);

b) a way to get a damaged token (or many tokens in general).

Now I'm trying to use overlays for the task, since overlays have a nice interface that suits my needs: the overlays-at / overlays-in functions help in the search; and overlay / end overlay accordingly.

I can happily do this for a smaller file. But it turns out (and I have to admit that I was warned by the documents) that the solution does not scale: even the average LOC LK file can have CONST * LOC overlays, which is too much for Emacs.

This was my first attempt, and it was not successful. I am considering alternatives such as:

1) management of a handwritten token search tree using primes;

2) the same tree, but using markers;

3) some mixed approach, including both prime numbers and markers.

Any alternatives to the mentioned approaches? Or maybe there is a better way to handle a lot of overlays?

+5
source share
2 answers

Like Lindydancer, I suggest you use textual properties rather than overlays. An overlay scale like O (N ^ 2), for example, while text properties scale like O (N log N), so it works much better. I would not use font-lock for any of them, because K.

Of course, an alternative solution is to fix the overlays: the C code can be changed to make it O (N log N). I know how to do this for a while, but I did not find the time (and I hardly find time in any foreseeable future), therefore, if someone is interested, I would be very happy to help him do it.

+5
source

An alternative to overlays is text properties, they are attached to the text so that overlays are not, so they are much more efficient.

A package that makes heavy use of text properties is blocking fonts. It is usually used to allocate buffers, but there is nothing to prevent you from copying it for your own purposes. This way you get the whole system to detect that the user modified the contents of the buffer for free.

In your case, you can replace the regular expression in the font-lock keyword with a function that will be called with a search limit. All you have to do is scan the relative short section, set your own text properties, and you're done. (In addition, you must tell font-lock which property you are setting using font-lock-extra-managed-props .)

+3
source

Source: https://habr.com/ru/post/1202888/


All Articles