I once wrote a text editor. I thought I could do better than the existing ones. Then I discovered Vim and realized I was wrong :P Parts of my highlighting engine still live on GitHub.
Several approaches are possible. You could write real lexical-analysis (or lightweight syntactic) routines, but regular expressions can serve you better if you use them efficiently and are not an expert in parsing theory. I used a combination of the two.
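That combination can be sketched as one master regex doing the lexical pass, with a thin hand-written layer on top for what regexes handle poorly. A minimal illustration in Python (the answer names no language, and these token rules and keyword list are my own placeholders, not the author's):

```python
import re

# Hypothetical token rules for a small C-like language; purely illustrative.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),
    ("STRING",  r'"(?:\\.|[^"\\])*"'),
    ("NUMBER",  r"\d+(?:\.\d+)?"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("OP",      r"[+\-*/=<>!]+"),
    ("PUNCT",   r"[(){}\[\];,]"),
    ("WS",      r"\s+"),
]
KEYWORDS = {"if", "else", "while", "return", "int", "void"}

# One alternation of named groups; order matters (COMMENT must beat OP on "//").
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Yield (kind, lexeme, offset) triples for highlighting."""
    for m in MASTER_RE.finditer(text):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "WS":
            continue
        # The small hand-written layer on top of the regex pass:
        # reclassify identifiers that are actually keywords.
        if kind == "IDENT" and lexeme in KEYWORDS:
            kind = "KEYWORD"
        yield kind, lexeme, m.start()
```

Because the patterns are tried in order at each position, earlier rules win ties, which is how the comment rule beats the operator rule on `//`.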
To get good performance, editors tend not to lex the entire file. Instead, lex just the visible area of the file, so you minimize the work done. Of course, you then need to think about what happens when the user starts editing somewhere in the middle of that visible area. My approach was to keep a snapshot of the lexer's state (i.e., all token positions and lexical states) in memory at all times; then, from the cursor, go back one or two tokens, restore the lexer state at that point (i.e., keep the tokens and state stacks to the left and discard those to the right), and restart the tokenizer from there to the end of the visible range. Since (I believe) all source languages are read left to right, the tokenization to the left of the edited region should never change.
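A minimal sketch of that snapshot-and-restart idea. The only lexer state tracked here is whether a `/* ... */` block comment is open, and snapshots are kept per line rather than per token; `lex_line`, `Highlighter`, and the line-level granularity are my simplifications for the sketch, not the author's actual design:

```python
def lex_line(line, in_comment):
    """Tokenize one line. Returns (spans, state_after), where spans are
    (start, end, kind) triples and state_after says whether a block
    comment is still open at the end of the line."""
    spans, i, n = [], 0, len(line)
    while i < n:
        if in_comment:
            end = line.find("*/", i)
            stop = n if end < 0 else end + 2
            spans.append((i, stop, "comment"))
            in_comment = end < 0
            i = stop
        else:
            start = line.find("/*", i)
            if start < 0:
                spans.append((i, n, "code"))
                break
            if start > i:
                spans.append((i, start, "code"))
            # Mark the opener itself as comment; the branch above
            # will look for the closer on the next iteration.
            spans.append((start, start + 2, "comment"))
            in_comment = True
            i = start + 2
    return spans, in_comment

class Highlighter:
    def __init__(self, lines):
        self.lines = lines
        # states[k] = saved lexer state *before* line k; states[0] is
        # the start-of-file state. These are the snapshots.
        self.states = [False] * (len(lines) + 1)
        self.relex(0)

    def relex(self, from_line):
        """Re-lex from a saved snapshot, stopping early once the
        downstream snapshots match again."""
        state = self.states[from_line]
        for k in range(from_line, len(self.lines)):
            _, state = lex_line(self.lines[k], state)
            if state == self.states[k + 1]:
                return  # everything to the right is still valid
            self.states[k + 1] = state

    def edit_line(self, k, new_text):
        # Tokens left of the edit never change, so restart from the
        # snapshot taken at the start of the edited line.
        self.lines[k] = new_text
        self.relex(k)
```

In a real editor `relex` would also stop at the bottom of the visible range and resume lazily on scroll; that part is omitted here.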
EDIT: Rereading my source, there were some other optimizations I made along the way. Long lists of keywords (e.g., built-in function names) are expensive to check, so I put them in a radix tree, which gave a huge performance boost.
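A radix tree stores shared prefixes once, so classifying an identifier walks at most its own length instead of scanning the whole keyword list. A sketch of the structure in Python (the node layout and function names here are mine; in Python a plain set would be just as fast, and the win the answer describes came from its original compiled implementation):

```python
class RadixNode:
    __slots__ = ("edges", "terminal")
    def __init__(self):
        self.edges = {}       # first char -> (edge label, child node)
        self.terminal = False  # True if a keyword ends at this node

def insert(root, word):
    node = root
    while word:
        first = word[0]
        if first not in node.edges:
            child = RadixNode()
            child.terminal = True
            node.edges[first] = (word, child)  # new edge with the whole suffix
            return
        label, child = node.edges[first]
        # Length of the common prefix of the edge label and the word.
        k = 0
        while k < min(len(label), len(word)) and label[k] == word[k]:
            k += 1
        if k < len(label):
            # Split the edge: label becomes label[:k], the remainder
            # hangs off a new intermediate node.
            mid = RadixNode()
            mid.edges[label[k]] = (label[k:], child)
            node.edges[first] = (label[:k], mid)
            child = mid
        word = word[k:]
        if not word:
            child.terminal = True
            return
        node = child

def contains(root, word):
    node = root
    while word:
        entry = node.edges.get(word[0])
        if entry is None:
            return False
        label, node = entry
        if not word.startswith(label):
            return False
        word = word[len(label):]
    return node.terminal
```

Lookup cost is bounded by the identifier's length, independent of how many keywords are stored, which is why it pays off for long built-in-function lists.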