Simple CSV lexer

I want to color CSV files with Pygments column by column, so that every cell in the same column is painted in the same color.

At present, Pygments does not include a CSV lexer, reportedly because CSV is considered an obscure format. So I tried to write a minimal one myself. Here is what I tried:

    tokens = {
        'root': [
            (r'^[^,\n]+', Name.Function),   # first column
            (',', Comment),                 # separator
            (r'[^,\n]+', Name.Decorator),   # second column
            (',', Comment),                 # separator
            (r'[^,\n]+', Name.Constant),    # third column
            (',', Comment),                 # separator
        ],
    }

But it fails to color any column except the first.

As far as I know, Pygments works by trying the regular expressions one by one at the current position: when a regexp does not match, it moves on to the next one, and after a match it starts over from the first. If nothing matches, it emits an error and advances one character (and draws that character in a red frame). For advanced cases, such as nested comments, there are states, but I think that one state is enough for CSV.
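As a rough illustration of that matching loop, here is a toy model written with plain `re`. It is not Pygments's actual code: the `RULES` table and the `tokenize` helper are made up for this sketch, token types are plain strings, and newline handling is left out.

```python
import re

# Hypothetical rule table mirroring the first attempt: (pattern, token name).
RULES = [
    (r'^[^,\n]+', 'Name.Function'),   # first column (anchored to line start)
    (r',', 'Comment'),                # separator
    (r'[^,\n]+', 'Name.Decorator'),   # meant for the second column
    (r'[^,\n]+', 'Name.Constant'),    # meant for the third column
]


def tokenize(text, rules=RULES):
    """Try each rule in order at the current position; the first match wins."""
    compiled = [(re.compile(p, re.MULTILINE), tok) for p, tok in rules]
    pos = 0
    out = []
    while pos < len(text):
        for regex, tok in compiled:
            m = regex.match(text, pos)
            if m:
                out.append((tok, m.group()))
                pos = m.end()
                break
        else:
            # No rule matched: flag one character as an error and move on.
            out.append(('Error', text[pos]))
            pos += 1
    return out


for tok, value in tokenize("a,b,c"):
    print(tok, repr(value))
```

In this model the rule order is the only thing that separates the columns. Every field after the first is claimed by the first unanchored `[^,\n]+` rule, so the `Name.Constant` rule is never reached.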

Then I tried:

    tokens = {
        'root': [
            (',', Comment),                                 # separator
            (r'^[^,\n]+', Name.Function),                   # first column
            (r'(?:^[^,\n]+)[^,\n]+', Name.Decorator),       # second column
        ],
    }

But it colors every column as the second one.

Here is an example of data:

    account_id,parent_account_id,name,status
    ,A001,English,active
    A001,,Humanities,active
    A003,A001,,active
    A004,A002,Spanish,

In Emacs, I managed to get what I wanted:

    (add-hook 'csv-mode-hook
              (lambda ()
                "colors first 8 csv columns differently"
                (font-lock-add-keywords nil '(("^\\([^,\n]*\\)," 1 'font-lock-function-name-face)))
                (font-lock-add-keywords nil '(("^\\([^,\n]*\\),\\([^,\n]*\\)" 2 'font-lock-variable-name-face)))
                (font-lock-add-keywords nil '(("^\\([^,\n]*\\),\\([^,\n]*\\),\\([^,\n]*\\)" 3 'font-lock-keyword-face)))
                (font-lock-add-keywords nil '(("^\\([^,\n]*\\),\\([^,\n]*\\),\\([^,\n]*\\),\\([^,\n]*\\)" 4 'font-lock-type-face)))))

(I actually added more than 4 columns, but that doesn't matter)

This gives the per-column coloring I want.

1 answer

Oh, I solved this with states:

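    from pygments.lexer import RegexLexer, bygroups
    from pygments.token import Comment, Keyword, Name


    class CsvLexer(RegexLexer):  # class name is illustrative
        # Each state colors one column and hands off to the next state.
        # At a newline no rule matches, so Pygments falls back to 'root'.
        tokens = {
            'root': [
                (r'^[^,\n]*', Name.Function, 'second'),
            ],
            'second': [
                (r'(,)([^,\n]*)', bygroups(Comment, Name.Decorator), 'third'),
            ],
            'third': [
                (r'(,)([^,\n]*)', bygroups(Comment, Name.Constant), 'fourth'),
            ],
            'fourth': [
                (r'(,)([^,\n]*)', bygroups(Comment, Name.Variable), 'fifth'),
            ],
            'fifth': [
                (r'(,)([^,\n]*)', bygroups(Comment, Keyword.Type), 'unsupported'),
            ],
            'unsupported': [
                (r'.+', Comment),   # everything past the fifth column
            ],
        }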

This colors the first 5 CSV columns differently, and everything after them as comments.
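To see why chaining states helps, here is a toy model of the idea in plain `re`. It is only a sketch and not how Pygments is implemented: `STATES` and `tokenize` are hypothetical names, token types are plain strings, only three columns are modeled, and empty fields are simply skipped in the output.

```python
import re

# Hypothetical state table: state -> [(pattern, token names, next state)].
# One token name per regex group, or a single name for the whole match.
STATES = {
    'root':   [(r'^[^,\n]*', ['Name.Function'], 'second')],
    'second': [(r'(,)([^,\n]*)', ['Comment', 'Name.Decorator'], 'third')],
    'third':  [(r'(,)([^,\n]*)', ['Comment', 'Name.Constant'], 'rest')],
    'rest':   [(r'.+', ['Comment'], None)],
}


def tokenize(text, states=STATES):
    """Match only the rules of the current state; each match may switch state."""
    state, pos, out = 'root', 0, []
    while pos < len(text):
        for pattern, names, next_state in states[state]:
            m = re.compile(pattern, re.MULTILINE).match(text, pos)
            if m is None:
                continue
            groups = m.groups() or (m.group(),)
            out.extend((n, g) for n, g in zip(names, groups) if g)
            pos = m.end()
            if next_state:
                state = next_state
            break
        else:
            # Nothing matched: a newline resets to 'root', anything else is an error.
            ch = text[pos]
            if ch == '\n':
                state = 'root'
            out.append(('Text' if ch == '\n' else 'Error', ch))
            pos += 1
    return out


for tok, value in tokenize("a,b,c,d\n,x\n"):
    print(tok, repr(value))
```

Because the current state records how many separators have been consumed on the line, the same `(,)([^,\n]*)` pattern can carry a different token type in each state, and the reset at the newline starts the count again for the next row.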


Source: https://habr.com/ru/post/974354/

