Any way to speed up instaparse?

I am trying to use instaparse in a dimacs file less than 700k in size with the following grammar

<file>=<comment*> <problem?> clause+ comment=#'c.*' problem=#'p\s+cnf\s+\d+\s+\d+\s*' clause=literal* <'0'> <literal>=#'[1-9]\d*'|#'-\d+' 

causes so

 (def parser (insta/parser (clojure.java.io/resource "dimacs.bnf") :auto-whitespace :standard)) ... (time (parser (slurp filename))) 

and it takes about a hundred seconds. Three orders of magnitude slower than I hoped. Is there a way to speed it up, somehow adjust the grammar, or some option that I am missing?

+5
source share
2 answers

Grammar is incorrect. He cannot be satisfied.

  • Each file ends with clause .
  • Each clause ends with a '0' .
  • literal in clause , being greedy reg-exp , will eat the final '0' .

Conclusion: No clause .

For instance...

 => (parser "60") Parse error at line 1, column 3: 60 ^ Expected one of: "0" #"\s+" #"-\d+" #"[1-9]\d*" 

We can analyze a literal

 => (parser "60" :start :literal) ("60") 

... but not clause

 => (parser "60" :start :clause) Parse error at line 1, column 3: 60 ^ Expected one of: "0" (followed by end-of-string) #"\s+" #"-\d+" #"[1-9]\d*" 

Why is it so slow?

If comment exists:

  • he can assimilate the whole file;
  • or break into any character 'c' into consecutive comment s;
  • or termination at any point after the initial 'c' .

This means that each tail should be represented by the rest of the grammar, which includes reg-exp for literal , which Instaparse does not see inside. Therefore, everything must be judged, and all this will ultimately fail. No wonder it's slow.


I suspect this file is actually line split. And your problems arise from trying to combine new lines with other forms of white space.

May I fondly point out that playing with a few tiny examples - and that’s all I did - could save you a ton of trouble.

+2
source

I think your widespread use * is causing a problem. Your grammar is too ambiguous / ambitious (I think). I would check two things:

 ;;run it as (insta/parses grammar input) ;; with a small input 

This will show you how vague the definition of grammar is: check for "ambiguous grammar."

Read Engelberg 's performance notes to help you figure out your own problem and possibly figure out what works best for you.

0
source

Source: https://habr.com/ru/post/1246945/


All Articles