I need to parse files with millions of lines. I noticed that my combinator parser gets slower and slower as it parses more lines. The problem seems to lie in Scala's "rep" or regex parsers, since the behavior occurs even with the simple example parser shown below:
def file: Parser[Int] = rep(line) ^^^ 1 // a file is a repetition of lines
def line: Parser[Int] = """(?m)^.*$""".r ^^^ 0 // reads a line and returns 0
When I parse a file with millions of equal-length lines using this simple parser, it starts out at 46 lines/ms. After 370,000 lines the speed drops to 20 lines/ms, after 840,000 lines to 10 lines/ms, and after 1,790,000 lines to 5 lines/ms...
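A minimal sketch of how such a measurement could be reproduced, assuming the scala-parser-combinators module is on the classpath. The names LineRateProbe, makeInput and window, as well as the input sizes, are hypothetical and only serve the illustration; the grammar is the one above with a counter added to the line action.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical timing harness: LineRateProbe, makeInput and window are
// illustrative names, not part of the original parser.
object LineRateProbe extends RegexParsers {
  val window = 100000            // report throughput every `window` lines
  var parsed = 0L                // lines parsed so far
  private var windowStart = 0L   // System.nanoTime at the start of the window

  // Same grammar as in the question, with a counter in the line action.
  def line: Parser[Int] = """(?m)^.*$""".r ^^ { _ =>
    parsed += 1
    if (parsed % window == 0) {
      val now = System.nanoTime()
      val ms = (now - windowStart) / 1e6
      println(f"$parsed%d lines parsed, last window: ${window / ms}%.1f lines/ms")
      windowStart = now
    }
    0
  }

  def file: Parser[Int] = rep(line) ^^^ 1

  // Builds `n` lines of `width` characters each, every line ending in '\n'.
  def makeInput(n: Int, width: Int): String = (("x" * width) + "\n") * n

  def main(args: Array[String]): Unit = {
    val input = makeInput(1000000, 40) // assumed sizes, adjust as needed
    windowStart = System.nanoTime()
    println(parse(file, input).successful)
  }
}
```

If the cost per line were constant, the rate printed for each window should stay roughly flat; a rate that keeps falling from window to window corresponds to the slowdown described above.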
My questions:
Bruno