8 seconds for regex_match (", regex (". {40000} ")) ;?

Update 2: Actually this regex(".{40000}");. This in itself already takes a lot of time. Why?


regex_match("", regex(".{40000}"));It takes almost 8 seconds on my pc. What for? Am I doing something wrong? I am using gcc 4.9.3 from MinGW on Windows 10 on the i7-6700.

Here's the full test program:

#include <iostream>
#include <regex>
#include <ctime>
using namespace std;

int main() {
    clock_t t = clock();
    regex_match("", regex(".{40000}"));
    cout << double(clock() - t) / CLOCKS_PER_SEC << endl;
}

How I compile and run it:

C:\Users\ ... \coding>g++ -std=c++11 test.cpp

C:\Users\ ... \coding>a.exe
7.643

Update: Time appears to be quadratic in this number. Doubling it is about four times the time:

  10000   0.520 seconds (factor 1.000)
  20000   1.922 seconds (factor 3.696)
  40000   7.810 seconds (factor 4.063)
  80000  31.457 seconds (factor 4.028)
 160000 128.904 seconds (factor 4.098)
 320000 536.358 seconds (factor 4.161)

The code:

#include <regex>
#include <ctime>
using namespace std;

int main() {
    double prev = 0;
    for (int i=10000; ; i*=2) {
        clock_t t0 = clock();
        regex_match("", regex(".{" + to_string(i) + "}"));
        double t = double(clock() - t0) / CLOCKS_PER_SEC;
        printf("%7d %7.3f seconds (factor %.3f)\n", i, t, prev ? t / prev : 1);
        prev = t;
    }
}

Still no idea why . This is a very simple regular expression and an empty string (although the same thing with short non-empty strings). He must immediately. Is the regex engine just weird and bad?

+4
2

...

( - ), . # , .

, , - O(n^2).

+2

:

clock_t t1 = clock();
regex r(".{40000}");
clock_t t2 = clock();
regex_match("", r);
clock_t t3 = clock();

cout << double(t2 - t1) / CLOCKS_PER_SEC << '\n'
     << double(t3 - t2) / CLOCKS_PER_SEC << endl;

:

0.077336
0.000613
+1

Source: https://habr.com/ru/post/1663397/


All Articles