Possible regex backtracking issues?

I have the following regex and input:

http://regex101.com/r/cI3fG4

Basically, I want to match up to the very last "yo" and keep everything in green (group 1).

This works fine for small files/inputs.

However, when I run this from Java against a very large (100k) file that contains no match for the pattern (just a bunch of text, e.g. a War and Peace snippet), it can take 10+ seconds for the match attempt to return. I suspect regex backtracking issues (specifically the (.*) in group 1).

What can I do to prevent the backtracking in this use case and speed up the regex while still meeting the requirements above?

- Java code -

    // Works fine for this small snippet but when run against 100k large input
    // as described above some serious perf issues start happening.  

    String text = "Hi\n\nyo keep this here\n\nKeep this here\n\nyo\nkey match line here cut me:\n\nAll of this here should be deleted";
    System.out.println(text);
    Pattern PATTERN = Pattern.compile("^(.*)((\\byo\\b.*?(cut me:).*))$",
            Pattern.MULTILINE | Pattern.DOTALL);
    Matcher m = PATTERN.matcher(text);
    if (m.find()) {
        text = m.group(1);
        System.out.println(text);
    }

You can use this regex instead:

^([\s\S]*)\byo\b[\s\S]*?(cut me:)

It does not need the m or s flags.

Demo on regex101: http://regex101.com/r/lC9yZ5
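As a rough sketch of how the suggested pattern might be wired into the Java code from the question: the class name `TrimBeforeLastYo`, the helper `keepBeforeLastYo`, and the `contains` pre-check are illustrative assumptions, not part of the original answer. The pattern is compiled without MULTILINE or DOTALL, since `[\s\S]` already matches newlines and `^` only has to anchor at the start of the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrimBeforeLastYo {
    // The pattern from the answer, compiled with no flags:
    // [\s\S] matches any character (newlines included), so DOTALL is not needed,
    // and without MULTILINE the ^ anchor is only tried at the start of the input.
    private static final Pattern PATTERN =
            Pattern.compile("^([\\s\\S]*)\\byo\\b[\\s\\S]*?(cut me:)");

    // Hypothetical helper: returns everything before the last "yo" that is
    // followed by "cut me:", or the input unchanged when there is no match.
    static String keepBeforeLastYo(String text) {
        // Assumption (not from the answer): a cheap guard. If the literal
        // marker is absent the regex cannot match, so skip it entirely.
        if (!text.contains("cut me:")) {
            return text;
        }
        Matcher m = PATTERN.matcher(text);
        return m.find() ? m.group(1) : text;
    }

    public static void main(String[] args) {
        String text = "Hi\n\nyo keep this here\n\nKeep this here\n\nyo\n"
                + "key match line here cut me:\n\nAll of this here should be deleted";
        System.out.println(keepBeforeLastYo(text));
    }
}
```

The guard matters mostly for the no-match case described in the question. Input that contains "cut me:" and a great many standalone "yo" words can still cause a fair amount of backtracking, so it is worth measuring on real data.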


Source: https://habr.com/ru/post/1532599/

