Java Regular Expression Speed

Some example timings for splitting a large number of lines:

.split("[^a-zA-Z]"); // .44 seconds
.split("[^a-zA-Z]+"); // .47 seconds
.split("\\b+"); // 2 seconds

Is there an explanation for the sharp increase? I can imagine that the pattern [^a-zA-Z] is executed on the processor as a set of four comparison operations, all four of which run only when the character actually matches. What about \b? Can anyone weigh in on this?

+3
2 answers

First, it makes no sense to split on one or more zero-width assertions! Java's regex engine is not very smart (and I'm being charitable) about sensible optimizations.

Second, never use \b in Java: it is broken and out of sync with \w.

This is a particular problem when working with Unicode text.

+4

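\b is a zero-width assertion, which is fundamentally different from [^A-Za-z]. Because \b is implemented as an if/then (see tchrist's answer), checking it at every position in every string is likely to be more work.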
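One way to picture that if/then is to spell \b out as an alternation of lookarounds. The sketch below assumes ASCII-only input, where \b and \w agree; the class name BoundaryPositions and the helper positions() are made up for illustration and are not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryPositions {
    // Returns every index at which the pattern matches in the input.
    static List<Integer> positions(String regex, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "the quick brown fox";
        // \b written out as a two-sided check (assumption: ASCII-only input):
        // either a word character behind and none ahead, or the reverse.
        String spelledOut = "(?:(?<=\\w)(?!\\w)|(?<!\\w)(?=\\w))";
        System.out.println(positions("\\b", text));
        System.out.println(positions(spelledOut, text));
    }
}
```

Both lines list the same eight positions (the start and end of each word), whereas [^A-Za-z] can only match at the three spaces. Each boundary check has to look at the characters on both sides of the position, while the character class inspects a single character.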

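Also, when you split on word boundaries, the pattern matches in more places than [^a-zA-Z]+ does, so more strings are created, which takes additional time. To see this, try the following program: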

import java.lang.String;

class RegexDemo {
    private static void testSplit(String msg, String re) {
        String[] pieces = "the quick brown fox".split(re);
        System.out.println(msg);
        for (String s : pieces) {
            System.out.println(s);
        }
        System.out.println("----");
    }

    public static void main(String args[]) {
        testSplit("boundary:", "\\b+");
        testSplit("not alpha:", "[^A-Za-z]+");
    }
}
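The same comparison can be reduced to a count of the pieces each pattern produces. This is only a sketch: the class SplitPieceCount and the helper pieceCount() are hypothetical names, and the exact number of pieces for "\\b+" may depend on the Java version, so no count is stated for it here.

```java
import java.util.Arrays;

class SplitPieceCount {
    // Number of substrings String.split() produces for the given pattern.
    static int pieceCount(String input, String regex) {
        return input.split(regex).length;
    }

    public static void main(String[] args) {
        String text = "the quick brown fox";
        // Splitting on runs of non-letters keeps only the words.
        System.out.println(Arrays.toString(text.split("[^A-Za-z]+")));
        // Splitting on boundaries also keeps the whitespace between words.
        System.out.println(Arrays.toString(text.split("\\b+")));
        System.out.println("not-alpha pieces: " + pieceCount(text, "[^A-Za-z]+"));
        System.out.println("boundary pieces:  " + pieceCount(text, "\\b+"));
    }
}
```

The boundary split returns the separators as pieces of their own, so it allocates more strings per line than the character-class split.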

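Finally, if you really are calling String.split(), the regular expression is compiled on every call. If you split many strings, compile it once instead, for example

Pattern boundary = Pattern.compile("\\b+");

and then call boundary.split(testString), which reuses the compiled pattern.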
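A minimal sketch of the compile-once approach, using the cheaper [^a-zA-Z]+ pattern from the question. The class PrecompiledSplit, the constant NOT_ALPHA, and the method countPieces() are illustrative names, not anything from the original posts.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class PrecompiledSplit {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern NOT_ALPHA = Pattern.compile("[^a-zA-Z]+");

    // Splits every line with the shared Pattern and returns the total piece count.
    static int countPieces(List<String> lines) {
        int total = 0;
        for (String line : lines) {
            total += NOT_ALPHA.split(line).length;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the quick brown fox", "jumps over the lazy dog");
        System.out.println(countPieces(lines)); // prints 9
    }
}
```

Pattern.split(CharSequence) returns the same pieces as String.split(String) for the same regular expression, so moving the compilation out of the loop changes only the cost, not the result.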

For more background, see Russ Cox's articles at http://swtch.com/~rsc/regexp/ and also http://www.regular-expressions.info/.

-1

Source: https://habr.com/ru/post/1777832/
