Java regex: check the previous character before splitting

I have a line like this:

This: string: should ~: be: split: when: previous: char: is: not ~: this

I need to split the line on the separator ":", but only if the character before the separator is NOT "~".

Now I have the following regular expression:

String[] split = str.split(":(?<!~:)"); 

This works, but since I only arrived at it by trial and error, I'm not sure it is the most efficient way to do this. This function will also be called repeatedly on long lines, so performance matters. Is there a more efficient way to do this?

3 answers

Update: To make the comparison fairer, I wanted to use a compiled pattern and see its results. So I updated the code to time a compiled pattern, an uncompiled pattern, and my own method.

Although my method does not use a regular expression, it is faster than the regular expression from the question.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Pattern;

    public class SplitBenchmark {
        private static final String INPUT =
                "This:string:must~:be:split:when:previous:char:is:not~:this";

        public static void main(String[] args) {
            Pattern pattern = Pattern.compile(":(?<!~:)");
            for (int runs = 0; runs < 4; ++runs) {
                // 1. String.split, which compiles the regex on every call
                long start = System.currentTimeMillis();
                for (int index = 0; index < 100000; ++index) {
                    INPUT.split(":(?<!~:)");
                }
                long stop = System.currentTimeMillis();
                System.out.println("Run: " + runs + " Regex: " + (stop - start));

                // 2. Pattern compiled once and reused
                start = System.currentTimeMillis();
                for (int index = 0; index < 100000; ++index) {
                    pattern.split(INPUT);
                }
                stop = System.currentTimeMillis();
                System.out.println("Run: " + runs + " Compiled Regex: " + (stop - start));

                // 3. Hand-written splitter
                start = System.currentTimeMillis();
                for (int index = 0; index < 100000; ++index) {
                    specialSplit(INPUT);
                }
                stop = System.currentTimeMillis();
                System.out.println("Run: " + runs + " Custom: " + (stop - start));
            }
            for (String s : specialSplit(INPUT)) {
                System.out.println(s);
            }
        }

        // Splits on ':' unless the character before it is '~'.
        public static String[] specialSplit(String text) {
            List<String> stringsAfterSplit = new ArrayList<String>();
            StringBuilder splitString = new StringBuilder();
            char previousChar = 0;
            for (int index = 0; index < text.length(); ++index) {
                char charAtIndex = text.charAt(index);
                if (charAtIndex == ':' && previousChar != '~') {
                    stringsAfterSplit.add(splitString.toString());
                    splitString.setLength(0); // reuse the builder for the next token
                } else {
                    splitString.append(charAtIndex);
                }
                previousChar = charAtIndex;
            }
            // Add the last token unless it is empty
            if (splitString.length() > 0) {
                stringsAfterSplit.add(splitString.toString());
            }
            return stringsAfterSplit.toArray(new String[stringsAfterSplit.size()]);
        }
    }

Output

    Run: 0 Regex: 468
    Run: 0 Compiled Regex: 365
    Run: 0 Custom: 169
    Run: 1 Regex: 437
    Run: 1 Compiled Regex: 363
    Run: 1 Custom: 166
    Run: 2 Regex: 445
    Run: 2 Compiled Regex: 363
    Run: 2 Custom: 167
    Run: 3 Regex: 436
    Run: 3 Compiled Regex: 361
    Run: 3 Custom: 167
    This
    string
    must~:be
    split
    when
    previous
    char
    is
    not~:this

A slightly simpler approach is as follows:

 (?<!~): 

This way the colon is not matched twice. I doubt you will see any difference in performance, though. It is also very simple to write without a regular expression: just look for the next colon and check whether there is a tilde in front of it.
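To illustrate both suggestions, here is a minimal sketch rather than a definitive implementation. The class name ColonSplit, the constant UNESCAPED_COLON and the helper splitByIndexOf are names made up for this example; the sample line is the one used in the benchmark answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class ColonSplit {
    // Compiled once and reused across calls (hypothetical constant name).
    static final Pattern UNESCAPED_COLON = Pattern.compile("(?<!~):");

    // Hypothetical helper: find each colon with indexOf and split there
    // only when the character in front of it is not a tilde.
    static List<String> splitByIndexOf(String text) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        int colon = text.indexOf(':');
        while (colon >= 0) {
            if (colon == 0 || text.charAt(colon - 1) != '~') {
                parts.add(text.substring(start, colon));
                start = colon + 1;
            }
            colon = text.indexOf(':', colon + 1);
        }
        // Whatever follows the last separator is the final token.
        parts.add(text.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        String line = "This:string:must~:be:split:when:previous:char:is:not~:this";
        System.out.println(Arrays.asList(UNESCAPED_COLON.split(line)));
        System.out.println(splitByIndexOf(line));
    }
}
```

One difference to keep in mind: Pattern.split drops trailing empty strings, while this indexOf sketch keeps them, so the two disagree on input that ends with an unescaped colon.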


Try this: [^~]:

Tested in JS
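One thing worth checking before using this pattern with split in Java: [^~] is a character class, not a lookbehind, so the character in front of the colon becomes part of the separator and is removed from the result. The sketch below compares it with the lookbehind form on the sample line from the benchmark answer; the class name ConsumingSplitDemo and the helper splitWith are made up for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class ConsumingSplitDemo {
    // Hypothetical helper: split a line with the given regex.
    static String[] splitWith(String regex, String line) {
        return Pattern.compile(regex).split(line);
    }

    public static void main(String[] args) {
        String line = "This:string:must~:be:split:when:previous:char:is:not~:this";

        // Character class: the character before ':' is consumed by the match.
        System.out.println(Arrays.toString(splitWith("[^~]:", line)));

        // Negative lookbehind: zero-width check, only ':' is consumed.
        System.out.println(Arrays.toString(splitWith("(?<!~):", line)));
    }
}
```

With the character class, the first token comes out as "Thi" instead of "This", and a colon at the very start of the line would not be treated as a separator because there is no character in front of it to match.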


Source: https://habr.com/ru/post/1335028/

