Pattern/Matcher vs. String.split(): which should I use?

Question

First-time poster.

I know how to use both Matcher and String.split(). My question is which of them is better suited to my example, and why. Suggestions for better alternatives are also welcome.

Task: I need to extract an unknown NOUN that sits between the matches of two known regular expressions in an unknown string.

My solution: find the start and end indices of the noun (from regexp1 and regexp2) and use substring() to extract it.

  String line = "unknownXoooXNOUNXccccccXunknown";
  int goal = 12;
  String regexp1 = "Xo+X";
  String regexp2 = "Xc+X";

I need to find the index position AFTER the first regular expression.
I need to find the index position before the second regular expression.

A) I can use Pattern and Matcher

  Pattern p = Pattern.compile(regexp1);
  Matcher m = p.matcher(line);
  if (m.find()) {
      int afterRegex1 = m.end();
  } else {
      throw new IllegalArgumentException(); // TODO: exception management
  }

B) I can use String Split

  String[] split = line.split(regexp1, 2);
  if (split.length != 2) {
      throw new UnsupportedOperationException(); // TODO: exception management
  }
  int afterRegex1 = line.indexOf(split[1]);

Which approach should I use, and why? I do not know which is more efficient in time and memory. Both are about equally readable to me.

+6

java performance string split regex

Another Compiler Error Oct 16 '13 at 17:19

4 answers

You should use String.split() for readability, unless you are in a tight loop.

Per the split() Javadoc, split() does the equivalent of Pattern.compile() on each call. If you are in a tight loop, you can optimize that away by compiling the Pattern once and reusing it.
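
As a rough sketch of that optimization (the class and method names here are made up for illustration, not taken from the question), the pattern can be compiled once and reused through Pattern.split(). The index is derived from the length of the remainder rather than from indexOf(), which is an assumption on my part about what the asker wants, since indexOf() could find an earlier occurrence of the same text.

```java
import java.util.regex.Pattern;

class SplitInLoop {
    // Compiled once and reused across calls (hypothetical helper, not from the question).
    private static final Pattern OPENING = Pattern.compile("Xo+X");

    // Returns the index just after the first match of the opening delimiter,
    // or -1 if the delimiter does not occur in the line.
    static int indexAfterOpening(String line) {
        String[] parts = OPENING.split(line, 2);
        if (parts.length != 2) {
            return -1;
        }
        // Everything after the delimiter is parts[1], so its start is length - remainder.
        return line.length() - parts[1].length();
    }
}
```

Calling SplitInLoop.indexAfterOpening(line) inside the loop then avoids recompiling the regular expression on every iteration.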

+3

willkil Oct 16 '13 at 17:31

It looks like you want a single occurrence. For that, simply use

 input.replaceAll(".*Xo+X(.*)Xc+X.*", "$1")

For efficiency, compile the Pattern once and use pattern.matcher(input).replaceAll(...).

If your input contains line breaks, use Pattern.DOTALL or the (?s) modifier.

If you want to use split, consider Guava's Splitter. It behaves more sensibly and also accepts a Pattern, which is good for speed.
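
A minimal sketch of the precompiled variant with DOTALL might look like the following. The class and method names are hypothetical. Note that replaceAll() returns the input unchanged when the pattern does not match, so the caller cannot tell "no match" apart from a successful extraction without an extra check.

```java
import java.util.regex.Pattern;

class ReplaceAllExtract {
    // DOTALL lets '.' match line terminators, so the leading and trailing ".*"
    // can swallow text on other lines as well.
    private static final Pattern WHOLE =
            Pattern.compile(".*Xo+X(.*)Xc+X.*", Pattern.DOTALL);

    // Replaces the whole input with the content of capture group 1.
    // If the pattern does not match, the input is returned as-is.
    static String extract(String input) {
        return WHOLE.matcher(input).replaceAll("$1");
    }
}
```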

+2

maaartinus Oct 16 '13 at 17:36

If you really need the positions, you can do it like this:

  String line = "unknownXoooXNOUNXccccccXunknown";
  String regexp1 = "Xo+X";
  String regexp2 = "Xc+X";
  Matcher m = Pattern.compile(regexp1).matcher(line);
  if (m.find()) {
      int start = m.end();
      // usePattern() keeps the current position, so the next find() continues after regexp1
      if (m.usePattern(Pattern.compile(regexp2)).find()) {
          final int end = m.start();
          System.out.println("from " + start + " to " + end + " is " + line.substring(start, end));
      }
  }

But if you just need the word between them, I recommend the approach Ian McLaird has shown.

0

Holger Oct 16 '13 at 17:38

Ian McLaird · Accepted Answer · 2013-10-16T17:34:06+0000

I would do it like this:

  String line = "unknownXoooXNOUNXccccccXunknown";
  String regex = "Xo+X(.*?)Xc+X";
  Pattern p = Pattern.compile(regex);
  Matcher m = p.matcher(line);
  if (m.find()) {
      String noun = m.group(1);
  }

(.*?) is used to make the match for the inner NOUN reluctant. This protects us in case the closing pattern appears again in the unknown part of the string.
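
To illustrate the difference, here is a small sketch that runs a reluctant and a greedy version of the pattern against a line in which the closing delimiter occurs twice. The helper name and the sample line are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantVsGreedy {
    // Returns capture group 1 of the first match, or null if there is no match.
    static String firstGroup(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? m.group(1) : null;
    }

    // Sample input: the closing delimiter (Xc+X) appears a second time after the noun.
    static final String LINE = "unknownXoooXNOUNXccXmoreXcXunknown";
}
```

With this input, firstGroup("Xo+X(.*?)Xc+X", LINE) stops at the first closing delimiter and yields "NOUN", while the greedy firstGroup("Xo+X(.*)Xc+X", LINE) runs on to the last one and yields "NOUNXccXmore".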

EDIT

This works because (.*?) defines a capture group. Only one such group is defined in the pattern, so it gets index 1 (the argument in m.group(1)). Groups are indexed from left to right, starting with 1. If the pattern were defined as follows:

 String regex = "(Xo+X)(.*?)(Xc+X)";

Then there would be three capture groups, such that

  m.group(1); // yields "XoooX"
  m.group(2); // yields "NOUN"
  m.group(3); // yields "XccccccX"

There is also a group 0, but it matches the whole pattern and is equivalent to this:

 m.group(); // yields "XoooXNOUNXccccccX"

For more information on what you can do with Matcher, including ways to get the start and end positions of your match in the source string, see the Matcher Javadoc.
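
Since the question asked for index positions, here is a short sketch of how the positions of the captured group could be read with Matcher.start(int) and Matcher.end(int). The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupPositions {
    private static final Pattern NOUN = Pattern.compile("Xo+X(.*?)Xc+X");

    // Returns {start, end} of capture group 1 in the line (end is exclusive),
    // or null if the pattern does not match.
    static int[] nounSpan(String line) {
        Matcher m = NOUN.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new int[] { m.start(1), m.end(1) };
    }
}
```

For the line from the question this gives the span from 12 to 16, and line.substring(12, 16) is "NOUN", which agrees with the goal of 12 stated there.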
