Java Regex: replace character if only another character precedes

I use Java and regular expressions, and I have to split some data into several entities. In my input, a single quote character (') marks the end of an entity, UNLESS it is preceded by an escape character, which is a question mark (?).

My regex is (?<!\\?)\\', and I use a Scanner to split the input into separate entities. The following cases work correctly:

    Hello'There becomes 2 entities: Hello and There
    Hello?'There remains 1 entity: Hello?'There
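For reference, here is a minimal reconstruction of the setup described above. It assumes the whole input string is handed to a java.util.Scanner that uses the regex as its delimiter. The class name ScannerSplitDemo and the method tokens are hypothetical, not taken from the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerSplitDemo {
    // The original delimiter: a quote NOT directly preceded by '?'.
    // In Java source every regex backslash is doubled.
    static final String DELIMITER = "(?<!\\?)\\'";

    // Hypothetical helper: collects the tokens the Scanner produces.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input).useDelimiter(DELIMITER)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("Hello'There"));   // [Hello, There]
        System.out.println(tokens("Hello?'There"));  // [Hello?'There]
        // "??" should be an escaped '?', so this ought to split, but it does not
        System.out.println(tokens("Hello??'There")); // [Hello??'There]
    }
}
```

The third call shows the problem: the quote in Hello??'There is directly preceded by a '?', so the lookbehind rejects it and the input stays in one piece.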

However, when I run into a case where I want to escape a question mark itself, it doesn't work. So:

    Hello??'There should become 2 entities: Hello?? and There
    Hello???'There should become 1 entity: Hello???'There
    Hello????'There should become 2 entities: Hello???? and There
    Hello?????'There should become 1 entity: Hello?????'There
    Hello?????There should become 1 entity: Hello?????There
    Hello??????There should become 1 entity: Hello??????There

So the rule is: if a quote is preceded by an even number of question marks (including zero), the input should be split there. If it is preceded by an odd number of question marks, it should not be split.
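To make the rule concrete, here is a sketch that decides, for a single quote, whether it is a delimiter by counting the run of '?' immediately before it. EscapeRule and isDelimiter are hypothetical names for illustration, and the sketch assumes quotePos really points at a quote character.

```java
public class EscapeRule {
    // Hypothetical helper: true if the quote at quotePos is preceded by an
    // even number (possibly zero) of consecutive '?', i.e. it is a delimiter.
    static boolean isDelimiter(String s, int quotePos) {
        int count = 0;
        // Walk backwards over the run of '?' directly before the quote
        for (int i = quotePos - 1; i >= 0 && s.charAt(i) == '?'; i--) {
            count++;
        }
        return count % 2 == 0;
    }

    public static void main(String[] args) {
        String[] samples = {"Hello'There", "Hello?'There", "Hello??'There", "Hello???'There"};
        for (String s : samples) {
            int q = s.indexOf('\'');
            System.out.println(s + " -> " + (isDelimiter(s, q) ? "split" : "keep"));
        }
    }
}
```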

Can someone help me fix my regex (hopefully with an explanation!) so that it handles these cases?

Thanks,

Phil

3 answers

Don't use split() for this. It seems like the obvious solution, but it's much easier to match the entities themselves than to match the delimiters. Most languages with regex support have built-in methods for this, such as Python's findall() or Ruby's scan(), but in Java we're stuck writing the find() loop ourselves. Here is an example:

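    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // An entity is a run of: any char except '?' and '\'',
    // or a '?' escape followed by whatever character comes next
    Pattern p = Pattern.compile("([^?']|\\?.)+");
    String[] inputs = {
        "Hello??'There", "Hello???'There", "Hello????'There",
        "Hello?????'There", "Hello?????There", "Hello??????There"
    };
    for (String s : inputs) {
        System.out.printf("%n%s :%n", s);
        Matcher m = p.matcher(s);
        // Each find() returns the next entity; the delimiters are skipped
        while (m.find()) {
            System.out.printf(" %s%n", m.group());
        }
    }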

output:

    Hello??'There :
     Hello??
     There

    Hello???'There :
     Hello???'There

    Hello????'There :
     Hello????
     There

    Hello?????'There :
     Hello?????'There

    Hello?????There :
     Hello?????There

    Hello??????There :
     Hello??????There

Besides being a hideous hack (no offense, Thomas!), the arbitrary-maximum-length trick that Thomas used is unreliable, because bugs keep being introduced into the Pattern.java code that handles that kind of lookbehind. But don't think of this solution as just another workaround: lookbehind should never be your first resort, even in flavors like .NET, where it works reliably and without restriction.
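If you need the entities as a list, the find() loop can be wrapped in a method. This is a sketch, not a library API: EntityMatcher and entities are hypothetical names, and the pattern is the one above with the capturing group made non-capturing, since only group() (the whole match) is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityMatcher {
    // Same pattern as above, with a non-capturing group
    private static final Pattern ENTITY = Pattern.compile("(?:[^?']|\\?.)+");

    // Hypothetical helper: collects every entity instead of splitting on delimiters
    static List<String> entities(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = ENTITY.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(entities("Hello????'There"));  // [Hello????, There]
        System.out.println(entities("Hello?????'There")); // [Hello?????'There]
    }
}
```

One difference from split() worth knowing: because only the entities are matched, two delimiters in a row (a''b) yield [a, b] with no empty entity between them, and a lone '?' at the very end of the input is not part of any match.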


Try this expression to match the quotes preceded by an even number of question marks: (?<=[^\?](?>\?\?){0,1000})'

  • (?<=...)' is a positive lookbehind followed by a quote, i.e. every ' that is preceded by something matching the expression between (?<= and ) will match
  • (?>\?\?) is an atomic group of two consecutive question marks
  • (?>\?\?){0,1000} means that there can be from 0 to 1000 of these groups. Note that you cannot write (?>\?\?)*, since a lookbehind expression must have a bounded maximum length (i.e. a maximum number of groups). However, you should be able to raise the upper bound quite a bit, depending on the rest of the expression
  • [^\?](?>\?\?)... means that the groups of two question marks must be preceded by some character other than a question mark (otherwise the odd cases would match as well)
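In Java source code the expression above has to be written as a string literal with every backslash doubled, after which it can be passed straight to String.split(). A minimal sketch, where the class LookbehindSplit and the constant DELIMITER are hypothetical names:

```java
import java.util.Arrays;

public class LookbehindSplit {
    // The lookbehind from this answer as a Java string literal:
    // a quote preceded by one non-'?' character and then 0..1000 pairs of "??"
    static final String DELIMITER = "(?<=[^\\?](?>\\?\\?){0,1000})'";

    public static void main(String[] args) {
        String[] inputs = {"Hello'There", "Hello?'There", "Hello??'There", "Hello???'There", "Hello????'There"};
        for (String s : inputs) {
            System.out.println(s + " -> " + Arrays.toString(s.split(DELIMITER)));
        }
    }
}
```

Because [^\?] has to consume one character, a quote preceded only by question marks at the very start of the input (for example ??'There) is not split with this expression, and runs longer than the 1000-pair bound are not handled either.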

Are you sure you want to use regular expressions? If your strings are relatively small and/or runtime is not a big concern, you can use a StringBuilder and a loop that counts the '?' characters, e.g.:

    import java.util.ArrayList;

    // Your string
    String x = "Hello??'World'Hello?'World";
    // Accumulates the current split
    StringBuilder sb = new StringBuilder();
    // Holds your splits
    ArrayList<String> parts = new ArrayList<String>();
    // Number of consecutive '?' directly before the current character
    int questionmarkcount = 0;
    for (char c : x.toCharArray()) {
        if (c == '?') {
            questionmarkcount++;
            sb.append(c);
        } else if (c == '\'') {
            if (questionmarkcount % 2 == 0) {
                // Even number of '?' (or none): the quote is a delimiter,
                // so add the current split and clear the StringBuilder
                parts.add(sb.toString());
                sb.setLength(0);
            } else {
                // Odd number of '?': the quote is escaped, keep it
                sb.append(c);
            }
            // Start counting from the beginning
            questionmarkcount = 0;
        } else {
            sb.append(c);
            // Any other character ends the run of question marks
            questionmarkcount = 0;
        }
    }
    parts.add(sb.toString());

When the loop finishes, the parts list contains all of your splits. Note that questionmarkcount has to be reset whenever any character other than '?' is appended; otherwise question marks that do not directly precede a quote would still be counted (for example, a?b'c would not be split).


Source: https://habr.com/ru/post/1432931/

