Regex matches in C# but not in Java

I have the following regex (long, I know):

 (?-mix:((?-mix:(?-mix:\{\%).*?(?-mix:\%\})|(?-mix:\{\{).*?(?-mix:\}\}?))|(?-mix:\{\{|\{\%)))

which I use to split a string. It matches correctly in C#, but when I moved the code to Java, it no longer matches. Is there any feature of this regex that is C#-only?

The source is created as:

 String source = Pattern.quote("{% assign foo = values %}.{{ foo[0] }}."); 

In C#, this is:

 string source = @"{% assign foo = values %}.{{ foo[0] }}."; 

The C# version looks like this:

 string[] split = Regex.Split(source, regex);

In Java, I tried both:

 String[] split = source.split(regex); 

and

 Pattern p = Pattern.compile(regex);
 String[] split = p.split(source);
3 answers

Here is an example program with your code: http://ideone.com/hk3uy

There is a big difference between Java and other languages here: Java does not add captured groups as tokens to the result array. This means that all delimiters are removed from the result, whereas .NET includes them.
The only alternative I know of is not to use split at all, but to get the list of matches and do the splitting manually.
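
As a rough illustration of the "get the matches and split manually" idea, here is a minimal sketch. The class and the helper name `splitKeepingDelimiters` are made up for this example, and it uses a simplified form of the pattern without the `(?-mix:...)` wrappers. It keeps the whole match as the delimiter token; that only coincides with what .NET returns when the single capturing group spans the entire match, as it does in the question's regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitKeepingDelimiters {

    // Hypothetical helper (not part of java.util.regex): splits the input on
    // the pattern but keeps every matched delimiter as its own token.
    // Assumes the pattern never matches the empty string.
    public static List<String> splitKeepingDelimiters(Pattern pattern, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        int last = 0;
        while (m.find()) {
            tokens.add(input.substring(last, m.start())); // text before the delimiter
            tokens.add(m.group());                         // the delimiter itself
            last = m.end();
        }
        tokens.add(input.substring(last)); // text after the final delimiter
        return tokens;
    }

    public static void main(String[] args) {
        // Simplified pattern; an assumption made for readability.
        Pattern p = Pattern.compile("\\{%.*?%\\}|\\{\\{.*?\\}\\}?|\\{\\{|\\{%");
        String source = "{% assign foo = values %}.{{ foo[0] }}.";
        for (String token : splitKeepingDelimiters(p, source)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Unlike `Pattern.split`, this sketch also keeps leading and trailing empty strings, so filter those out if they are not wanted.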

+4

I think the problem is how you define source. On my system, this:

 String source = Pattern.quote("{% assign foo = values %}.{{ foo[0] }}."); 

is equivalent to this:

 String source = "\\Q{% assign foo = values %}.{{ foo[0] }}.\\E"; 

(i.e., it adds a stray \Q and \E), but given the way the method is specified, your Java implementation could just as well treat it as equivalent to this:

 String source = "\\{% assign foo = values %\\}\\.\\{\\{ foo\\[0\\] \\}\\}\\."; 

(i.e., inserting a large number of backslashes).
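
To see this for yourself, a small check along these lines may help. It is only a sketch: the class name `QuoteDemo` and the helper are invented for illustration, and the exact text returned by `Pattern.quote` depends on the Java implementation. The point is that the quoted string is meant to be used as a pattern that matches the literal text, not as the text to be split.

```java
import java.util.regex.Pattern;

class QuoteDemo {

    // Hypothetical helper: true when the quoted form of text, compiled as a
    // regex, matches the original text literally.
    public static boolean quotedPatternMatchesLiteral(String text) {
        return Pattern.compile(Pattern.quote(text)).matcher(text).matches();
    }

    public static void main(String[] args) {
        String plain = "{% assign foo = values %}.{{ foo[0] }}.";
        String quoted = Pattern.quote(plain);

        // Print both forms to compare them; the quoted form carries extra
        // quoting characters that are not part of the template text.
        System.out.println(plain);
        System.out.println(quoted);

        // The quoted form is useful as a pattern for the plain text.
        System.out.println(quotedPatternMatchesLiteral(plain));
    }
}
```

If the input to split should be the template itself, assigning the plain string literal to source, as the C# version does, avoids the extra quoting characters.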

Your regex seems fine. This program:

 public static void main(final String... args) {
     final Pattern p = Pattern.compile("(?-mix:((?-mix:(?-mix:\\{\\%).*?(?-mix:\\%\\})|(?-mix:\\{\\{).*?(?-mix:\\}\\}?))|(?-mix:\\{\\{|\\{\\%)))");
     for (final String s : p.split("a{%b%}c{{d}}e{%f%}g{{h}}i{{j{%k"))
         System.out.println(s);
 }

prints

 a
 c
 e
 g
 i
 j
 k

that is, it successfully treats {%b%}, {{d}}, {%f%}, {{h}}, {{ and {% as split points, with all the non-greediness you would expect. For the record, it also works if I strip p all the way down to

 Pattern.compile("\\{%.*?%\\}|\\{\\{.*?\\}\\}?|\\{\\{|\\{%"); 

;-)

+2

Use \\{ instead of \{ in the Java string literal, and do the same for the other escaped characters.
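
A minimal sketch of why the doubling is needed (the class name `EscapeDemo` and the helper are invented for illustration): in Java source, the literal "\\{" is the two characters \ and {, which the regex engine then reads as an escaped, literal brace.

```java
import java.util.regex.Pattern;

class EscapeDemo {

    // Hypothetical helper: splits on a literal opening brace.
    public static String[] splitOnOpenBrace(String input) {
        // "\\{" in Java source is the two-character regex \{
        return Pattern.compile("\\{").split(input);
    }

    public static void main(String[] args) {
        System.out.println("\\{".length()); // the literal holds a backslash and a brace
        for (String token : splitOnOpenBrace("a{b{c")) {
            System.out.println(token);
        }
    }
}
```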

0

Source: https://habr.com/ru/post/902966/