Priority of Alternatives in a Regular Expression

I am writing some Java code to split a string into an array of strings. First I split the string using the regex pattern "\\,\\,|\\," and then using the pattern "\\,|\\,\\," . Why do the two outputs differ?

    public class Test2 {
        public static void main(String[] args) {
            String regex1 = "\\,\\,|\\,";
            String regex2 = "\\,|\\,\\,";
            String a = "20140608,FT141590Z0LL,0608103611018634TCKJ3301000000018667,3000054789,IDR1742630000001,80507,1000,6012,TCKJ3301,6.00E+12,ID0010015,WADORI PURWANTO,,3000054789";
            String[] ss = a.split(regex1);
            int index = 0;
            for (String m : ss) {
                System.out.println((index++) + ": " + m + "|");
            }
        }
    }

Output when using regex1 :

    0: 20140608|
    1: FT141590Z0LL|
    2: 0608103611018634TCKJ3301000000018667|
    3: 3000054789|
    4: IDR1742630000001|
    5: 80507|
    6: 1000|
    7: 6012|
    8: TCKJ3301|
    9: 6.00E+12|
    10: ID0010015|
    11: WADORI PURWANTO|
    12: 3000054789|

And when using regex2 :

    0: 20140608|
    1: FT141590Z0LL|
    2: 0608103611018634TCKJ3301000000018667|
    3: 3000054789|
    4: IDR1742630000001|
    5: 80507|
    6: 1000|
    7: 6012|
    8: TCKJ3301|
    9: 6.00E+12|
    10: ID0010015|
    11: WADORI PURWANTO|
    12: |
    13: 3000054789|

I need some explanation of how the regex engine works when dealing with this situation.

4 answers

How a regular expression works: the engine always reads from left to right and tries the alternatives in the order they are written. ,|,, is equivalent to a plain , because the first alternative succeeds wherever the second could, so the second alternative is never used.

,,|, is equivalent to ,,? : a comma optionally followed by a second comma, with the two-comma match preferred.


However, you should use ,,? instead, so that the engine does not have to backtrack from the first alternative to the second.
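The equivalences claimed above can be checked directly. The following is a minimal sketch, not part of the original answer: the class name AlternationOrderDemo, the helper splitOn, and the shortened input "a,b,,c" are made up for illustration.

```java
import java.util.Arrays;

class AlternationOrderDemo {
    // Thin wrapper around String.split so the four calls below read uniformly.
    static String[] splitOn(String regex, String input) {
        return input.split(regex);
    }

    public static void main(String[] args) {
        String input = "a,b,,c"; // hypothetical shortened input with one double comma

        // Longer alternative first: ",," is consumed as one delimiter.
        System.out.println(Arrays.toString(splitOn(",,|,", input))); // [a, b, c]
        // Same matches without alternation.
        System.out.println(Arrays.toString(splitOn(",,?", input)));  // [a, b, c]
        // Shorter alternative first: the ",," branch is never reached.
        System.out.println(Arrays.toString(splitOn(",|,,", input))); // [a, b, , c]
        // Identical to the previous result.
        System.out.println(Arrays.toString(splitOn(",", input)));    // [a, b, , c]
    }
}
```

The empty element in the last two results is the empty string between the two adjacent commas, the same thing that shows up as element 12 in the question's second output.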

+4

Judging by the two results, at each position the split method tries the first alternative first ("," for regex2, ",," for regex1) and falls back to the second only if the first fails. With regex2 the first alternative "," always succeeds, so ",," is never matched as a unit: the two adjacent commas are treated as two separate delimiters, and the empty string between them is returned as an element.

So, for the alternation to be useful, you must write the longer, more specific alternative first.

+1

It is evaluated from left to right. In regex1 , \\,\\, is tried first, and \\, is tried only if that fails. This is why element 12 is not empty: the ,, before 3000054789 is consumed as a single delimiter. With regex2 every delimiter is matched by \\, alone, so the two adjacent commas become two delimiters with an empty string between them.
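To see which delimiters the engine actually finds, the matches can be listed with java.util.regex.Matcher instead of split. This is only an illustrative sketch: the class name DelimiterMatches, the helper matches, and the choice to test only the tail of the question's string are assumptions made here, not something from the original answers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimiterMatches {
    // Returns each match as "start..end" (end exclusive), in the order the engine finds them.
    static List<String> matches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.start() + ".." + m.end());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "PURWANTO,,3000054789"; // tail of the string from the question

        // regex1: the two commas are one match covering indices 8 and 9.
        System.out.println(matches("\\,\\,|\\,", input)); // [8..10]
        // regex2: each comma is its own match, leaving an empty string between them.
        System.out.println(matches("\\,|\\,\\,", input)); // [8..9, 9..10]
    }
}
```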

+1

Case 1: split on ,, else ,
The ,, alternative matches only at the one double comma; everywhere else the string is split on a single , .

Case 2: split on , else ,,
The , alternative matches at every comma, so ,, is never used. Therefore word,,word is first split into word and ,word .
Then ,word is split into "" and word .

+1

Source: https://habr.com/ru/post/1201175/

