Regex for date drops one character of the month

This is very strange, because it is a very simple expression for the dd/mm format. The result should be "Group 1: 14; Group 2: 12", but it is "Group 1: 14; Group 2: 1".

The second group captured only the first character and left out the second ("2" in the example).

 String sDay = "(?:0?[1-9]|[12][0-9]|3[01])";
 String sMonth = "(?:0?[1-9]|1[0-2])";
 String sDot = "[\\.]";
 String sSlash = "[/]";
 String sMinus = "[\\-]";
 String sSeparators = (sDot + "|" + sSlash + "|" + sMinus);
 Pattern reDayMonth = Pattern.compile("(" + sDay + ")" + "(?:" + sSeparators + ")" + "(" + sMonth + ")");
 String s = "14/12";
 Matcher reMatcher = reDayMonth.matcher(s);
 boolean found = reMatcher.find();
 System.out.println("Group 1: " + reMatcher.group(1) + "; Group 2: " + reMatcher.group(2));

I do not understand why. Could you help me?

1 answer

In your month regex, the single-digit alternative comes first, so it is the one that matches (and the engine then stops). Try putting the two-digit month alternative first, followed by the single-digit one:

 (?:0?[1-9]|1[0-2]) 

should become:

 (?:1[0-2]|0?[1-9]) 
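
A minimal sketch to compare the two orderings side by side. The class name MonthOrderDemo and the helper match are made up for illustration, and the separators from the question are collapsed into the single character class [./-], which should be equivalent:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MonthOrderDemo {
    // Day pattern taken unchanged from the question.
    static final String DAY = "(?:0?[1-9]|[12][0-9]|3[01])";
    // Assumption: one character class instead of the three separator strings.
    static final String SEP = "[./-]";

    // Hypothetical helper: returns "day;month" as captured, or null if nothing matches.
    static String match(String monthPattern, String input) {
        Pattern p = Pattern.compile("(" + DAY + ")" + SEP + "(" + monthPattern + ")");
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) + ";" + m.group(2) : null;
    }

    public static void main(String[] args) {
        // Original order: the single-digit alternative is tried first.
        System.out.println(match("(?:0?[1-9]|1[0-2])", "14/12"));
        // Reordered: the two-digit alternative is tried first.
        System.out.println(match("(?:1[0-2]|0?[1-9])", "14/12"));
    }
}
```

With the input "14/12" the first call should give a month of "1" and the second a month of "12"; single-digit months such as "14/5" still match with the reordered pattern because the second alternative takes over.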

UPDATE (reasoning)
The same kind of pattern, with a leading 0? alternative, works in the day pattern but not in the month pattern because the day pattern must be followed by more characters (the separator). When the single-digit alternative leaves the separator unmatched, the engine backtracks and tries the next alternative, so the whole day ends up matched. Nothing is required after the month pattern, however, so the engine stops at the first alternative that matches, which in the original pattern is the single digit.

If you changed the input format (i.e., used mm/dd instead of dd/mm) and simply swapped sDay and sMonth in the compiled regular expression, you would notice that the month now matches both digits correctly and the day gets cut off instead!
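
A small sketch of that swapped case, using the patterns from the question unchanged. The class name SwappedOrderDemo and the helper match are hypothetical, and the separators are again written as the single class [./-]:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SwappedOrderDemo {
    // Both patterns exactly as in the question, single-digit alternative first.
    static final String DAY = "(?:0?[1-9]|[12][0-9]|3[01])";
    static final String MONTH = "(?:0?[1-9]|1[0-2])";
    static final String SEP = "[./-]";

    // Hypothetical helper for mm/dd input: returns "month;day" as captured, or null.
    static String match(String input) {
        Pattern p = Pattern.compile("(" + MONTH + ")" + SEP + "(" + DAY + ")");
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) + ";" + m.group(2) : null;
    }

    public static void main(String[] args) {
        // The month is followed by a separator, so backtracking recovers "12".
        // The day is last in the pattern, so it stops after its first digit.
        System.out.println(match("12/14"));
    }
}
```

Here the month should come out as "12" and the day as "1", which mirrors the problem in the question.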

One way to solve the problem is to try the two-character alternative first and the optional one-character alternative second, as in my answer. An alternative approach requires the date to be the only thing in the input line (i.e., the date starts at the beginning of the line and ends at the end of the line, with no other text). If so, you can use the regex anchors ^ and $ to match the beginning and end of the line, respectively:

 Pattern.compile("^(" + sDay + ")" + "(?:" + sSeparators + ")" + "(" + sMonth+ ")$"); 

With the anchors in place, the engine has to keep trying alternatives until the whole input is matched, so the day and the month should always be captured in full.
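
A sketch of the anchored variant, keeping the original (single-digit first) alternatives to show that the anchors alone are enough. The class name AnchoredDateDemo and the helper match are made up for illustration, and the separators are written as the single class [./-]:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchoredDateDemo {
    // Patterns exactly as in the question.
    static final String DAY = "(?:0?[1-9]|[12][0-9]|3[01])";
    static final String MONTH = "(?:0?[1-9]|1[0-2])";
    static final String SEP = "[./-]";

    static final Pattern ANCHORED =
            Pattern.compile("^(" + DAY + ")" + SEP + "(" + MONTH + ")$");

    // Hypothetical helper: returns "day;month" as captured, or null if nothing matches.
    static String match(String input) {
        Matcher m = ANCHORED.matcher(input);
        return m.find() ? m.group(1) + ";" + m.group(2) : null;
    }

    public static void main(String[] args) {
        // The $ anchor rejects the one-digit month, forcing a retry with 1[0-2].
        System.out.println(match("14/12"));
        // Surrounding text prevents any match, which is the price of this approach.
        System.out.println(match("on 14/12 we met"));
    }
}
```

The trade-off is that this only works when the date is the entire input; a date embedded in other text no longer matches at all.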

SIDE NOTE (a suggestion rather than an answer)
Following a useful comment from @MarkoTopolnik: you do not need a non-capturing group inside each pattern (month and day), especially since you immediately wrap each one in a capturing group, which makes the non-capturing group redundant. So the month pattern above could simply become:

 1[0-2]|0?[1-9] 

Source: https://habr.com/ru/post/1442044/

