Converting a Java regular expression to .NET

I am trying to convert the following regular expression from Java to .NET:

(?i:(?:([^\d,]+?)\W+\b((?:CA|SD|SC|CT|DC)\b)?\W*)?(\d{5}(?:[- ]\d{3,4})?)?) 

When I run a match with the following line:

 Mountain View, CA 94043 

using the Pattern and Matcher objects in Java, it populates four groups with values:

  "Mountain View, CA 94043" "Mountain View" "CA" "94043" 

However, .NET finds two matches. The first match fills four groups with these values:

  "Mountain " (note the trailing space in group 0) "Mountain" "" "" 

The second match fills the groups with these values:

  "View, CA 94043" "View" "CA" "94043" 

I also tried the expression in RegexBuddy in both Java and .NET modes, and there both modes behave like the .NET version.

Thanks everyone!

1 answer

Add ^ to the beginning of your pattern and $ to the end of it to match the beginning and end of the line, respectively. This forces the pattern to match the entire line and gives the desired result:

    string input = "Mountain View, CA 94043";
    string pattern = @"^(?i:(?:([^\d,]+?)\W+\b((?:CA|SD|SC|CT|DC)\b)?\W*)?(\d{5}(?:[- ]\d{3,4})?)?)$";
    Match m = Regex.Match(input, pattern);
    foreach (Group g in m.Groups)
    {
        Console.WriteLine(g.Value);
    }

Since you did not restrict the pattern to an exact match, as shown above, it found partial matches, especially because some of your groups are completely optional. Thus, it treats "Mountain " as one match, then treats "View, CA 94043" as the next match.

EDIT: as requested in the comments, I will try to point out the differences between the Java and .NET regex approaches.

In Java, the matches() method returns true only if the pattern matches the entire string, and false otherwise. Thus, it does not require changing the pattern with boundary anchors or atomic zero-width assertions. There is no equivalent method in .NET that does this for you. Instead, you need to explicitly add the ^ and $ metacharacters to match the beginning and end of a string or line, respectively, or the \A and \z metacharacters to do the same for the entire string. For help with .NET metacharacters, check this MSDN page. I'm not sure which set of anchors Java's matches() uses, although \A and \z are suggested in this article.
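To illustrate the point about matches(), here is a minimal Java sketch that runs the pattern from the question twice: once through matches(), and once through find() with \A and \z added by hand. The class name WholeStringMatch and its helper methods are made up for this illustration, and the assumption that \A...\z reproduces what matches() does is exactly the one the paragraph above is unsure about; for this input the two should agree.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class WholeStringMatch {
    // The pattern from the question, unchanged (no anchors).
    static final String ADDRESS =
        "(?i:(?:([^\\d,]+?)\\W+\\b((?:CA|SD|SC|CT|DC)\\b)?\\W*)?(\\d{5}(?:[- ]\\d{3,4})?)?)";

    // matches() succeeds only if the pattern consumes the entire input.
    static String[] viaMatches(String input) {
        Matcher m = Pattern.compile(ADDRESS).matcher(input);
        return m.matches() ? groups(m) : null;
    }

    // find() with explicit \A and \z anchors (assumed to be equivalent here).
    static String[] viaAnchors(String input) {
        Matcher m = Pattern.compile("\\A" + ADDRESS + "\\z").matcher(input);
        return m.find() ? groups(m) : null;
    }

    // Copies group 0 and all capturing groups; unmatched groups stay null.
    static String[] groups(Matcher m) {
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "Mountain View, CA 94043";
        System.out.println(Arrays.toString(viaMatches(input)));
        System.out.println(Arrays.toString(viaAnchors(input)));
    }
}
```

Both calls should report the four values listed in the question for Java: the whole line, "Mountain View", "CA" and "94043".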

Java's matches() returns a boolean, and .NET provides the Regex.IsMatch() method to do the same (apart from the whole-string difference already discussed). The .NET equivalent of Java's find() method is Regex.Match(), which you can use in a loop to keep searching for the next match. In addition, .NET offers the Regex.Matches() method, which does this for you and returns a collection of successful matches. This may be fine depending on your needs and input, but for added flexibility you can check Match.Success in a loop and use the Match.NextMatch() method to continue searching for matches (an example of this is available in the NextMatch link).
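For comparison, the Java side of that loop can be sketched as follows. Calling find() repeatedly on the unanchored pattern plays the role that Match.Success plus Match.NextMatch() play in .NET, and it should reproduce the partial matches described in the question. The class name FindLoop and the method allMatches are hypothetical; skipping empty matches is a choice made for this sketch, since every part of the pattern is optional and it can therefore match an empty string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class FindLoop {
    // The pattern from the question, unchanged (no anchors).
    static final Pattern ADDRESS = Pattern.compile(
        "(?i:(?:([^\\d,]+?)\\W+\\b((?:CA|SD|SC|CT|DC)\\b)?\\W*)?(\\d{5}(?:[- ]\\d{3,4})?)?)");

    // Collects the text of every successive non-empty match.
    static List<String> allMatches(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = ADDRESS.matcher(input);
        while (m.find()) {
            // Every part of the pattern is optional, so empty matches are
            // possible; they are skipped here (a choice made for this sketch).
            if (!m.group().isEmpty()) {
                found.add(m.group());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        for (String s : allMatches("Mountain View, CA 94043")) {
            System.out.println("[" + s + "]");
        }
    }
}
```

Used this way, Java behaves like the .NET results in the question: the first match is "Mountain " (with the trailing space) and the second is "View, CA 94043".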


Source: https://habr.com/ru/post/1380788/