Match several occurrences, or else zero (in that order of preference), using regular expressions

I want to match Roman numerals using Groovy regular expressions (I haven't tried this in Java, but it should be the same). I found an answer on this site in which someone suggested the following regex:

/M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})/
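Groovy's regex operators are built on java.util.regex, so the suggested pattern can be tried directly in Java. Here is a minimal sketch, assuming the pattern is meant for validating a standalone numeral with `matches()`, which requires the whole input to match. The class and method names are illustrative only. Because every part of the pattern is optional, the empty string also validates.

```java
import java.util.regex.Pattern;

public class RomanValidate {
    // The regex quoted above as a Java string literal; Groovy's /.../
    // delimiters are not part of the pattern itself.
    static final Pattern ROMAN = Pattern.compile(
            "M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})");

    // matches() succeeds only if the ENTIRE input is a match.
    static boolean isRoman(String s) {
        return ROMAN.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"VII", "MMXXIV", "IIII", ""}) {
            // Prints true, true, false, true: note that "" is accepted.
            System.out.println("\"" + s + "\" -> " + isRoman(s));
        }
    }
}
```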

The problem is that an expression like /V?I{0,3}/ is not greedy in Groovy. So, for a string like "Book Number VII", matching /V?I{0,3}/ returns "V" rather than "VII", as I would want.

Obviously, if I use the pattern /VI+/, I get the match "VII"... but this solution doesn't work if the string is something like "Book number V", since then there is no match at all...
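A quick way to see the trade-off with /VI+/ is to collect every match that `Matcher.find()` reports. This is a sketch with a hypothetical helper, `allMatches`, not code from the thread: the required `I` finds "VII" but rejects a bare "V".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RequiredI {
    // Hypothetical helper: every match find() reports, in order.
    static List<String> allMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(allMatches("VI+", "Book Number VII")); // [VII]
        System.out.println(allMatches("VI+", "Book number V"));   // []
    }
}
```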

I tried to force the longest possible match using a possessive quantifier, /VI{0,3}+/ or even /VI*+/, but I still get "V" instead of "VII".
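A possessive quantifier only stops the engine from giving characters back once they are taken. It does not change where `find()` starts looking. The sketch below compares the greedy /V?I{0,3}/ with its possessive form /V?I{0,3}+/ on the question's example string; the helper name is hypothetical. Both produce the same list of matches, and the first one is an empty match at index 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsPossessive {
    // Hypothetical helper: every match find() reports, in order.
    static List<String> allMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "Book Number VII";
        List<String> greedy = allMatches("V?I{0,3}", input);
        List<String> possessive = allMatches("V?I{0,3}+", input);
        // Identical lists: one empty match per position before the 'V',
        // then "VII", then a final empty match at the end of the input.
        System.out.println(greedy.equals(possessive));
        System.out.println("first match: '" + greedy.get(0) + "'");
        System.out.println(greedy.contains("VII"));
    }
}
```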

Any ideas?

2 answers

Why not just (IX|IV|V?I{1,3}|V)?
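The point of this alternation is that every branch consumes at least one character, so `find()` cannot stop at an empty match. The sketch below assumes the answer's pattern with the spaces removed, since spaces inside a regex are matched literally. The class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnswerAlternation {
    // The answer's pattern without spaces. IX and IV are tried first,
    // and no alternative can match the empty string.
    static final Pattern UNITS = Pattern.compile("(IX|IV|V?I{1,3}|V)");

    // First match in the input, or null if there is none.
    static String firstMatch(String input) {
        Matcher m = UNITS.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("Book Number VII")); // VII
        System.out.println(firstMatch("Book number V"));   // V
        System.out.println(firstMatch("Chapter IX"));      // IX
    }
}
```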


I found the problem. Patterns such as /V?I{0,3}/ or /V?I*/ also match EMPTY strings... so for a string like "Book VII" they produce the following matches:

Result[0] --> ''
Result[1] --> '' 
Result[2] --> ''
Result[3] --> '' 
Result[4] --> '' 
Result[5] --> 'VII'
Result[6] --> '' 
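The list above can be reproduced with a plain `find()` loop. The sketch below is written in Java rather than Groovy, and the helper name is hypothetical. After an empty match, `find()` advances by one character before searching again. That is why there is one empty result per position before the "V" and one more at the end of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatches {
    // Hypothetical helper: every match find() reports, in order.
    static List<String> allMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> results = allMatches("V?I{0,3}", "Book VII");
        for (int i = 0; i < results.size(); i++) {
            // Same format as the listing above: Result[0] .. Result[6].
            System.out.println("Result[" + i + "] --> '" + results.get(i) + "'");
        }
    }
}
```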

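The match I wanted, 'VII', is there (Result[5]). The problem was that I was only looking at the first match (Result[0]), which is an empty string.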
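Another option, not one used in the thread, is to keep the original pattern and skip empty matches until a non-empty one turns up. This is a minimal sketch; `firstNonEmpty` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstNonEmpty {
    // First non-empty match of regex in input, or null if there is none.
    static String firstNonEmpty(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            if (!m.group().isEmpty()) {
                return m.group();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstNonEmpty("V?I{0,3}", "Book VII"));      // VII
        System.out.println(firstNonEmpty("V?I{0,3}", "Book number V")); // V
    }
}
```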

If I use /V?I{1,3}|V/ instead, which cannot match an empty string, I get a single result, which is what I wanted:

Result[0] --> 'VII'
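The same `find()` loop confirms that /V?I{1,3}|V/ produces no empty matches. This is a sketch with a hypothetical helper. One limitation is worth knowing: this pattern has no IV or IX branch, so "IV" comes back as two separate matches, "I" and "V". The alternation in the other answer avoids this by listing IX and IV first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonEmptyPattern {
    // Hypothetical helper: every match find() reports, in order.
    static List<String> allMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Both alternatives consume at least one character.
        System.out.println(allMatches("V?I{1,3}|V", "Book VII"));      // [VII]
        System.out.println(allMatches("V?I{1,3}|V", "Book number V")); // [V]
        System.out.println(allMatches("V?I{1,3}|V", "Book IV"));       // [I, V]
    }
}
```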


Source: https://habr.com/ru/post/1770118/

