I need to process a comma-separated string that contains triplets of values and translate them into run-time types. The input looks like this:
"1x2y3z,80r160g255b,48h30m50s,1x3z,255b,1h,..."
Therefore, each substring must be converted as follows:
"1x2y3z" should become Vector3 with x = 1, y = 2, z = 3
"80r160g255b" should become Color with r = 80, g = 160, b = 255
"48h30m50s" should become Time with h = 48, m = 30, s = 50
The problem I am facing is that all components are optional (but they keep their order), so the following strings are also valid Vector3, Color and Time values:
"1x3z" Vector3 x = 1, y = 0, z = 3
"255b" Color r = 0, g = 0, b = 255
"1h" Time h = 1, m = 0, s = 0
What have I tried so far?
All optional components
((?:\d+A)?(?:\d+B)?(?:\d+C)?)
A, B and C are replaced with the correct letters for each case. The expression almost works, but it gives twice the expected results (one match for the string and another match for the empty string immediately after the first match), for example:
"1h1m1s" two matches [1]: "1h1m1s" [2]: ""
"11x50z" two matches [1]: "11x50z" [2]: ""
"11111h" two matches [1]: "11111h" [2]: ""
This is not unexpected, because an empty string matches the expression when ALL components are absent. To fix this problem, I tried the following:
1 to 3 quantifiers
((?:\d+[ABC]){1,3})
But now the expression matches strings with components in the wrong order, or even with repeated components:
"1s1m1h" one match, should not match at all! (wrong order)
"11z50z" one match, should not match at all! (repeated components)
"1r1r1b" one match, should not match at all! (repeated components)
As for my last attempt, I tried this version of my first expression:
Match from start ^ to end $
^((?:\d+A)?(?:\d+B)?(?:\d+C)?)$
It works better than the first version, but it still matches an empty string. In addition, I must first tokenize the input and then pass each token to the expression, so that the test string can match the beginning (^) and end ($).
After reading up on (and trying to understand) regular expression concepts, and using the answer by Casimir et Hippolyte, I tried the suggested expression:
\b(?=[^,])(?=.)((?:\d+A)?(?:\d+B)?(?:\d+C)?)\b
Against the following test line:
"48h30m50s,1h,1h1m1s,11111h,1s1m1h,1h1h1h,1s,1m,1443s,adfank,12322134445688,48h"
And the results were amazing! It detects the real, complete matches flawlessly (other expressions gave me 3 matches on "1s1m1h" or "1h1h1h", which were not meant to match at all). Unfortunately, it captures an empty match every time an invalid token is found, so "" is found immediately before "1s1m1h", "1h1h1h", "adfank" and "12322134445688". I therefore changed the lookahead condition to get the following expression:
\b(?=(?:\d+[ABC]){1,3})(?=.)((?:\d+A)?(?:\d+B)?(?:\d+C)?)\b
It gets rid of empty matches on any token that does not match (?:\d+[ABC]){1,3}, so the empty matches before "adfank" and "12322134445688" disappear, but those before "1s1m1h" and "1h1h1h" are still detected.
So the question is: is there a regular expression that matches the three triplet values in the given order, where all components are optional but at least one component must be present, and that does not match the empty string?
The regex tool I'm using is the C++11 one.