Multiple nested matches in a JavaScript regular expression

I am trying to write a regular expression that matches GS1 barcodes ( https://en.wikipedia.org/wiki/GS1-128 ) containing two or more elements, where each element is an identifier followed by a certain number of data characters.

I need something that matches this barcode, because it contains two identifier-and-data patterns:

human readable, with identifiers in parentheses: (01) 12345678901234 (17) 501200

actual data: 011234567890123417501200

but it should not match this barcode, because it has only one pattern:

human readable: (01) 12345678901234

actual data: 0112345678901234

It seems the following should work:

    var regex = /(?:01(\d{14})|10([^\x1D]{6,20})|11(\d{6})|17(\d{6})){2,}/g;
    var str = "011234567890123417501200";
    console.log(str.replace(regex, "$4")); // matches 501200
    console.log(str.replace(regex, "$1")); // no match? why?

For some strange reason it works as soon as I remove {2,}, but I need {2,} so that it only returns matches when there is more than one.

    // Remove {2,} and it will return the first match
    var regex = /(?:01(\d{14})|10([^\x1D]{6,20})|11(\d{6})|17(\d{6}))/g;
    var str = "011234567890123417501200";
    console.log(str.replace(regex, "$4")); // matches 501200
    console.log(str.replace(regex, "$1")); // matches 12345678901234
    // but then the problem is it would also match single identifiers such as
    var str2 = "0112345678901234";
    console.log(str2.replace(regex, "$1"));

How do I make this match, and extract the data, only if there is more than one set of matching groups?

Thanks!

+6
3 answers

Your RegEx is logically and syntactically correct for Perl-compatible regular expressions (PCRE). The problem you are facing, in my opinion, is that JavaScript has trouble with repeated capture groups. That's why the RegEx works fine when you take out {2,} . By adding a quantifier, JavaScript will only return the last match.

I would recommend removing the {2,} quantifier and then checking the number of matches programmatically. I know this is not ideal for big fans of RegEx, but c'est la vie.

Here is a snippet:

    var regex = /(?:01(\d{14})|10([^\x1D]{6,20})|11(\d{6})|17(\d{6}))/g;
    var str = "011234567890123417501200";
    // Check to see if we have at least 2 matches.
    // match() returns null when nothing matches, so fall back to an empty array.
    var m = str.match(regex) || [];
    console.log("Matches list: " + JSON.stringify(m));
    if (m.length < 2) {
        console.log("We only received " + m.length + " matches.");
    } else {
        console.log("We received " + m.length + " matches.");
        console.log("We have achieved the minimum!");
    }
    // If we exec the regex, what would we get?
    console.log("** Method 1 **");
    var n;
    while (n = regex.exec(str)) {
        console.log(JSON.stringify(n));
    }
    // That's not going to work. Let's try using a second regex.
    console.log("** Method 2 **");
    var regex2 = /^(\d{2})(\d{6,})$/;
    var arr = [];
    var obj = {};
    for (var i = 0, len = m.length; i < len; i++) {
        arr = m[i].match(regex2);
        obj[arr[1]] = arr[2];
    }
    console.log(JSON.stringify(obj));
    // EOF
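As a possible variation on Method 2, the second regex could be avoided by reading the capture groups straight from String.prototype.matchAll (ES2020). This is only a sketch: the helper name extractElements and its min parameter are made up for illustration and are not part of the original answer.

```javascript
// Hypothetical helper (not from the original answer): collects every
// identifier/data pair and returns null when fewer than `min` are found.
function extractElements(str, min) {
  // Same alternation as above, without the {2,} quantifier.
  var regex = /01(\d{14})|10([^\x1D]{6,20})|11(\d{6})|17(\d{6})/g;
  var found = [];
  for (var m of str.matchAll(regex)) {
    found.push({
      // The identifier is the first two characters of the matched text.
      id: m[0].slice(0, 2),
      // Only the group belonging to the matched alternative is defined.
      data: m.slice(1).find(function (g) { return g !== undefined; })
    });
  }
  return found.length >= min ? found : null;
}

console.log(extractElements("011234567890123417501200", 2));
console.log(extractElements("0112345678901234", 2));
```

The first call should yield two entries (identifiers 01 and 17), and the second should yield null because only one element is present.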

Hope this helps.

+2

The reason is that a capture group only keeps the last match of that particular group. Imagine your sequence contains two barcode elements with the same identifier 01: it then becomes clear that $1 cannot refer to both at the same time. The capture group retains only the second occurrence.

A simple, though not very elegant, way is to drop {2,} and instead repeat the entire pattern to match the second barcode element. I think you also need to use ^ (the start-of-string anchor) to make sure the match starts at the beginning of the string, otherwise you might pick up an identifier from the middle of a partly invalid sequence. After the repeated pattern you should also add .* if you want to ignore everything that follows the second element and not have it returned when using replace .

Finally, since you do not know which identifier will be found for the first and second matches, you need to output $1$2$3$4 in replace , knowing that only one of these four will be a non-empty string. The same goes for the second match: $5$6$7$8 .

Here is the improved code, applied to your example string:
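This can be observed directly with exec on the regex from the question. A minimal sketch: in JavaScript, a group that did not take part in the final repetition comes back as undefined, which replace then renders as an empty string.

```javascript
var regex = /(?:01(\d{14})|10([^\x1D]{6,20})|11(\d{6})|17(\d{6})){2,}/;
var m = regex.exec("011234567890123417501200");

// m[0] is the whole match; m[1]..m[4] reflect only the LAST repetition.
console.log(m[0]); // "011234567890123417501200"
console.log(m[1]); // undefined (group 1 did not take part in the last repetition)
console.log(m[4]); // "501200"
```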

    var regex = /^(?:01(\d{14})|10([^\x1D]{6,20})|11(\d{6})|17(\d{6}))(?:01(\d{14})|10([^\x1D]{6,20})|11(\d{6})|17(\d{6})).*/;
    var str = "011234567890123417501200";
    console.log(str.replace(regex, "$1$2$3$4")); // 12345678901234
    console.log(str.replace(regex, "$5$6$7$8")); // 501200

If you also need to match the barcode elements that follow the second one, then you cannot avoid writing a loop. You cannot do this with a simple replace-based expression.
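To avoid pasting the pattern twice by hand, the doubled regex could also be assembled from a single string with the RegExp constructor. This is a sketch of that idea only; the variable name unit is an illustrative choice, not something from the original answer.

```javascript
// One identifier/data element, written once as a string.
// Backslashes are doubled because this is a string, not a regex literal.
var unit = "(?:01(\\d{14})|10([^\\x1D]{6,20})|11(\\d{6})|17(\\d{6}))";

// Two consecutive elements anchored at the start, then anything else.
var regex = new RegExp("^" + unit + unit + ".*");

var str = "011234567890123417501200";
console.log(str.replace(regex, "$1$2$3$4")); // 12345678901234
console.log(str.replace(regex, "$5$6$7$8")); // 501200
```

The group numbering is unchanged: the first copy of unit provides groups 1 to 4 and the second copy provides groups 5 to 8.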

With a loop

If a loop is acceptable, you can use the RegExp#exec method. I would suggest adding a kind of "catch-all" to your regular expression that matches a single character when none of the identifiers match. If you come across such a catch-all match in the loop, you exit:

    var str = "011234567890123417501200";
    var regex = /(?:01(\d{14})|10([^\x1D]{6,20})|11(\d{6})|17(\d{6})|(.))/g;
    //              1: ^^^^^^  2: ^^^^^^^^^^^^^  3: ^^^^^  4: ^^^^^ 5:^ (=failure)
    var result = [], grp;
    while ((grp = regex.exec(str)) && !grp[5]) result.push(grp.slice(1).join(''));
    // Consider it a failure when not at least 2 matched.
    if (result.length < 2) result = [];
    console.log(result);
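A different technique that should give the same stopping behaviour is the sticky flag y (ES2015), which only allows a match to start exactly at lastIndex, so no catch-all group is needed. The sketch below swaps that technique in; the function name parseLeadingElements is hypothetical.

```javascript
// Hypothetical helper: walks the string from index 0 and stops at the
// first position where no identifier pattern matches.
function parseLeadingElements(str) {
  // The y (sticky) flag forces each match to start exactly at lastIndex.
  var regex = /01(\d{14})|10([^\x1D]{6,20})|11(\d{6})|17(\d{6})/y;
  var result = [], grp;
  while ((grp = regex.exec(str)) !== null) {
    // Unmatched groups are undefined, which join() renders as "".
    result.push(grp.slice(1).join(""));
  }
  // Same rule as above: fewer than 2 elements counts as a failure.
  return result.length < 2 ? [] : result;
}

console.log(parseLeadingElements("011234567890123417501200"));
console.log(parseLeadingElements("0112345678901234"));
```

Wrapping the loop in a function keeps the regex's lastIndex state local to each call.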
+1

Update

First example

An example with $1 $2 $3 $4. I don't know why the output comes out as a matrix :)

but as you can see, $1 → abc, $2 → def, $3 → ghi, $4 → jkl

    // $1 $2 $3 $4
    var regex = /(abc)|(def)|(ghi)|(jkl)/g;
    var str = "abcdefghijkl";
    // test
    console.log(str.replace(regex, "$1 1st "));
    console.log(str.replace(regex, "$2 2nd "));
    console.log(str.replace(regex, "$3 3rd "));
    console.log(str.replace(regex, "$4 4th "));

Second example

Something gets mixed up here:

    // $1 $2 $3 $4
    var regex = /((abc)|(def)|(ghi)|(jkl)){2,}/g;
    var str = "abcdefghijkl";
    // test
    console.log(str.replace(regex, "$1 1st "));
    console.log(str.replace(regex, "$2 2nd "));
    console.log(str.replace(regex, "$3 3rd "));
    console.log(str.replace(regex, "$4 4th "));

As you can see, instead of ($1)( )( )( ) there is ($4)( )( )( ) .

I think the problem is that the outer parentheses () confuse things, so that the "pseudo" $1 is actually $4. The pattern inside the outer parentheses () is followed by {2,} , so the outer parentheses hold $4; the subpattern (?:01(\d{14})) should be read as $1, but in this case it is erroneously read as $4. Perhaps this causes a conflict between the value stored for the outer parentheses () and the first value stored inside them (which is $1). That is why it is not displayed. In other words, you have ($4 ($1 $2 $3 $4)), and this is not right.

I am adding a picture to show what I mean.

As @Damian said

By adding a quantifier, JavaScript will only return the last match.

so the last match is $4.

Final update
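One way to check the numbering in the second example is to inspect the exec result rather than the replace output. A minimal sketch: there the outer parentheses are themselves capture group 1, so the four inner alternatives become groups 2 to 5.

```javascript
var regex = /((abc)|(def)|(ghi)|(jkl)){2,}/;
var m = regex.exec("abcdefghijkl");

console.log(m.length - 1); // 5 capture groups, not 4
console.log(m[1]); // "jkl": the outer group, as of its last repetition
console.log(m[2]); // undefined: (abc) did not take part in the last repetition
console.log(m[5]); // "jkl": the (jkl) alternative itself
```

This would explain why "$1" prints jkl in that example: it refers to the outer group, which holds whatever the last repetition matched.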

I added a useful little test.

    var regex = /(?:01(\d{14})|10([^\x1D]{6,20})|11(\d{6})|17(\d{6})){2,}/g;
    var str = "011234567890123417501200";
    // test
    console.log(str.replace(regex, "$1 1st "));
    console.log(str.replace(regex, "$2 2nd "));
    console.log(str.replace(regex, "$3 3rd "));
    console.log(str.replace(regex, "$4 4th "));
0

Source: https://habr.com/ru/post/1015050/