The reason is that a capture group only gives you the last match of that particular group. Imagine your sequence contains two barcodes with the same identifier 01: it then becomes clear that $1 cannot reference both at the same time. The capture group retains only the second occurrence.
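To make this concrete, here is a minimal sketch. The input string is made up for illustration (two barcodes that both use identifier 01), and the pattern is cut down to that single identifier:

```javascript
// Hypothetical input: two barcodes, both with identifier 01.
const input = "0112345678901234" + "0198765432109876";

// The capture group sits inside a repeated non-capturing group.
const repeated = /^(?:01(\d{14})){2,}/;
const m = repeated.exec(input);

// Only the value captured by the last repetition survives.
console.log(m[1]); // "98765432109876"
```

The first value, 12345678901234, is consumed by the match but can no longer be reached through $1.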
A simple, though not very elegant, way is to drop {2,} and instead repeat the entire pattern to match the second barcode sequence. I think you also need ^ (the start-of-line anchor) to make sure the match starts at the beginning of the line; otherwise you might pick up the identifier of a half-invalid sequence. After the repeated pattern, you should also add .* if you want to ignore everything that follows the second sequence, so that it is not returned when using replace.
Finally, since you do not know in advance which identifier will be found for the first and second matches, you need to write $1$2$3$4 in the replace string, knowing that only one of these four will be a non-empty string. The same goes for the second match: $5$6$7$8.
Here is the improved code that applies to your example line:
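    var regex = /^(?:01(\d{14})|10([^\x1D]{6,20})|11(\d{6})|17(\d{6}))(?:01(\d{14})|10([^\x1D]{6,20})|11(\d{6})|17(\d{6})).*/;
    var str = "011234567890123417501200";
    // Value of the first barcode: only one of groups 1-4 is non-empty.
    console.log(str.replace(regex, "$1$2$3$4"));
    // Value of the second barcode: only one of groups 5-8 is non-empty.
    console.log(str.replace(regex, "$5$6$7$8"));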
If you also need to match the barcodes that follow the second one, then you cannot avoid writing a loop. You cannot do this with a single replace-based expression.
With a loop
If a loop is allowed, you can use the RegExp#exec method. I would suggest adding a kind of "catch-all" alternative to your regular expression that matches a single character when none of the identifiers matches. If you find such a catch-all match inside the loop, you exit:
    var str = "011234567890123417501200";
    var regex = /(?:01(\d{14})|10([^\x1D]{6,20})|11(\d{6})|17(\d{6})|(.))/g;
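The loop itself could look roughly like the sketch below. The function name parseBarcodes, the ids array and the shape of the result objects are assumptions made for illustration; only the regular expression and the idea of exiting on the catch-all come from the text above.

```javascript
// Hypothetical helper: collects barcodes until the catch-all group matches.
function parseBarcodes(str) {
  // A fresh regex per call, so lastIndex always starts at 0.
  const regex = /(?:01(\d{14})|10([^\x1D]{6,20})|11(\d{6})|17(\d{6})|(.))/g;
  const ids = ["01", "10", "11", "17"]; // identifier for capture groups 1 to 4
  const results = [];
  let match;
  while ((match = regex.exec(str)) !== null) {
    // Group 5 is the catch-all: no known identifier starts here, so exit.
    if (match[5] !== undefined) break;
    // Exactly one of groups 1 to 4 is defined for a barcode match.
    const i = match.slice(1, 5).findIndex(function (g) { return g !== undefined; });
    results.push({ id: ids[i], value: match[i + 1] });
  }
  return results;
}

console.log(parseBarcodes("011234567890123417501200"));
```

For the example line this should yield two entries, one for identifier 01 and one for identifier 17. Because the regex has the g flag, each call to exec resumes where the previous match ended, which is what lets the loop walk through any number of barcodes.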