Some regex patterns break the JavaScript regex engine

I wrote the following regular expression: /\D(?!.*\D)|^-?|\d+/g

I think it should work as follows:

 \D(?!.*\D)  # match the last non-digit
 |           # or
 ^-?         # match the start of the string with an optional literal '-' character
 |           # or
 \d+         # match digits

But it does not behave that way:

 var arrTest = '12,345,678.90'.match(/\D(?!.*\D)|^-?|\d+/g);
 console.log(arrTest); //=> ["", "2", "345", "678", ".", "90"]
 var test = arrTest.join('').replace(/[^\d-]/, '.');
 console.log(test); //=> 2345678.90 (the leading 1 is lost)

However, when I test it with the PCRE (PHP) flavor online at Regex101, it works as I described.

I don't know whether my expectation of how it should work is wrong, or whether this is a kind of pattern that the JavaScript regex flavor does not allow.

2 answers

JS works differently than PCRE. The point is that the JS regex engine does not handle zero-length matches well: the index is simply incremented manually, so the character right after a zero-length match is skipped. ^-? can match an empty string, and it matches at the start of 12,345,678.90, which causes the 1 to be skipped.

If we look at the ES5 specification of String#match, we see that each match call with a global regex increments the lastIndex of the regex object after a zero-length match:

  8. Else, global is true
    a. Call the [[Put]] internal method of rx with arguments "lastIndex" and 0.
    b. Let A be a new array created as if by the expression new Array(), where Array is the standard built-in constructor with that name.
    c. Let previousLastIndex be 0.
    d. Let n be 0.
    e. Let lastMatch be true.
    f. Repeat, while lastMatch is true
      i. Let result be the result of calling the [[Call]] internal method of exec with rx as the this value and an argument list containing S.
      ii. If result is null, then set lastMatch to false.
      iii. Else, result is not null
        1. Let thisIndex be the result of calling the [[Get]] internal method of rx with argument "lastIndex".
        2. If thisIndex = previousLastIndex, then
          a. Call the [[Put]] internal method of rx with arguments "lastIndex" and thisIndex + 1.
          b. Set previousLastIndex to thisIndex + 1.

So the matching process goes through steps 8a to 8e, initializing the auxiliary structures, and then enters the loop at 8f, which repeats while lastMatch is true. The internal exec call matches the empty string at the start of the input (8fi → 8fiii). Since the result is not null, thisIndex is set to the lastIndex left by that match, and since the match was zero-length (that is, thisIndex = previousLastIndex), both lastIndex and previousLastIndex are set to thisIndex + 1, which skips the current position after a successful zero-length match.
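
The loop described above can be approximated in plain JavaScript. This is only a sketch under a few assumptions: matchGlobalSketch is a hypothetical helper, not a built-in, and it detects a zero-length match by checking for an empty match string (the way later editions of the specification phrase it) instead of comparing thisIndex with previousLastIndex. Unicode-mode details are ignored.

```javascript
// Hypothetical helper: a simplified emulation of String#match with a global regex.
function matchGlobalSketch(str, rx) {
  rx.lastIndex = 0;            // step 8a: start scanning from the beginning
  var results = [];            // step 8b: the array of matches
  var result;
  // step 8f: keep calling exec until it returns null
  while ((result = rx.exec(str)) !== null) {
    if (result[0] === "") {
      // Zero-length match: exec left lastIndex where the match ended,
      // so bump it manually. A match starting at this position is no longer possible.
      rx.lastIndex = rx.lastIndex + 1;
    }
    results.push(result[0]);
  }
  return results.length ? results : null;
}

var demo = matchGlobalSketch('12,345,678.90', /\D(?!.*\D)|^-?|\d+/g);
console.log(demo); //=> ["", "2", "345", "678", ".", "90"]
```

The empty match produced by ^-? at index 0 moves lastIndex to 1, so the next exec call can only find the 2, never the 12.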

In fact, you can use a simpler regex inside the replace method and use a callback to supply the appropriate replacements:

 var res = '-12,345,678.90'.replace(/(\D)(?!.*\D)|^-|\D/g, function($0, $1) {
   return $1 ? "." : "";
 });
 console.log(res);

Pattern details:

  • (\D)(?!.*\D) - a non-digit (captured into group 1) that is not followed by 0+ characters other than a newline and then another non-digit
  • | - or
  • ^- - a hyphen at the start of the string
  • | - or
  • \D - a non-digit

Note that here you do not even need to make the hyphen at the start optional.
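
One thing worth checking when trying this out: with the pattern above, a leading hyphen is matched by the ^- branch, which leaves group 1 empty, so the callback replaces it with an empty string and the sign is dropped. The sketch below wraps the call in a hypothetical helper (stripSeparators) and adds a variant (stripSeparatorsKeepSign) that keeps the sign. The variant is an assumption about the intent, not part of the original answer.

```javascript
// Hypothetical wrapper around the replace call shown above.
function stripSeparators(input) {
  return input.replace(/(\D)(?!.*\D)|^-|\D/g, function ($0, $1) {
    return $1 ? "." : "";
  });
}

// Assumed variant: capture a leading hyphen in its own group and keep it.
function stripSeparatorsKeepSign(input) {
  return input.replace(/^(-)|(\D)(?!.*\D)|\D/g, function ($0, sign, last) {
    if (sign) return "-";      // leading sign is preserved
    return last ? "." : "";    // last non-digit becomes the decimal point
  });
}

console.log(stripSeparators('-12,345,678.90'));         //=> 12345678.90
console.log(stripSeparatorsKeepSign('-12,345,678.90')); //=> -12345678.90
```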


You can reorder the alternation branches to make it work in JS:

 var arrTest = '12,345,678.90'.match(/\D(?!.*\D)|\d+|^-?/g);
 console.log(arrTest);
 var test = arrTest.join('').replace(/\D/, '.');
 console.log(test); //=> 12345678.90


This is a difference between the JavaScript and PHP (PCRE) regex engines.

In Javascript:

 '12345'.match(/^|.+/gm) //=> ["", "2345"] 

In PHP:

 preg_match_all('/^|.+/m', '12345', $m);
 print_r($m);

 Array
 (
     [0] => Array
         (
             [0] =>
             [1] => 12345
         )
 )

So when ^ matches in JavaScript, the regex engine moves one position forward, and whatever comes after the alternation | is matched from the 2nd position of the input onwards.
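
To see where each match starts under both orderings, the offsets can be collected with a replace callback. This is a small sketch: matchOffsets is a hypothetical helper, and it assumes the patterns have no capture groups, so that the second callback argument is the match offset.

```javascript
// Hypothetical helper: list [offset, text] for every global match.
// Assumes the regex has no capture groups, so the callback gets (match, offset, string).
function matchOffsets(str, rx) {
  var found = [];
  str.replace(rx, function (match, offset) {
    found.push([offset, match]);
    return match; // leave the string unchanged
  });
  return found;
}

var input = '12,345,678.90';
var original = matchOffsets(input, /\D(?!.*\D)|^-?|\d+/g);
var reordered = matchOffsets(input, /\D(?!.*\D)|\d+|^-?/g);

console.log(JSON.stringify(original));
//=> [[0,""],[1,"2"],[3,"345"],[7,"678"],[10,"."],[11,"90"]]
console.log(JSON.stringify(reordered));
//=> [[0,"12"],[3,"345"],[7,"678"],[10,"."],[11,"90"]]
```

With \d+ placed before ^-?, the digits at offset 0 are consumed before the empty-matching branch gets a chance, so no zero-length match occurs there and nothing is skipped.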


Source: https://habr.com/ru/post/1012451/

