Why does string.replace(/\W*/g, '_') prepend an underscore to every character?

While studying regular expressions in JS, I ran into a situation that I did not understand.

I tested the replace function with the following regexp:

/\W*/g 

I expected it to prepend an underscore to the start of the string and then go on to replace all non-word characters, so that

 The Number is (123)(234) 

will become:

 _The_Number_is__123___234_ 

This would prepend an underscore to the string, because there are at least zero non-word characters at the start, and then replace all spaces and other non-word characters.

Instead, it prepended an underscore to every character and replaced all non-word characters:

 _T_h_e__N_u_m_b_e_r__i_s__1_2_3__2_3_4__ 

Why does it do this?

+5
6 answers

The JS regular expression engine processes a string as a sequence of characters and the positions between them. In the following diagram, the positions are marked with hyphens:

 -T-h-e- -N-u-m-b-e-r- -i-s- -(-1-2-3-)-(-2-3-4-)-
 ^ ^                                             ^
 | position between T and h, and so on           |
 start of string                     end of string

All of these positions can be analyzed and matched with a regular expression.

Since /\W*/g is a regular expression matching all non-overlapping occurrences (due to the g modifier) of 0 or more (due to the * quantifier) non-word characters, all the positions before word characters are matched. Between T and h there is a position that is tested against the regex, and since there is no non-word character there (h is a word character), an empty match is returned (because \W* can match an empty string).
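To see these empty matches directly, one can list every match together with its index. This is a minimal sketch, assuming an environment that supports String.prototype.matchAll (ES2020); the variable names are illustrative only.

```javascript
// Sample string from the question.
const input = "The Number is (123)(234)";

// Collect [index, matchedText] for every match of /\W*/g.
const matches = [...input.matchAll(/\W*/g)].map((m) => [m.index, m[0]]);

// Zero-width matches: positions where no non-word character follows.
const emptyMatches = matches.filter(([, text]) => text === "");

console.log(matches.length);      // 23
console.log(emptyMatches.length); // 18
console.log(input.replace(/\W*/g, "_"));
// _T_h_e__N_u_m_b_e_r__i_s__1_2_3__2_3_4__
```

Each of the 23 matches becomes one underscore, and the 17 word characters are copied unchanged, which gives the 40-character result from the question.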

So, you need to replace the start of the string and each non-word character with _. The naive approach is to use .replace(/\W|^/g, '_'). However, there is a caveat: if the string starts with a non-word character, no extra _ is added at the start of the string:

 console.log("Hi there.".replace(/\W|^/g, '_'));  // _Hi_there_
 console.log(" Hi there.".replace(/\W|^/g, '_')); // _Hi_there_

Note that here \W comes first in the alternation and “wins” when it matches at the start of the string: the space is matched, and then no start-of-string position is found at the next match iteration.

Now you might think that you can fix this by swapping the alternatives: /^|\W/g. Look here:

 console.log("Hi there.".replace(/^|\W/g, '_'));  // _Hi_there_
 console.log(" Hi there.".replace(/^|\W/g, '_')); // _ Hi_there_

The second result, _ Hi_there_, shows how the JS regular expression engine handles zero-width matches during a replace operation: once a zero-width match is found (here, the position at the start of the string), the replacement is made and the regex's lastIndex property is incremented, which moves the search to the position after the first character! This is why the first space is preserved and is never matched by \W.
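The skipped space can be observed by listing the matches of /^|\W/g on the second sample. A small sketch, assuming String.prototype.matchAll is available (the variable names are hypothetical):

```javascript
const padded = " Hi there.";

// [index, matchedText] for each match of /^|\W/g.
const found = [...padded.matchAll(/^|\W/g)].map((m) => [m.index, m[0]]);

console.log(JSON.stringify(found));
// [[0,""],[3," "],[9,"."]]
```

The empty match at index 0 moves the search on to index 1, so the space at index 0 is never tested against \W.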

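The solution is to use a consuming pattern that takes the leading non-word character, if there is one, together with the start position, so that no character is skipped after a zero-width match: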

 console.log("Hi there.".replace(/^(\W?)|\W/g, function($0,$1) { return $1 ? "__" : "_"; }));  // _Hi_there_
 console.log(" Hi there.".replace(/^(\W?)|\W/g, function($0,$1) { return $1 ? "__" : "_"; })); // __Hi_there_
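If the callback feels heavy, the same output can apparently be reached by replacing only the non-word characters and prepending the underscore separately. This is an alternative sketch, not the approach from the answer above, and the function names are made up for illustration.

```javascript
// Callback-based pattern from the answer above, wrapped in a function.
function withCallback(str) {
  return str.replace(/^(\W?)|\W/g, function ($0, $1) {
    return $1 ? "__" : "_";
  });
}

// Alternative (assumption: not part of the original answer):
// replace each non-word character, then prepend one underscore.
function withPrefix(str) {
  return "_" + str.replace(/\W/g, "_");
}

for (const sample of ["Hi there.", " Hi there.", ""]) {
  console.log(JSON.stringify(withCallback(sample)), JSON.stringify(withPrefix(sample)));
}
// "_Hi_there_" "_Hi_there_"
// "__Hi_there_" "__Hi_there_"
// "_" "_"
```

Both functions agree on these samples; the second one avoids the zero-width match altogether because \W always consumes one character.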
+5

The problem is the meaning of \W*. It means "0 or more non-word characters", so the empty string "" matches as well, since it really does contain 0 non-word characters.

So the regular expression matches before every character in the string and at the end, and a replacement is made at each of those positions.

You want either /\W/g (replacing each individual non-word character) or /\W+/g (replacing each run of consecutive non-word characters).

 "The Number is (123)(234)".replace(/\W/g, '_')  // "The_Number_is__123__234_"
 "The Number is (123)(234)".replace(/\W+/g, '_') // "The_Number_is_123_234_"
+8

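You can use the RegExp /(^\W*){1}|\W(?!=\w)/g to match the run of \W characters at the beginning of the string (which may be empty) or any single \W after it. Note that (?!=\w) is a negative lookahead for a literal = followed by a word character, so it only rejects a \W that comes directly before such a pair.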

 var str = "The Number is (123)(234)";
 var res = str.replace(/(^\W*){1}|\W(?!=\w)/g, "_");
 console.log(res); // _The_Number_is__123__234_
+1

Instead, you should have used /\W+/g.

"*" means zero or more characters, so it also matches the empty string.

0

This is because you are using the * quantifier, which matches zero or more characters, so it also matches the empty position between every pair of characters. If you change the expression to /\W+/g, it will work as you expected.

0

This should work for you.

Find: (?=.)(?:^\W|\W$|\W|^|(.)$)
Replace: $1_

Explained cases:

 (?= . )          # Must be at least 1 char
 (?:              # Ordered cases:
      ^ \W        #  BOS + non-word (consumes BOS)
   |  \W $        #  Non-word + EOS (consumes EOS)
   |  \W          #  Non-word
   |  ^           #  BOS
   |  ( . )       #  (1), any char + EOS
      $
 )

Please note that this could be done without the lookahead, via
(?:^\W|\W$|\W|^$)

but that would insert a single _ into an empty string, which is why the pattern becomes more complex. Overall, however, this is a simple replacement: unlike Stribnez's solution, no callback logic is required on the replacement side.
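As a quick check of the Find/Replace pair above in JavaScript, here is a small sketch. The function name wrapWithUnderscores is hypothetical, and the g flag is an assumption, since the answer gives only the pattern.

```javascript
// Pattern and replacement copied from the answer; the g flag is assumed.
function wrapWithUnderscores(str) {
  return str.replace(/(?=.)(?:^\W|\W$|\W|^|(.)$)/g, "$1_");
}

console.log(wrapWithUnderscores("The Number is (123)(234)")); // _The_Number_is__123__234_
console.log(wrapWithUnderscores("Hi there"));                 // _Hi_there_
console.log(wrapWithUnderscores(" Hi there."));               // _Hi_there_
console.log(JSON.stringify(wrapWithUnderscores("")));         // ""
```

Note that a leading non-word character is replaced by a single underscore here, whereas the callback solution turns it into two.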

0

Source: https://habr.com/ru/post/1265009/