Why does \d+ not match all digits?

I have the following regex:

 REGEX = /^.+(\d+.+(?=AL|AK|AS|AZ|AR|CA|CO|CT|DE|DC|FM|FL|GA|GU|HI|ID|IL|IN|IA|KS|KY|LA|ME|MH|MD|MA|MI|MN|MS|MO|MT|NE|NV|NH|NJ|NM|NY|NC|ND|MP|OH|OK|OR|PW|PA|PR|RI|SC|SD|TN|TX|UT|VT|VI|VA|WA|WV|WI|WY)[A-Z]{2}[, ]+\d{5}(?:-\d{4})?).+/

I have the following line:

 str = "fdsfd 8126 E Bowen AVE Bensalem, PA 19020-1642 dfdf" 

Note that my capture group starts with \d+, which should match one or more digits. But this is what I get:

 str =~ REGEX
 $1 # => "6 E Bowen AVE Bensalem, PA 19020-1642"

or

 match = str.match(REGEX)
 match[1] # => "6 E Bowen AVE Bensalem, PA 19020-1642"

Why is the capture missing the first three digits, 812?

1 answer

The regex below works correctly, as you can see on Regex101:

 REGEX = /^.+?(\d+.+(?=AL|AK|AS|AZ|AR|CA|CO|CT|DE|DC|FM|FL|GA|GU|HI|ID|IL|IN|IA|KS|KY|LA|ME|MH|MD|MA|MI|MN|MS|MO|MT|NE|NV|NH|NJ|NM|NY|NC|ND|MP|OH|OK|OR|PW|PA|PR|RI|SC|SD|TN|TX|UT|VT|VI|VA|WA|WV|WI|WY)[A-Z]{2}[, ]+\d{5}(?:-\d{4})?).+/

Note the question mark added after the first .+ near the beginning of the regular expression:

 /^.+?(\d+...
     ^

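By default, your first .+ is greedy: it consumes as many characters as it can, including digits, while still allowing the rest of the regex to match. Since \d+ only needs one digit, the greedy .+ takes "812" and leaves just "6" for the capture group. Adding ? after the plus makes it lazy instead of greedy, so it consumes as few characters as possible.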
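To see the difference side by side, here is a small sketch using the string from the question. The state list is shortened to three entries purely for readability, and the constant names GREEDY and LAZY are made up for this illustration.

```ruby
# Shortened illustration: only three state codes instead of the full list.
GREEDY = /^.+(\d+.+(?=AL|PA|TX)[A-Z]{2}[, ]+\d{5}(?:-\d{4})?).+/
LAZY   = /^.+?(\d+.+(?=AL|PA|TX)[A-Z]{2}[, ]+\d{5}(?:-\d{4})?).+/

str = "fdsfd 8126 E Bowen AVE Bensalem, PA 19020-1642 dfdf"

# Greedy .+ grabs "fdsfd 812" and leaves a single digit for \d+.
greedy_capture = str[GREEDY, 1]
# => "6 E Bowen AVE Bensalem, PA 19020-1642"

# Lazy .+? stops as early as it can, so \d+ gets all four digits.
lazy_capture = str[LAZY, 1]
# => "8126 E Bowen AVE Bensalem, PA 19020-1642"

puts greedy_capture
puts lazy_capture
```

The only change between the two patterns is the ? after the first plus; everything inside the capture group is identical.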

An alternative would be to keep the leading part from matching digits at all, for example:

 /^[^\d]+(\d+... 

[^\d]+ matches one or more characters that are not digits, so it stops at the first digit and leaves all of them for the capture group.
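A quick sketch of this alternative, again with the state list shortened to three entries for readability (the constant names are hypothetical). It also shows that \D is the shorthand for [^\d], and one edge case worth knowing about: because [^\d]+ needs at least one non-digit, a string that begins directly with the house number does not match at all.

```ruby
# Shortened illustration: only three state codes instead of the full list.
NEGATED   = /^[^\d]+(\d+.+(?=AL|PA|TX)[A-Z]{2}[, ]+\d{5}(?:-\d{4})?).+/
SHORTHAND = /^\D+(\d+.+(?=AL|PA|TX)[A-Z]{2}[, ]+\d{5}(?:-\d{4})?).+/

str = "fdsfd 8126 E Bowen AVE Bensalem, PA 19020-1642 dfdf"

# The prefix cannot swallow digits, so \d+ starts at the "8".
negated_capture = str[NEGATED, 1]
# => "8126 E Bowen AVE Bensalem, PA 19020-1642"

# \D+ behaves the same as [^\d]+.
shorthand_capture = str[SHORTHAND, 1]

# Edge case: nothing precedes the digits, so [^\d]+ has nothing to match.
leading_digit_capture = "8126 E Bowen AVE Bensalem, PA 19020-1642 dfdf"[NEGATED, 1]
# => nil

puts negated_capture
puts shorthand_capture
p leading_digit_capture
```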


Source: https://habr.com/ru/post/1275895/

