How to write a regex to extract a number from these urls?

I am trying to write a regex to match the numbers in these URLs (12345678 and 1234567890).

 http://www.example.com/p/12345678
 http://www.example.com/p/12345678?foo=bar
 http://www.example.com/p/some-text-123/1234567890?foo=bar

Rules:

  • numbers always appear after a slash
  • numbers can be of various lengths
  • regex should check that urls have /p/ in them
  • numbers can be at the end of the url, or there can be variables after them

My attempt:

 \/p\/([0-9]+) 

This matches the first and second URLs, but not the third. So I tried:

 \/p\/[^\/?]*\/?([0-9]+) 

No joy with that one either.

REGEX 101

+5
5 answers

Regex may not be the best tool for this job. Splitting the URL with a URL parser makes more sense here: in all of your examples, the numeric part is the last element of the URL path. I'm not sure which language you are using, but many languages offer functions that parse a URL into its component parts.

 $path = parse_url($url, PHP_URL_PATH);
 if (strpos($path, "/p/") === 0) {
     // the number is the last segment of the path
     $base = basename($path);
 } else {
     // error: the path does not start with /p/
 }

This works for all of your examples, assuming $url is the string you are testing.
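For readers not using PHP, here is a minimal sketch of the same parser-based idea in Python, using the standard `urllib.parse` module. The function name `extract_id` is hypothetical, and the all-digits check on the last segment is an extra guard that the PHP snippet above does not perform.

```python
from urllib.parse import urlparse


def extract_id(url):
    """Return the trailing numeric path segment, or None if the URL does not qualify."""
    path = urlparse(url).path
    # Same check as the PHP snippet: the path must start with /p/
    if not path.startswith("/p/"):
        return None
    # Last path segment, ignoring a trailing slash (similar to basename)
    last = path.rstrip("/").rsplit("/", 1)[-1]
    # Assumption (not in the PHP version): accept only ASCII digits
    if last.isascii() and last.isdigit():
        return last
    return None


urls = [
    "http://www.example.com/p/12345678",
    "http://www.example.com/p/12345678?foo=bar",
    "http://www.example.com/p/some-text-123/1234567890?foo=bar",
]
for url in urls:
    print(extract_id(url))
```

Because the query string is split off by the parser, digits inside variables such as `?foo=bar456` never reach the check.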

+2

I have expanded your version; it now works with all of your examples:

 \/p\/(.+\/)*(\d+)(\?.+=.+(&.+=.+)*)?$ 

If you don't care whether the rest of the URL is valid, you can shorten the regex to this:

 \/p\/(.+\/)*(\d+)($|\?) 

https://regex101.com/r/pW5qB3/2
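As a quick check, the shortened pattern can be run against the three URLs from the question. This is only a sketch in Python: the helper name `extract_number` is hypothetical, and the slashes are left unescaped because Python patterns have no delimiters. The number ends up in the second capture group.

```python
import re

# The shortened pattern from above, without the escaped slashes
PATTERN = re.compile(r"/p/(.+/)*(\d+)($|\?)")


def extract_number(url):
    """Return the digits captured by group 2, or None when the pattern does not match."""
    match = PATTERN.search(url)
    return match.group(2) if match else None


urls = [
    "http://www.example.com/p/12345678",
    "http://www.example.com/p/12345678?foo=bar",
    "http://www.example.com/p/some-text-123/1234567890?foo=bar",
]
for url in urls:
    print(url, "->", extract_number(url))
```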

+1

If I understand correctly, the number you want:

  • must come immediately after the last slash of the URL
  • cannot be part of the variables, i.e. /p/123?foo=bar456 matches 123 and
    /p/foobar?foo=bar456 doesn't match anything

Then you can use the following regular expression:

 (?=/p/).*/\K\d+ 

Explanation

 (?=/p/)  # lookahead: check '/p/' is in the URL
 .*/      # go to the last '/' thanks to greediness
 \K       # leave everything we have so far out of the final match
 \d+      # select the digits just after the last '/'

To avoid escaping the slashes, don't use them as the regex delimiters: #(?=/p/).*/\K\d+# will work.
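Note that `\K` is a PCRE feature; Python's standard `re` module does not support it. A rough equivalent, sketched below, keeps the lookahead and the greedy `.*/` but replaces `\K` with a capture group. The function name `digits_after_last_slash` is hypothetical.

```python
import re

# Same idea as (?=/p/).*/\K\d+, with a capture group standing in for \K
PATTERN = re.compile(r"(?=/p/).*/(\d+)")


def digits_after_last_slash(url):
    """Return the digits right after the last '/', provided the match starts at '/p/'."""
    match = PATTERN.search(url)
    return match.group(1) if match else None


tests = [
    "http://www.example.com/p/12345678",
    "http://www.example.com/p/some-text-123/1234567890?foo=bar",
    "http://www.example.com/p/123?foo=bar456",
    "http://www.example.com/p/foobar?foo=bar456",
]
for url in tests:
    print(url, "->", digits_after_last_slash(url))
```

The last URL yields no match, because the only digits in it sit inside the query variables rather than directly after a slash.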


0
 \/p\/(?:.*\/)?(\d+)\b 

You can try this. It captures the integers according to your conditions; grab the first capture group. See the demo:

https://regex101.com/r/dU7oN5/29

 $re = "/\\/p\\/(?:.*\\/)?(\\d+)\\b/";
 $str = "http://www.example.com/p/12345678\nhttp://www.example.com/p/12345678?foo=bar\nhttp://www.example.com/p/some-text-123/1234567890?foo=bar";
 // $matches[1] holds the captured numbers
 preg_match_all($re, $str, $matches);
0
 using System.Text.RegularExpressions;

 var regex = new Regex(@"/(?<ticket>\d+)");
 var subject = "http://www.example.com/p/some-text-123/1234567890?foo=bar";
 var ticket = regex.Match(subject).Groups["ticket"].Value;

Output: 1234567890

-2

Source: https://habr.com/ru/post/1209937/

