Regex: match the second-to-last value between two slashes of a URL string

I have a string like this:

http://www.example.com/value/1234/different-value 

How can I extract 1234?

Note: there may be a slash at the end:

 http://www.example.com/value/1234/different-value
 http://www.example.com/value/1234/different-value/
4 answers
 /([^/]+)(?=/[^/]+/?$) 

should work. You may need to escape it differently depending on the language you use. For example, in Ruby, this is

 if subject =~ /\/([^\/]+)(?=\/[^\/]+\/?\Z)/
   match = $~[1]
 else
   match = ""
 end
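
As a quick check, the pattern can be wrapped in a small method and run against both forms of the URL. This is only a minimal sketch: the method name second_last_segment is made up for illustration and is not part of the original answer.

```ruby
# Hypothetical helper (name is an assumption, not from the original answer):
# returns the second-to-last path segment, or "" if the pattern does not match.
def second_last_segment(subject)
  # Capture a run of non-slash characters that is followed by exactly one
  # more segment and an optional trailing slash at the end of the string.
  if subject =~ %r{/([^/]+)(?=/[^/]+/?\Z)}
    $~[1]
  else
    ""
  end
end

puts second_last_segment('http://www.example.com/value/1234/different-value')
puts second_last_segment('http://www.example.com/value/1234/different-value/')
```

Both calls should print 1234. Using %r{...} instead of /.../ avoids having to escape every slash.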

JavaScript:

 var myregexp = /:\/\/.*?\/.*?\/(\d+)/;
 var match = myregexp.exec(subject);
 if (match != null) {
   result = match[1];
 }

It works with your examples, but I'm sure it won't work in every case.

Edit (Ruby):

 if subject =~ /:\/\/.*?\/.*?\/(.+?)\//
   match = $~[1]
 end

It works.


Use slice for positional extraction

If you always want to extract the element at index 4 (counting the scheme as index 0) from the URI and are sure that your data is regular, you can use Array#slice as follows.

 'http://www.example.com/value/1234/different-value'.split('/').slice 4
 #=> "1234"
 'http://www.example.com/value/1234/different-value/'.split('/').slice 4
 #=> "1234"

This works reliably whether or not there is a trailing slash, whether or not there are more than 4 elements after the split, and whether or not the element at that position is strictly numeric. It works because it relies on the position of the element within the path, not on the contents of the element. However, you will get nil if you try to parse URIs with fewer elements, such as http://www.example.com/1234/ .
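
Since the question asks for the second-to-last value, a variation on the same positional idea is to index from the end rather than from the start. The sketch below is an assumption about what the asker wants, not something from the answer above, and the method name second_last_by_position is hypothetical.

```ruby
# Hypothetical helper: picks the second-to-last element after splitting on "/".
def second_last_by_position(url)
  # String#split drops trailing empty strings, so a trailing slash
  # does not add an extra empty element at the end of the array.
  url.split('/')[-2]
end

puts second_last_by_position('http://www.example.com/value/1234/different-value')
puts second_last_by_position('http://www.example.com/value/1234/different-value/')
```

Like the slice version, this only looks at position: for a short URI such as http://www.example.com/1234/ it returns the host name, so it is only suitable when the data is known to be regular.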

Use scan/match to extract a pattern

Alternatively, if you know that the element you are looking for is always the only one consisting solely of digits, you can use String#match with look-arounds to extract only the numeric part of the string.

 'http://www.example.com/value/1234/different-value'.match %r{(?<=/)\d+(?=/)}
 #=> #<MatchData "1234">
 $&
 #=> "1234"

The look-behind and look-ahead assertions are needed to anchor the expression to a path segment. Without them, you would also match things like the 3 in w3.example.com . This solution is the best approach if the position of the target element can change, and if you can guarantee that your element of interest will be the only one that matches the anchored regular expression.

If there are multiple matches (e.g. http://www.example.com/1234/5678/ ), you can use String#scan instead to select the first or last match. This is one of those "know your data" situations; if you have irregular data, regular expressions are not always the best choice.
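
A minimal sketch of the String#scan variant, assuming the same look-around pattern as above; the method name numeric_segments is made up for illustration.

```ruby
# Hypothetical helper: returns every all-digit segment that sits between two slashes.
def numeric_segments(url)
  # The pattern has no capture groups, so scan returns an array of
  # the matched strings themselves.
  url.scan(%r{(?<=/)\d+(?=/)})
end

segments = numeric_segments('http://www.example.com/1234/5678/')
p segments        # every numeric segment found
p segments.first  # first match
p segments.last   # last match
```

Because the look-arounds do not consume the slashes, the slash between 1234 and 5678 can serve as the look-ahead for one match and the look-behind for the next.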


I think this is a little simpler than the accepted answer, because it does not use a positive lookahead ( ?= ), but just makes the last slash optional with the ? character:

 ^.+\/(.+)\/.+\/?$ 

In Ruby:

 STDIN.read.split("\n").each do |nextline|
   if nextline =~ /^.+\/(.+)\/.+\/?$/
     printf("matched %s in %s\n", $~[1], nextline)
   else
     puts "no match"
   end
 end



Let me break down what happens:

  • ^ : start of line
  • .+\/ : match anything (greedily) up to a slash
    • Since the rest of the pattern goes on to match at least one and at most two more slashes, this slash is either the second-to-last slash (as in http://www.example.com/value/1234/different-value ) or the third-to-last slash (as in http://www.example.com/value/1234/different-value/ )
    • Up to this point we have matched http://www.example.com/value/ (because the match is greedy)
  • (.+)\/ : our capture group for 1234 , indicated by the parentheses, followed by another slash
    • Since the previous slash is the second- or third-to-last one, this slash is the last or the second-to-last slash, respectively
  • .+ : match anything. This comes after 1234/ , so we assume there are characters after it ( different-value )
  • \/? : optionally match another slash (the slash after different-value )
  • $ : match end of line

Note that a URL will normally contain no spaces. I used . because it is easy to read, but you could use \S instead to match only non-whitespace characters.

Alternatively, you can use \A instead of ^ to match the start of the string (rather than the start of a line) and \Z instead of $ to match the end of the string (rather than the end of a line).
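
The difference between the two kinds of anchors only shows up on multi-line input. The following sketch illustrates it; the constant names, the extract helper, and the sample text are all made up for illustration.

```ruby
# Same pattern with line anchors (^ and $) and with string anchors (\A and \Z).
LINE_ANCHORED   = /^.+\/(.+)\/.+\/?$/
STRING_ANCHORED = /\A.+\/(.+)\/.+\/?\Z/

# Hypothetical helper: returns the first capture group, or nil if there is no match.
def extract(pattern, text)
  m = pattern.match(text)
  m && m[1]
end

url  = 'http://www.example.com/value/1234/different-value'
text = "first line\n#{url}\nlast line"

p extract(LINE_ANCHORED, text)    # can match the URL on the middle line
p extract(STRING_ANCHORED, text)  # must match the whole string, so no match here
p extract(STRING_ANCHORED, url)   # matches when the string is just the URL
```

In Ruby, . does not match a newline by default, which is why the string-anchored version cannot span the three lines of the sample text.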


Source: https://habr.com/ru/post/1384155/

