Regex: capture first occurrence before lookahead

I am trying to capture the part of a URL that comes before a specific word. The catch is that the same word can also appear in the domain.

Examples (I want to capture everything before dinner):

https://breakfast.example.com/lunch/dinner/

https://breakfast.example.brunch.com:8080/lunch/dinner

http://dinnerdemo.example.com/dinner/

I am using:

^(.*://.*/)(?=dinner/?)

The problem is that the lookahead does not seem to be lazy enough, so the following happens:

https://breakfast.example.com/lunch/dinner/login.html?returnURL=https://breakfast.example.com/lunch/dinner/

is captured as:

https://breakfast.example.com/lunch/dinner/login.html?returnURL=https://breakfast.example.com/lunch/
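A minimal Perl sketch that reproduces the behaviour described above, assuming Perl only because one of the answers below uses it. It runs the question's pattern (with the stray spaces removed) against the long URL; the variable names are illustrative.

```perl
use strict;
use warnings;

# The question's pattern, written without the stray spaces.
my $greedy = qr{^(.*://.*/)(?=dinner/?)};

my $url = 'https://breakfast.example.com/lunch/dinner/login.html'
        . '?returnURL=https://breakfast.example.com/lunch/dinner/';

# List-context match returns the capture groups.
# Both .* are greedy: they run to the end of the string and then
# backtrack, so the capture ends before the LAST "dinner".
my ($captured) = $url =~ $greedy;
print "$captured\n";
```

The printed prefix runs all the way into the returnURL parameter, which is the result shown above.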

How can I make it stop at the first occurrence of dinner instead?


Try this:

^(.*?:\/\/).*?/(?=dinner/?)

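The problem is the .* in your pattern: it is greedy.

A greedy .* matches as much as it can and only gives characters back (backtracks) when the rest of the pattern fails, so it ends at the last position where dinner can follow. A lazy .*? matches as little as it can and grows one character at a time, so it ends at the first position where dinner can follow.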
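A short Perl sketch (Perl assumed only because another answer here uses it) that runs the lazy pattern above over the four example URLs from the question. It prints the whole match ($&), because the capture group in this pattern holds only the scheme, e.g. "https://". The @prefixes array is just for illustration.

```perl
use strict;
use warnings;

# The lazy pattern from this answer, unchanged.
my $lazy = qr{^(.*?:\/\/).*?/(?=dinner/?)};

my @urls = (
    'https://breakfast.example.com/lunch/dinner/',
    'https://breakfast.example.brunch.com:8080/lunch/dinner',
    'http://dinnerdemo.example.com/dinner/',
    'https://breakfast.example.com/lunch/dinner/login.html?returnURL=https://breakfast.example.com/lunch/dinner/',
);

my @prefixes;
for my $url (@urls) {
    if ($url =~ $lazy) {
        # $& is the entire match; $1 would only be the scheme part.
        push @prefixes, $&;
        print "$&\n";
    }
}
```

Every URL, including the one with the returnURL parameter, now stops before its first dinner path segment.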


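The lookahead itself is not the problem; the greedy .* in front of it is.

A stricter version consumes the URL one path segment at a time and stops before the first segment that is exactly dinner: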

^https?:\/\/(?:[^\/]+\/)*?(?=dinner(?:\/|$))

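Note: (?:/|$) makes sure that dinner is a whole path segment (followed by a slash or the end of the string), so a name such as "dinnerdemo" is not matched.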
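To illustrate why the (?:/|$) part matters, here is a small Perl sketch comparing this pattern with the simpler lazy pattern from the previous answer. The helper matched_prefix and the URL https://example.com/dinnertime/dinner/ are hypothetical, chosen only to show a directory whose name merely starts with dinner.

```perl
use strict;
use warnings;

# Hypothetical helper: return the whole match ($&), or undef if none.
sub matched_prefix {
    my ($url, $re) = @_;
    return unless $url =~ $re;
    return $&;
}

# Segment-based pattern from this answer, unchanged.
my $by_segment = qr{^https?:\/\/(?:[^\/]+\/)*?(?=dinner(?:\/|$))};

# Lazy pattern from the previous answer, for comparison.
my $lazy = qr{^(.*?:\/\/).*?/(?=dinner/?)};

# Hypothetical URL: "dinnertime" only starts with "dinner".
my $url = 'https://example.com/dinnertime/dinner/';

my $seg_result  = matched_prefix($url, $by_segment);
my $lazy_result = matched_prefix($url, $lazy);

# Segment-based: skips "dinnertime/" and stops before "dinner/".
print "segment-based: $seg_result\n";
# Lazy with dinner/?: the optional slash lets "dinnertime" count.
print "lazy:          $lazy_result\n";

# The question's "dinnerdemo" domain is skipped as well.
my $demo = matched_prefix('http://dinnerdemo.example.com/dinner/', $by_segment);
print "dinnerdemo:    $demo\n";
```

Because dinner/? makes the slash optional, the simpler pattern also accepts any segment that begins with dinner; requiring a slash or the end of the string closes that gap.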


Change the greedy .* to the lazy .*?.

For example, in Perl:

use strict;
use warnings;

while (<DATA>) {
    # Lazy .*? quantifiers stop at the first "dinner" that is a whole word.
    if (m{^(.*?://.*?/.*?)(?=\bdinner\b)}) {
        print $1, "\n";
    }
}

__DATA__
https://breakfast.example.com/lunch/dinner/
https://breakfast.example.brunch.com:8080/lunch/dinner
http://dinnerdemo.example.com/dinner/

Output:

https://breakfast.example.com/lunch/
https://breakfast.example.brunch.com:8080/lunch/
http://dinnerdemo.example.com/

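Another option, which matches the domain and then walks the path one directory at a time, never crossing a ? into the query string: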

 # Multi-line optional
 # ^(?:(?!://).)*://[^?/\r\n]+/(?:(?!dinner)[^?/\r\n]+/)*(?=dinner)


 ^                    # BOL
 (?:
      (?! :// )
      . 
 )*
 ://
 [^?/\r\n]+           # Domain
 /     
 (?:
      (?! dinner )    # Dirs not starting with dinner
      [^?/\r\n]+ 
      /          
 )*
 (?= dinner )

https://breakfast.example.com/lunch/dinner/

https://breakfast.example.brunch.com:8080/lunch/dinner

http://dinnerdemo.example.com/dinner/

https://breakfast.example.com/lunch/dinner/login.html?returnURL=https://breakfast.example.com/lunch/dinner/
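A runnable sketch of the expanded pattern, assuming Perl: the /x flag ignores whitespace and # comments inside the pattern, so the commented layout can be used almost as-is. The comments below are mine, and the inputs are the four URLs above.

```perl
use strict;
use warnings;

# Expanded pattern from this answer, compiled with /x.
my $re = qr{
    ^                    # beginning of line
    (?: (?! :// ) . )*   # scheme: anything up to the first "://"
    ://
    [^?/\r\n]+           # domain, including an optional :port
    /
    (?:
        (?! dinner )     # a directory that does not start with "dinner"
        [^?/\r\n]+
        /
    )*
    (?= dinner )         # stop right before "dinner"
}x;

my @urls = (
    'https://breakfast.example.com/lunch/dinner/',
    'https://breakfast.example.brunch.com:8080/lunch/dinner',
    'http://dinnerdemo.example.com/dinner/',
    'https://breakfast.example.com/lunch/dinner/login.html?returnURL=https://breakfast.example.com/lunch/dinner/',
);

my @matches;
for my $url (@urls) {
    # $& holds the whole match; the pattern has no capture groups.
    push @matches, $& if $url =~ $re;
}
print "$_\n" for @matches;
```

Since [^?/\r\n]+ cannot match a ?, the directory loop can never run past the start of the query string, which is what keeps the returnURL value out of the match.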


Source: https://habr.com/ru/post/1546060/

