A regular expression to match a word (URL) only if it does not contain a given character

I use an API that sometimes truncates links inside the text it returns, so instead of "longtexthere https://fancy.link" I get "longtexthere https://fa…".

I am trying to match a link only if it is complete, in other words, only if it does not contain the "…" character.

So far I can get the links using the following regular expression:

((?:https?:)?\/\/\S+\/?) 

but obviously it returns every link, including broken ones.

I tried to do something like this:

 ((?:https?:)?\/\/(?:(?!…)\S)+\/?) 

Although this stopped the match from including the "…" character, it still returned the link, just without that character: for "https://fa…" it returned "https://fa", whereas I want the regex to ignore the broken link entirely and move on.
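To make the problem concrete, here is a minimal Python sketch. Python is only an assumption, since the question does not name a language, and the sample text is made up for illustration. It shows that the first pattern returns the truncated link too, while the second one merely stops in front of "…" instead of rejecting the link.

```python
import re

# Hypothetical sample text: one complete link and one truncated by the API.
text = "longtexthere https://fancy.link and https://fa… more"

# Original pattern: \S also matches "…", so the broken link is returned whole.
all_links = re.findall(r"((?:https?:)?//\S+/?)", text)

# Second attempt: (?!…)\S refuses to consume "…", so the match just ends early.
tempered = re.findall(r"((?:https?:)?//(?:(?!…)\S)+/?)", text)

print(all_links)
print(tempered)
```

What is missing in both patterns is a condition that makes the whole match fail when "…" follows the link, rather than one that only limits which characters are consumed.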

I have been fighting with this for hours and just cannot get my head around it. :(

Thanks for any help in advance.

+5
4 answers

You can use

 (?:https?:)?\/\/[^\s…]++(?!…)\/? 

See the regex demo. The possessive quantifier [^\s…]++ matches all characters that are neither whitespace nor "…" and never gives any of them back through backtracking; the lookahead (?!…) then checks whether the next character is "…". If it is, no match is found.
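The role of the possessive quantifier can be seen in a short Python sketch. As far as I know, Python's re module only accepts `++` from version 3.11 on, so this example emulates it with the lookahead-plus-backreference idiom `(?=(...))\1`, which is my substitution and not part of the answer. The sample text is hypothetical.

```python
import re

# Hypothetical sample text: one complete link and one truncated link.
text = "longtexthere https://fancy.link and https://fa… more"

# Plain greedy +: the engine backtracks from "fa" to "f", where the next
# character is "a" rather than "…", so a shortened broken link still matches.
greedy = re.compile(r"(?:https?:)?//[^\s…]+(?!…)/?")

# Emulated possessive quantifier: the lookahead captures the whole run and
# \1 consumes it as one unit that cannot be given back.
atomic = re.compile(r"(?:https?:)?//(?=([^\s…]+))\1(?!…)/?")

greedy_links = [m.group(0) for m in greedy.finditer(text)]
atomic_links = [m.group(0) for m in atomic.finditer(text)]

print(greedy_links)
print(atomic_links)
```

With the ordinary greedy quantifier the truncated link comes back as "https://f"; with the emulated possessive one it is skipped entirely.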

Alternatively, if your regex engine does not allow possessive quantifiers, use a negative lookahead:

 (?!\S+…)(?:https?:)?\/\/\S+\/? 

See another regex demo. The lookahead (?!\S+…) fails the match if one or more non-whitespace characters are followed by "…".
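A minimal sketch of the lookahead variant, assuming Python and a made-up sample text:

```python
import re

# Hypothetical sample text: one complete link and one truncated link.
text = "longtexthere https://fancy.link and https://fa… more"

# (?!\S+…) rejects a start position when the rest of the word contains "…".
pattern = re.compile(r"(?!\S+…)(?:https?:)?//\S+/?")

# The pattern has no capturing groups, so findall returns whole matches.
complete_links = pattern.findall(text)

print(complete_links)
```

Because the lookahead is tested at every candidate start position, the engine cannot start a match part-way into the truncated link either.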

+2

Try:

  ((?:https?:)?\/\/\S+[^ \.]{3}\/?) 

This is the same as your original pattern; you are just saying that none of the last three characters may be a "." (period) or a " " (space).

UPDATE: Your second link worked.

And if you tweak your regex slightly, it will do what you want:

  ((?:https?:)?\/\/\S+[^ …] \/?) 

Yes, it looks like what you had, except that I added a space after the part that excludes the character we do not want. This forces the regular expression to match all the way up to the space that follows the URL, so a URL that ends in "…" is rejected. Without the space at the end, it matches up to, but not including, the "…", which is not what we wanted ;)
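A minimal sketch of the trailing-space variant, assuming Python and a made-up sample text. Note that each match includes the space itself, so the example strips it.

```python
import re

# Hypothetical sample text: one complete link and one truncated link.
text = "longtexthere https://fancy.link and https://fa… more"

# The literal space after [^ …] forces the link to run up to the next space,
# and the character just before that space must not be "…".
pattern = re.compile(r"((?:https?:)?//\S+[^ …] /?)")

# Each match carries the trailing space, so strip it off.
links = [m.strip() for m in pattern.findall(text)]

print(links)
```

One consequence of requiring a literal space is that a link at the very end of the text, with nothing after it, would not match.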

+1

You can try the following regex:

 https?:\/\/\w+(?:\.\w+\/?)+(?!\.{3})(\s|$) 

See demo https://regex101.com/r/bS6tT5/3

+1

Try:

 https?:\/\/[^ ]*?…|(https?:\/\/[^ ]+\.[^ ]+) 

Here is a demo.
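This pattern relies on alternation: the first branch consumes truncated links without capturing them, and only the second branch fills group 1. A minimal sketch of how the result might be read, assuming Python and a made-up sample text:

```python
import re

# Hypothetical sample text: one complete link and one truncated link.
text = "longtexthere https://fancy.link and https://fa… more"

# First alternative swallows truncated links without capturing them;
# only the second alternative fills group 1.
pattern = re.compile(r"https?://[^ ]*?…|(https?://[^ ]+\.[^ ]+)")

# findall returns group 1 per match; truncated links yield an empty string.
good_links = [g for g in pattern.findall(text) if g]

print(good_links)
```

The caller has to look at group 1 rather than at the whole match, since the whole match also covers the links that were meant to be discarded.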

0

Source: https://habr.com/ru/post/1246238/

