I am working on a regex to validate URLs in C#. For now, the only requirement is that the URL must not contain any http:// other than the first one. This was my first attempt:
(https?:\/\/.+?)\/(.+?)(?!https?:\/\/)
But this regex doesn't work (even after removing (?!https?:\/\/) ). Take for example this input line:
http://test.test/notwork.http://test
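To make the behavior concrete, here is a minimal sketch that runs the first attempt against that input and prints what each group captures. The class and method names (FirstAttemptDemo, CapturePath) are only illustrative; the pattern and the input are copied from above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirstAttemptDemo
{
    // The first attempt, copied verbatim from above.
    public const string Pattern = @"(https?:\/\/.+?)\/(.+?)(?!https?:\/\/)";

    // Returns the text captured by group 2, or null when nothing matches.
    // (Hypothetical helper, used only for this demonstration.)
    public static string CapturePath(string input)
    {
        Match m = Regex.Match(input, Pattern);
        return m.Success ? m.Groups[2].Value : null;
    }

    public static void Main()
    {
        string input = "http://test.test/notwork.http://test";
        Match m = Regex.Match(input, Pattern);

        Console.WriteLine(m.Value);            // http://test.test/n
        Console.WriteLine(m.Groups[1].Value);  // http://test.test
        Console.WriteLine(m.Groups[2].Value);  // n
    }
}
```

So the second group captures a single character. Nothing after (.+?) forces it to go further: the lookahead already succeeds right after the n, because the text at that position is otwork.http://test and not http://.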
Here is my first doubt: why doesn't the capture group (.+?) match notwork.http://test ? The lazy quantifier should match as few times as possible, but why doesn't it go on until the end? I am clearly missing something here (at first I thought it could be related to backtracking, but I don't think it is), so I read this and found a solution, even though I'm not sure it is the best one, since the author says that
This method does not give an advantage over a lazy dot-star
In any case, that solution is the tempered dot. This is my next attempt:
(https?:\/\/.+?)\/((?:(?!https?:\/\/).)*)
Now this regex works, but not the way I would like: I need a match only when the URL is valid.
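One possible way to get a match only for a valid URL is to anchor the pattern, assuming the whole input line is supposed to be a single URL. The sketch below is a hypothetical variant of the regex above, not a complete URL validator: it adds ^ and $, and it replaces .+? in the first group with [^\/]+ (my own change). The names UrlCheckDemo and IsValid are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlCheckDemo
{
    // Hypothetical variant of the second attempt:
    //  - ^ and $ force the pattern to cover the whole input;
    //  - [^\/]+ replaces .+? so the first group cannot stretch across a later "/".
    public const string Pattern = @"^(https?:\/\/[^\/]+)\/((?:(?!https?:\/\/).)*)$";

    // True only when the entire input matches and no second http:// follows the first "/".
    public static bool IsValid(string input) => Regex.IsMatch(input, Pattern);

    public static void Main()
    {
        Console.WriteLine(IsValid("http://test.test/works.html"));          // True
        Console.WriteLine(IsValid("http://test.test/notwork.http://test")); // False
    }
}
```

The first group has to change because, once $ is added, the lazy .+? is allowed to backtrack and grow: it would swallow http://test.test/notwork.http: and leave /test for the second group, so the anchored pattern would still match the invalid line.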
By the way, I think I did not quite understand what the new regex is doing: why is the negative lookahead placed before the . and not after it? I tried moving it after the . , and then it seems to match the URL only up to the second-to-last character before the second http. Going back to the correct regex, my hypothesis is that the negative lookahead is actually checking what comes after the . already read by the regex. Is this correct?
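To compare the two placements side by side, here is a small sketch that applies both variants to the same input and prints the second group. The names LookaheadPositionDemo, Before, After and Group2 are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadPositionDemo
{
    // Lookahead BEFORE the dot: check the current position first, then consume one character.
    public const string Before = @"(https?:\/\/.+?)\/((?:(?!https?:\/\/).)*)";

    // Lookahead AFTER the dot: consume one character, then check what follows it.
    public const string After = @"(https?:\/\/.+?)\/((?:.(?!https?:\/\/))*)";

    // Returns group 2 of the first match (an empty string when there is no match).
    public static string Group2(string pattern, string input) =>
        Regex.Match(input, pattern).Groups[2].Value;

    public static void Main()
    {
        string input = "http://test.test/notwork.http://test";

        Console.WriteLine(Group2(Before, input)); // notwork.
        Console.WriteLine(Group2(After, input));  // notwork
    }
}
```

As far as I can tell, a lookahead is zero-width and always inspects the text that starts at the current position, that is, what has not been consumed yet. Placed before the dot, it checks whether the character about to be consumed begins an http://, so the group stops exactly at the second URL. Placed after the dot, it checks the text following the character just consumed, so the group stops one character earlier (the final . is rejected because http:// comes right after it).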
Other solutions are welcome, but first I would like to understand this. Thanks.