How to check the correct url using `urlparse`?

Question

I want to check if the URL is valid before opening it to read data.

I used the `urlparse` function from the `urlparse` module:

    if not bool(urlparse.urlparse(url).netloc):
        # do something like: open and read using urllib2

However, I noticed that some valid URLs are considered broken, for example:

    url = "upload.wikimedia.org/math/8/8/d/88d27d47cea8c88adf93b1881eda318d.png"

This URL is valid (I can open it in my browser).

Is there a better way to check if the URL is valid?

python urllib2 url-parsing urlparse

Ziva Aug 12 '14 at 8:03

3 answers

xbello · Answer 1 · 2014-08-12T08:24:14+0000

You can check if the URL has a scheme:

    >>> url = "no.scheme.com/math/12345.png"
    >>> parsed_url = urlparse.urlparse(url)
    >>> bool(parsed_url.scheme)
    False

If it does not, you can set the scheme with `_replace` and rebuild the URL:

    >>> parsed_url.geturl()
    'no.scheme.com/math/12345.png'
    >>> parsed_url = parsed_url._replace(scheme="http")
    >>> parsed_url.geturl()
    'http:///no.scheme.com/math/12345.png'

Note that the host is still parsed as part of the path, which is why the result has three slashes.
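One way to get the host into `netloc` instead of the path is to re-parse the string with a leading `//`, which `urlparse` treats as the start of the network location. The following is only a minimal sketch: the helper name `with_default_scheme` and its `default_scheme` parameter are made up for illustration, and inputs such as `localhost:8080/path` (where the part before the colon may be read as a scheme) are not handled.

```python
try:  # Python 3
    from urllib.parse import urlparse
except ImportError:  # Python 2
    from urlparse import urlparse


def with_default_scheme(url, default_scheme="http"):
    """Hypothetical helper: apply default_scheme when url has no scheme."""
    parsed = urlparse(url)
    if parsed.scheme:
        # The URL already names a scheme; leave it alone.
        return parsed.geturl()
    # A leading "//" tells urlparse that the host comes next,
    # so it ends up in netloc rather than in path.
    if not url.startswith("//"):
        url = "//" + url
    return urlparse(url, scheme=default_scheme).geturl()


print(with_default_scheme("no.scheme.com/math/12345.png"))
```

With this approach the rebuilt URL has the usual two slashes after the scheme, and `urlparse` of the result reports a non-empty `netloc`.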

abdullahselek · Answer 2 · 2017-12-07T11:55:58+0000

You can try the function below, which checks the `scheme`, `netloc` and `path` components obtained after parsing the URL. It supports both Python 2 and 3.

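    try:
        # Python 3
        from urllib.parse import urlparse
    except ImportError:
        # Python 2
        from urlparse import urlparse

    def url_validator(url):
        try:
            result = urlparse(url)
            # All three components must be non-empty
            return all([result.scheme, result.netloc, result.path])
        except (ValueError, AttributeError, TypeError):
            # Malformed input or a non-string argument
            return False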
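To see what this check accepts and rejects, here is a small self-contained sketch that repeats the same three-component test and runs it on a few sample inputs. The `example.com` URLs are made up for illustration; the Wikimedia URL is the one from the question.

```python
try:  # Python 3
    from urllib.parse import urlparse
except ImportError:  # Python 2
    from urlparse import urlparse


def url_validator(url):
    """True only if scheme, netloc and path are all non-empty."""
    try:
        result = urlparse(url)
        return all([result.scheme, result.netloc, result.path])
    except (ValueError, AttributeError, TypeError):
        return False


samples = [
    "http://upload.wikimedia.org/math/8/8/d/88d27d47cea8c88adf93b1881eda318d.png",
    "upload.wikimedia.org/math/8/8/d/88d27d47cea8c88adf93b1881eda318d.png",
    "http://example.com",   # no path component at all
    "http://example.com/",  # path is "/"
]
for sample in samples:
    print(url_validator(sample), sample)
```

Because `path` is required, a bare `http://example.com` is rejected while `http://example.com/` passes. If that is too strict for your use case, dropping `result.path` from the list is one possible adjustment.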

vil · Answer 3 · 2014-08-12T08:13:10+0000

A URL without a scheme is actually invalid; your browser is just smart enough to assume `http://` as the scheme for it. A good solution might be to check whether the URL has a scheme (`not re.match(r'^[a-zA-Z]+://', url)`) and prepend `http://` if it does not.
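The regex check described above could be sketched roughly as follows. The helper name `ensure_scheme` and its `default` parameter are assumptions made for this example. The pattern is the one from the answer, so schemes containing digits, `+`, `-` or `.` would not be recognized.

```python
import re

try:  # Python 3
    from urllib.parse import urlparse
except ImportError:  # Python 2
    from urlparse import urlparse

# Pattern taken from the answer: letters followed by "://"
SCHEME_RE = re.compile(r'^[a-zA-Z]+://')


def ensure_scheme(url, default="http://"):
    """Hypothetical helper: prepend default if url has no scheme prefix."""
    if not SCHEME_RE.match(url):
        return default + url
    return url


url = ensure_scheme("upload.wikimedia.org/math/8/8/d/88d27d47cea8c88adf93b1881eda318d.png")
print(url)
print(urlparse(url).netloc)
```

After the prefix is added, the `netloc` check from the question sees `upload.wikimedia.org` and no longer treats the URL as broken.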
