Python urlparse, right or wrong?

Python's urlparse function splits a URL into six components: scheme, netloc, path, params, query, and fragment.

I found that parsing "example.com/path/file.ext" returns an empty netloc and puts the whole string, "example.com/path/file.ext", in path.

Shouldn't the result be netloc = "example.com" and path = "/path/file.ext"?

Do we really need a "://" to determine whether netloc exists?
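
To make the behaviour concrete, here is a minimal sketch that parses the same host and path written three ways. It assumes Python 3, where the function lives in urllib.parse (in Python 2 it was the urlparse module); the variable names are only illustrative.

```python
from urllib.parse import urlparse

# The same host and path, written three ways.
bare = urlparse("example.com/path/file.ext")
network_path = urlparse("//example.com/path/file.ext")
absolute = urlparse("http://example.com/path/file.ext")

# No "//" at all: everything is treated as a relative path.
print(repr(bare.netloc), repr(bare.path))
# '' 'example.com/path/file.ext'

# A leading "//" is enough for netloc to be recognised, even without a scheme.
print(repr(network_path.netloc), repr(network_path.path))
# 'example.com' '/path/file.ext'

# The full form fills in scheme, netloc and path.
print(repr(absolute.scheme), repr(absolute.netloc), repr(absolute.path))
# 'http' 'example.com' '/path/file.ext'
```

So it is the "//" rather than the scheme itself that marks the start of netloc.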

Python ticket: http://bugs.python.org/issue8284

+4
2 answers

Without the "scheme://" prefix, there is no guarantee that example.com is a domain. You may have a directory named example.com. Likewise, given the URL "omfgroflmao/path/file.ext", how would you know whether "omfgroflmao" is a machine on the local network (i.e. the netloc) or a component of the path?
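
If the caller already knows that its input always starts with a host, it can resolve the ambiguity itself before parsing. The following is only a sketch of that idea, assuming Python 3's urllib.parse; parse_host_first and its default_scheme parameter are hypothetical names, not part of the standard library, and input of the form host:port without a scheme is deliberately not handled.

```python
from urllib.parse import urlparse


def parse_host_first(text, default_scheme="http"):
    """Hypothetical helper: parse text that the caller knows starts with a host."""
    parsed = urlparse(text)
    if parsed.scheme or parsed.netloc:
        # Already unambiguous, e.g. "http://host/..." or "//host/...".
        return parsed
    # Prepend "//" so the first segment is read as netloc, and supply a
    # default scheme because the input did not carry one.
    return urlparse("//" + text, scheme=default_scheme)


result = parse_host_first("example.com/path/file.ext")
print(result.scheme, result.netloc, result.path)
# http example.com /path/file.ext

# The helper cannot tell a host from a directory; it simply trusts the caller.
guess = parse_host_first("omfgroflmao/path/file.ext")
print(guess.netloc, guess.path)
# omfgroflmao /path/file.ext
```

The decision that the first segment is a host lives in the calling code, which is the only place that has enough context to make it.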

I don't think the Python code is actually wrong, but perhaps the documentation should state explicitly how such ambiguous input is handled (I did not check).

+6

example.com/path/file.ext is not a URL; it is just a string. For example, if you put <a href="example.com/path/file.ext"> in an HTML page, it will not link to http://example.com/path/file.ext. Letting you omit http:// is just a shortcut that web browsers provide. You cannot even pass such a string to urllib2.urlopen() and similar functions.
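
Both points can be checked without touching the network. This sketch assumes Python 3, where urllib2 was split into urllib.request and urllib.parse; the page address site.example is a made-up placeholder. urljoin resolves a reference against a base address following the standard relative-reference rules, which approximates how a browser treats the href, and Request is the object that urlopen builds when it is given a string.

```python
from urllib.parse import urljoin
from urllib.request import Request

# Hypothetical address of the HTML page that contains the link.
page = "http://site.example/docs/index.html"

# The href is resolved as a path relative to the page, not as a host.
resolved = urljoin(page, "example.com/path/file.ext")
print(resolved)
# http://site.example/docs/example.com/path/file.ext

# Building a request from a string with no scheme fails immediately.
try:
    Request("example.com/path/file.ext")
except ValueError as exc:
    error = str(exc)
    print(error)
```

The ValueError is raised while the string is being parsed, before any connection is attempted.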

+1

Source: https://habr.com/ru/post/1305807/

