Python urlparse, right or wrong?

Question

The Python urlparse function parses a URL into six components (scheme, netloc, path, params, query, and fragment).

I found that parsing "example.com/path/file.ext" does not return a netloc; the whole string "example.com/path/file.ext" comes back as the path.

Shouldn't it be netloc = "example.com" and path = "/path/file.ext"?

Do we really need a "://" to determine whether a netloc exists?
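The behavior in question can be reproduced with a short sketch. It assumes Python 3, where the function lives in urllib.parse (the question refers to the older standalone urlparse module); the variable names are only illustrative.

```python
from urllib.parse import urlparse

# No scheme and no leading "//": the whole string is treated as a path.
bare = urlparse("example.com/path/file.ext")
print(repr(bare.scheme), repr(bare.netloc), repr(bare.path))

# With a scheme and "//", the host is recognized as the netloc.
full = urlparse("http://example.com/path/file.ext")
print(repr(full.scheme), repr(full.netloc), repr(full.path))

# A leading "//" alone (no scheme) is also enough to introduce a netloc.
scheme_relative = urlparse("//example.com/path/file.ext")
print(repr(scheme_relative.scheme), repr(scheme_relative.netloc), repr(scheme_relative.path))
```

So it is the "//" marker, rather than the scheme itself, that tells the parser a netloc follows.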

Python ticket: http://bugs.python.org/issue8284

+4

python urlparse

Ben Apr 01 '10 at 21:58


2 answers

example.com/path/file.ext is not a URL; it is just a string. For example, if you put <a href="example.com/path/file.ext"> in an HTML page, it will not link to http://example.com/path/file.ext. Letting you omit http:// is just a shortcut provided by web browsers. You cannot even pass such a string to urllib2.urlopen() and similar functions.
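The href point can be illustrated with urljoin, which resolves a reference against a base URL in roughly the way a browser resolves a link. This is a minimal sketch assuming Python 3's urllib.parse; the base page http://mysite.example/dir/page.html is a made-up address, not something from the question.

```python
from urllib.parse import urljoin

# Hypothetical page that contains <a href="example.com/path/file.ext">.
base_page = "http://mysite.example/dir/page.html"

# Without a scheme or "//", the href is a relative path under the page's directory.
resolved = urljoin(base_page, "example.com/path/file.ext")
print(resolved)

# With an explicit scheme, the href points at the other host.
absolute = urljoin(base_page, "http://example.com/path/file.ext")
print(absolute)
```

The first result stays on the hypothetical mysite.example host, with "example.com" treated as a directory name.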

+1

Messa Apr 01 '10 at 22:05


Vinay Sajip · Accepted Answer · 2010-04-01T22:06:01+0000

Without the scheme://, there is no guarantee that example.com is a domain. You may have a directory named example.com. Likewise, given the URL "omfgroflmao/path/file.ext", how would you know whether "omfgroflmao" is a machine on the local network (i.e. the netloc) or a component of the path?

I don't think the Python code is actually wrong, but perhaps the documentation should state explicitly how it behaves in such ambiguous circumstances (I did not check).
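If the caller already knows that its input always starts with a host name, the ambiguity can be resolved before parsing. The helper below is a hypothetical sketch, not part of the standard library: parse_assuming_host and its default_scheme parameter are invented names, and the "first segment is a host" rule is an assumption the caller has to guarantee. It assumes Python 3's urllib.parse.

```python
from urllib.parse import urlparse


def parse_assuming_host(text, default_scheme="http"):
    """Parse text, treating a scheme-less string as starting with a host.

    Hypothetical helper: it is only correct when the caller knows the
    first segment is a host name and not a directory.
    """
    # Add the "//" marker only when the input has neither a scheme nor "//".
    if "://" not in text and not text.startswith("//"):
        text = "//" + text
    # The scheme argument is used only if the input carries no scheme itself.
    return urlparse(text, scheme=default_scheme)


guessed = parse_assuming_host("example.com/path/file.ext")
print(guessed.scheme, guessed.netloc, guessed.path)

local = parse_assuming_host("omfgroflmao/path/file.ext")
print(local.scheme, local.netloc, local.path)
```

The same helper would silently misparse a genuine relative path such as a directory named example.com, which is why urlparse does not make this guess itself.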
