Python: urlparse.urlparse(url).hostname returns None

After logging in to a site I want to collect its links. I do this with the following code (using the mechanize and urlparse libraries):

    br = mechanize.Browser()
    .
    .
    # logging in on website
    .
    for link in br.links():
        url = urlparse.urljoin(link.base_url, link.url)
        hostname = urlparse.urlparse(url).hostname
        path = urlparse.urlparse(url).path
        #print hostname  # by printing this I found it to be the source of the None value
        mylinks.append("http://" + hostname + path)

and I get this error message:

    mylinks.append("http://" + hostname + path)
    TypeError: cannot concatenate 'str' and 'NoneType' objects

I am not sure how to fix this, or even whether it can be fixed at all. Is there a way to force the append to go through, even if it produces a strange, non-working result when the value is None?

Alternatively, what I am really after is the text the link ends with. For example, the HTML for one of the links looks like this (what I am after is the word lexik):

    <td class="center"> <a href="http://UnimportantPartOfLink/lexik">lexik</a> </td>

So an alternative route would be for mechanize to collect this value directly, bypassing the links and the problem with the missing value.

2 answers

Another good way, without a try/except block, is to replace hostname = urlparse.urlparse(url).hostname with

 hostname = urlparse.urlparse(url).hostname or '' 

and similarly path = urlparse.urlparse(url).path with

 path = urlparse.urlparse(url).path or '' 
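
To illustrate what the `or ''` fallback does, here is a minimal sketch. The helper name `build_link` and the example.com URLs are made up for illustration, and the import fallback is only there so the snippet runs on both Python 2 (as in the question) and Python 3. Note that for a link with no network location, such as a mailto: link, the result is still a string, but a strange one:

```python
try:
    from urllib.parse import urljoin, urlparse  # Python 3
except ImportError:
    from urlparse import urljoin, urlparse  # Python 2, as in the question


def build_link(base_url, href):
    """Join href onto base_url and rebuild it as "http://" + hostname + path.

    Hypothetical helper; base_url and href stand in for
    link.base_url and link.url from the question's loop.
    """
    parts = urlparse(urljoin(base_url, href))
    hostname = parts.hostname or ''  # hostname is None when the URL has no netloc
    path = parts.path or ''
    return "http://" + hostname + path


# An ordinary relative link resolves against the base URL.
print(build_link("http://example.com/a/", "lexik"))

# A mailto: link has no hostname, so the fallback '' is used instead of None.
print(build_link("http://example.com/a/", "mailto:someone@example.com"))
```

So this avoids the TypeError, but such links may still need to be filtered out afterwards.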

Hope this helps!


Why not use a try/except block?

    try:
        mylinks.append("http://" + hostname + path)
    except TypeError:
        continue

If the TypeError is raised, the loop simply skips the append and continues with the next link.
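
As a rough, self-contained sketch of how this could look inside the loop: the function name `collect_links` and the sample data are hypothetical, and a plain list of (base_url, href) pairs stands in for br.links() so the snippet runs without mechanize or a live site.

```python
try:
    from urllib.parse import urljoin, urlparse  # Python 3
except ImportError:
    from urlparse import urljoin, urlparse  # Python 2, as in the question


def collect_links(pairs):
    """Build "http://host/path" strings, skipping URLs that have no hostname.

    pairs is an iterable of (base_url, href) tuples, a stand-in for the
    link.base_url and link.url attributes used in the question.
    """
    mylinks = []
    for base_url, href in pairs:
        parts = urlparse(urljoin(base_url, href))
        try:
            mylinks.append("http://" + parts.hostname + parts.path)
        except TypeError:
            # parts.hostname was None (no network location in the URL): skip it
            continue
    return mylinks


sample = [
    ("http://example.com/", "lexik"),
    ("http://example.com/", "javascript:void(0)"),  # has no hostname
]
print(collect_links(sample))
```

Unlike the `or ''` variant, this drops links without a hostname instead of producing a malformed entry.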

Hope this helps!


Source: https://habr.com/ru/post/959130/

