Python - getting all links from div with class

I use BeautifulSoup to get all the links to mobile phones from this URL http://www.gsmarena.com/samsung-phones-f-9-0-p2.php

My code is the following:

    import urllib2
    from BeautifulSoup import BeautifulSoup

    url = "http://www.gsmarena.com/samsung-phones-f-9-0-p2.php"
    text = urllib2.urlopen(url).read()
    soup = BeautifulSoup(text)

    data = soup.findAll('div', attrs={'class': 'makers'})
    for i in data:
        print "http://www.gsmarena.com/" + i.ul.li.a['href']

But the returned list of URLs is shorter than expected. When I checked, this code outputs only 3 values, but the result should contain far more than 10.

4 answers

On this page there are only three <div> elements with the class "makers", and your code prints only the first link from each div, so you get three links in total.

This is most likely closer to what you want:

    import urllib2
    from BeautifulSoup import BeautifulSoup

    url = "http://www.gsmarena.com/samsung-phones-f-9-0-p2.php"
    text = urllib2.urlopen(url).read()
    soup = BeautifulSoup(text)

    data = soup.findAll('div', attrs={'class': 'makers'})
    for div in data:
        links = div.findAll('a')
        for a in links:
            print "http://www.gsmarena.com/" + a['href']

This happens because you print only one link per div, whereas on this site each div contains several links: every link sits inside its own li, and there are several lis per ul. You need to loop over all the lis.
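To illustrate the difference, here is a minimal sketch using the bs4 package on a small inline snippet. The markup and the file names in it are made up for the example and only mimic the structure described above; they are not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one div.makers holding a ul with
# several li > a entries, similar in shape to the page discussed.
SAMPLE_HTML = """
<div class="makers">
  <ul>
    <li><a href="phone_a.php">Phone A</a></li>
    <li><a href="phone_b.php">Phone B</a></li>
    <li><a href="phone_c.php">Phone C</a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
div = soup.find("div", attrs={"class": "makers"})

# Attribute-style navigation (div.ul.li.a) returns only the FIRST
# matching descendant at each step, so it yields a single link.
first_only = [div.ul.li.a["href"]]

# Looping over every li in the div collects one link per list item.
all_links = [li.a["href"] for li in div.find_all("li") if li.a is not None]

print(first_only)
print(all_links)
```

With this sample, first_only holds one href while all_links holds three, which mirrors the "3 values instead of many" symptom from the question.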


If you have Python 3, you can use Simon's answer with the following change:

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    # base_url holds the page URL (named url in the code above)
    text = urlopen(base_url).read()
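Put together, a Python 3 version might look like the sketch below. It assumes the bs4 package is installed and that the page still wraps its phone links in div elements with the class "makers". The function names extract_maker_links and fetch_maker_links are made up for this example. Note that print is a function in Python 3, and that urljoin is used here instead of string concatenation to build absolute URLs.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Site root used to turn relative hrefs into absolute URLs.
BASE = "http://www.gsmarena.com/"


def extract_maker_links(html, base=BASE):
    """Return absolute URLs for every <a href> inside div.makers."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for div in soup.find_all("div", class_="makers"):
        # href=True skips anchors that have no href attribute.
        for a in div.find_all("a", href=True):
            links.append(urljoin(base, a["href"]))
    return links


def fetch_maker_links(url):
    """Download the page at url and extract its div.makers links."""
    with urlopen(url) as response:
        return extract_maker_links(response.read())
```

Calling fetch_maker_links("http://www.gsmarena.com/samsung-phones-f-9-0-p2.php") and printing each item with print(link) would then list the links. Keeping the parsing in a separate function that takes an HTML string makes it easy to try on saved or sample markup without hitting the network.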

Taken from http://www.crummy.com/software/BeautifulSoup/download/2.x/documentation.html :

For example, if you want to get only “a” tags with non-empty “href” attributes, you would call soup.fetch('a', {'href':re.compile('.+')}). If you want to get all tags that have a width attribute of 100, you would call soup.fetch(attrs={'width':100}).

Try the following (it requires import re): data = soup.findAll('div', attrs={'class': re.compile('.+')})

This should extract all divs whose class attribute is present and not empty.
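As a rough check of what this pattern selects, here is a small sketch with the bs4 package on made-up markup (the class names other than "makers" and all file names are hypothetical). It suggests that the regular expression matches a div with any non-empty class, so it is broader than filtering on "makers" alone.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: two divs with different classes, one with none.
SAMPLE_HTML = """
<div class="makers"><a href="phone_a.php">Phone A</a></div>
<div class="footer"><a href="about.php">About</a></div>
<div><a href="plain.php">No class</a></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Matches every div that carries some non-empty class value.
any_class = soup.find_all("div", attrs={"class": re.compile(".+")})

# Matches only divs whose class is "makers".
makers_only = soup.find_all("div", attrs={"class": "makers"})

print([d["class"] for d in any_class])
print([d["class"] for d in makers_only])
```

On this sample the regex version also returns the "footer" div, so on a real page it may pull in links from unrelated sections.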


Source: https://habr.com/ru/post/904389/

