BeautifulSoup: Extract img alt data

I have the following HTML img tag and I'm trying to extract the text in its alt attribute. Currently, I can already retrieve the images successfully.

HTML (what I am parsing now):

<img class="rslp-p" alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver" src="http://i.ebayimg.com/00/$(KGrHqZ,!j!E5dyh0jTpBO(3yE7Wg!~~_26.JPG?set_id=89040003C1" itemprop="image" /> 

I am creating an image name from what I am parsing:

Current code

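    # Python 2. The imports are assumed; the question does not show them.
    import os
    import urlparse
    from urllib import urlretrieve
    from urllib2 import urlopen
    from bs4 import BeautifulSoup as bs  # or BeautifulSoup 3, given findAll

    def main(url, output_folder="~/images"):
        """Download the images at url"""
        # os.path.join does not expand "~", so expand it explicitly.
        output_folder = os.path.expanduser(output_folder)
        soup = bs(urlopen(url))
        parsed = list(urlparse.urlparse(url))
        count = 0
        for image in soup.findAll("img"):
            print image
            count += 1
            print count
            print "Image: %(src)s" % image
            image_url = urlparse.urljoin(url, image['src'])
            # Build a file name from the last path segment of src.
            filename = (image["src"].split("/")[-1].split("?")[0]
                        .replace("$", '').replace(".JPG", ".jpg")
                        .replace("~~_26", str(count)).lstrip("("))
            parsed[2] = image["src"]
            outpath = os.path.join(output_folder, filename)
            urlretrieve(image_url, outpath)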

What I would like to do is extract

 alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver" 

I also want to use the alt text as the file name when saving the image.

1 answer

Inside the for loop, you can get the alt text simply by calling:

 image.get('alt', '') 

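The second argument is the default returned when the tag has no alt attribute, so images without one yield an empty string instead of raising an error. This is explained in the BeautifulSoup documentation ("Tag Attributes").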
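To address the second part of the question (using the alt text as the file name), here is a minimal sketch. It assumes Python 3 and the bs4 package rather than the Python 2 setup in the question. The helper `alt_to_filename`, its `index` and `ext` parameters, and the second img tag in the sample markup are hypothetical and exist only for illustration. The alt text is sanitized because it can contain characters that are unsafe in file names.

```python
import re

from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 (bs4) is installed

# The img tag from the question, plus a hypothetical tag without alt.
HTML = """
<img class="rslp-p" alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver" src="http://i.ebayimg.com/00/$(KGrHqZ,!j!E5dyh0jTpBO(3yE7Wg!~~_26.JPG?set_id=89040003C1" itemprop="image" />
<img src="http://example.com/no-alt.png" />
"""


def alt_to_filename(alt, index, ext=".jpg"):
    """Hypothetical helper: turn alt text into a file-system-friendly name."""
    # Replace each run of characters other than letters, digits, "." and "-"
    # with a single underscore, then trim underscores at both ends.
    stem = re.sub(r"[^A-Za-z0-9.-]+", "_", alt).strip("_")
    # Fall back to a counter-based name when alt is missing or empty.
    return (stem or "image_%d" % index) + ext


soup = BeautifulSoup(HTML, "html.parser")
filenames = []
for index, image in enumerate(soup.find_all("img"), start=1):
    alt = image.get("alt", "")  # "" when the tag has no alt attribute
    filenames.append(alt_to_filename(alt, index))

print(filenames)
```

In the question's loop, the resulting name would take the place of `filename` before it is joined with `output_folder`. Two images with the same alt text would produce the same name, so appending the counter to every name may be the safer choice.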


Source: https://habr.com/ru/post/970216/

