BeautifulSoup: Extract img alt data

Question


I have the following HTML img tag and I'm trying to parse the text in its alt attribute. Currently, I can successfully retrieve the images themselves.

HTML (what I am parsing now):

<img class="rslp-p" alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver" src="http://i.ebayimg.com/00/$(KGrHqZ,!j!E5dyh0jTpBO(3yE7Wg!~~_26.JPG?set_id=89040003C1" itemprop="image" />

I currently build each image's file name from its src value:

Current code

    def main(url, output_folder="~/images"):
        """Download the images at url"""
        soup = bs(urlopen(url))
        parsed = list(urlparse.urlparse(url))
        count = 0
        for image in soup.findAll("img"):
            print image
            count += 1
            print count
            print "Image: %(src)s" % image
            image_url = urlparse.urljoin(url, image['src'])
            filename = image["src"].split("/")[-1].split("?")[0].replace("$",'').replace(".JPG",".jpg").replace("~~_26",str(count)).lstrip("(")
            parsed[2] = image["src"]
            outpath = os.path.join(output_folder, filename)
            urlretrieve(image_url, outpath)

What I would like to do is extract the alt attribute:

 alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver"

I also want to use the alt text as the file name when saving the image.


python html beautifulsoup scrape

Null-hypothesis Jul 27 '12 at 23:07


1 answer

Gonzalo delgado · Answer 1 · 2012-07-27T23:23:56+0000

Inside the for loop, you can get the alt text with:

 image.get('alt', '')

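The second argument is the default returned when the tag has no alt attribute, so the call never raises KeyError. This is explained in the documentation of BeautifulSoup ("Tag Attributes").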
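To cover the second half of the question (using the alt text as the file name), here is a minimal sketch rather than a definitive implementation. It assumes Python 3 and the bs4 package, unlike the Python 2 code in the question. The names alt_filename and demo, the underscore replacement rule, and the "image_N" fallback are all hypothetical choices made for illustration.

```python
import os
import re


def alt_filename(image, count, default_ext=".jpg"):
    """Build a file name from an img tag's alt text (hypothetical helper).

    `image` only needs a dict-style .get(), which BeautifulSoup Tag
    objects provide, so a plain dict works here as well.
    """
    alt = image.get("alt", "").strip()
    src = image.get("src", "")
    # Take the extension from the src path, ignoring any query string.
    ext = os.path.splitext(src.split("?")[0])[1].lower() or default_ext
    # Replace runs of characters that are awkward in file names.
    stem = re.sub(r"[^\w.-]+", "_", alt).strip("_")
    # Fall back to a counter-based name when alt is missing or empty.
    return (stem or "image_%d" % count) + ext


def demo():
    # Assumption: bs4 is installed (pip install beautifulsoup4).
    from bs4 import BeautifulSoup

    html = (
        '<img class="rslp-p" '
        'alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver" '
        'src="http://i.ebayimg.com/00/$(KGrHqZ,!j!E5dyh0jTpBO(3yE7Wg!~~_26.JPG'
        '?set_id=89040003C1" itemprop="image" />'
    )
    soup = BeautifulSoup(html, "html.parser")
    for count, image in enumerate(soup.find_all("img"), 1):
        print(alt_filename(image, count))
```

For the img tag from the question, alt_filename returns Sony_Cyber-shot_DSC-W570_16.1_MP_Digital_Camera_-_Silver.jpg. In the original loop, that value could replace the filename built from image["src"] before the os.path.join call. Note that two images with the same alt text would map to the same name, so appending count may be worthwhile.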
