Parsing HTML with BeautifulSoup 4 and Python

I am trying to parse the result list of http://mobile.de.

At first I tried the HTMLParser class, but got an error: HTMLParser.HTMLParseError: EOF in middle of construct.

So I tried BeautifulSoup 4, which is better suited to invalid markup, but the <div> I am searching for is not found, and I can't tell whether the mistake is mine or the website's.

    from bs4 import BeautifulSoup
    import urllib
    import socket

    searchurl = "http://suchen.mobile.de/auto/search.html?scopeId=C&isSearchRequest=true&sortOption.sortBy=price.consumerGrossEuro"
    f = urllib.urlopen(searchurl)
    html = f.read()
    soup = BeautifulSoup(html)

    for link in soup.find_all("div", "listEntry "):
        print link

listEntry is the class of the <div> elements that hold the car results. But it seems that BeautifulSoup does not parse <form id="parkAndCompareVehicle" name="parkAndCompareVehicle" action="">: I cannot find the form in the soup object.

Where is the mistake?

1 answer

It should be something like:

    for link in soup.findAll('div', {'class': 'listEntry '}):
        print link

Attributes are passed as a dictionary. The signature is findAll(self, name=None, attrs={}, recursive=True, text=None, limit=None, **kwargs).
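A minimal sketch of the two call styles, written in Python 3 syntax and run against a small made-up snippet instead of the real mobile.de page. The markup, the `SAMPLE_HTML` name, the explicit "html.parser" argument, and the class name written without its trailing space are all assumptions for illustration. In bs4, a bare string in the second position is treated as a CSS class, so both calls should return the same tags:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; not the real mobile.de markup.
SAMPLE_HTML = """
<div class="listEntry">Car A</div>
<div class="listEntry">Car B</div>
<div class="footer">Imprint</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Attributes given as a dictionary.
by_dict = soup.find_all("div", {"class": "listEntry"})

# A bare string as the second argument is interpreted as a CSS class.
by_string = soup.find_all("div", "listEntry")

for div in by_dict:
    print(div.get_text())
```

findAll is the older spelling of find_all; bs4 accepts both.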

===========

Update: Sorry, it seems that in bs4 you can also pass the class name as a plain string, as you did.

As for the error: as far as I can see, the form you are looking for does not show up in your results because it wraps the list entries rather than being one of them.

What is wrong with this:

    form = soup.find('form', id='parkAndCompareVehicle')
    print len(form.find_all('div', 'listEntry '))
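The same lookup can be tried offline. This is only a sketch in Python 3 syntax: the markup in `SAMPLE_HTML` and the helper name `count_entries` are hypothetical, and the class name is written without its trailing space. Because find() returns None when nothing matches, the helper checks for that and reports whether the form id occurs in the raw HTML at all, which may help separate a parsing problem from a page that never contained the form:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a form that wraps the result entries.
SAMPLE_HTML = """
<form id="parkAndCompareVehicle" name="parkAndCompareVehicle" action="">
  <div class="listEntry">Car A</div>
  <div class="listEntry">Car B</div>
</form>
"""


def count_entries(html, form_id="parkAndCompareVehicle"):
    """Count listEntry divs inside the form; return None if the form is missing."""
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form", id=form_id)
    if form is None:
        # Nothing matched: report whether the id appears in the raw text,
        # i.e. whether the server sent the form in the first place.
        print("form id present in raw HTML:", form_id in html)
        return None
    return len(form.find_all("div", "listEntry"))


print(count_entries(SAMPLE_HTML))
print(count_entries("<p>no form here</p>"))
```

If the id is present in the raw HTML but find() still returns None, trying a different parser backend would be a reasonable next step.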

Source: https://habr.com/ru/post/912028/

