This behavior of Beautiful Soup really annoys me. Here is my solution: http://soupy.readthedocs.org/en/latest/
It smooths out a lot of rough edges in BeautifulSoup, allowing you to write queries like

    dom.find('h1').find('h2').find('a')['href'].orelse('not found').val()

This returns the value you are looking for if it exists, or 'not found' otherwise.
The general strategy in soupy is to wrap the data you are interested in inside thin wrapper classes. A simple example of such a wrapper:
    class Scalar(object):

        def __init__(self, val):
            self._val = val

        def __getattr__(self, key):
            return Scalar(getattr(self._val, key, None))

        def __call__(self, *args, **kwargs):
            return Scalar(self._val(*args, **kwargs))

        def __str__(self):
            return 'Scalar(%s)' % self._val

    s = Scalar('hi there')
    s.upper()
If you want to get fancy about it, the mathematical property that lets you safely chain things forever is closure (i.e., methods return instances of the same type). Many BeautifulSoup methods do not have this property, which is what soupy addresses.
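To make the closure idea concrete, here is a minimal sketch of how a missing value can stay inside the wrapper family instead of blowing up mid-chain. This is not soupy's actual implementation: the names Wrapper, Null and wrap are hypothetical, and only orelse and val are borrowed from the query shown earlier. It uses plain dicts and strings rather than BeautifulSoup objects so it runs on its own.

```python
class Wrapper(object):
    """Hypothetical chainable wrapper (illustration only, not soupy's code)."""

    def __init__(self, val):
        self._val = val

    def __getattr__(self, key):
        # Only reached for names not defined on the wrapper itself.
        # A missing attribute becomes Null instead of raising.
        return wrap(getattr(self._val, key, None))

    def __getitem__(self, key):
        # A missing key or index also becomes Null.
        try:
            return wrap(self._val[key])
        except (KeyError, IndexError, TypeError):
            return Null()

    def __call__(self, *args, **kwargs):
        return wrap(self._val(*args, **kwargs))

    def orelse(self, default):
        # A real value ignores the fallback.
        return self

    def val(self):
        # Leave the wrapper and return the raw value.
        return self._val


class Null(Wrapper):
    """Stands in for a missing value; every operation returns Null again."""

    def __init__(self):
        Wrapper.__init__(self, None)

    def __getattr__(self, key):
        return self

    def __getitem__(self, key):
        return self

    def __call__(self, *args, **kwargs):
        return self

    def orelse(self, default):
        # Only a missing value is replaced by the fallback.
        return Wrapper(default)


def wrap(val):
    # Every operation goes through here, so results are always wrappers.
    return Null() if val is None else Wrapper(val)


page = {'h1': {'a': {'href': '/home'}}}
found = wrap(page)['h1']['a']['href'].orelse('not found').val()
missing = wrap(page)['h2']['a']['href'].orelse('not found').val()
```

Because every method returns a Wrapper or a Null, a chain can be as long as you like, and the only place you have to think about the missing case is the single orelse call at the end. Note that the Scalar example above does not go this far: calling a Scalar that wraps None would still raise a TypeError.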