I suspect this is due to the parser BS uses to read the HTML. The available parsers are documented here, but if you're like me (on OS X), you may be stuck with one that takes a bit of work to fix:
You'll notice that the BS4 documentation page above says BS4 falls back to Python's built-in HTML parser when no other parser is installed. Assuming you're on OS X, the Apple-bundled Python is 2.7.2, whose built-in parser is not very lenient about malformed markup. I ran into the same problem, so I upgraded my version of Python to get around it. Doing this inside a virtualenv minimizes disruption to other projects.
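To check whether a given interpreter is old enough to hit this, you can ask Python itself. A minimal sketch, assuming the cut-off the BS4 docs have given (the built-in parser in Python 2 before 2.7.3, or Python 3 before 3.2.2, is the one to avoid); `builtin_parser_is_outdated` is a hypothetical helper name:

```python
import sys

# Cut-off versions the BS4 docs have cited for the built-in html.parser;
# anything older should use lxml or html5lib instead.
MIN_PY2 = (2, 7, 3)
MIN_PY3 = (3, 2, 2)


def builtin_parser_is_outdated(version=None):
    """Return True if the interpreter predates the more lenient html.parser."""
    # sys.version_info compares like a tuple; keep only (major, minor, micro).
    v = tuple((version or sys.version_info)[:3])
    # Python 2 is compared against the 2.x cut-off, everything else against 3.x.
    return v < (MIN_PY2 if v[0] == 2 else MIN_PY3)


print(builtin_parser_is_outdated((2, 7, 2)))  # True: the Apple-bundled Python
print(builtin_parser_is_outdated())           # the interpreter running this script
```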
If that sounds like a pain, you can switch to the lxml parser instead:
pip install lxml
And then try:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "lxml")
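If the same code has to run on machines both with and without lxml, one option is to choose the parser at runtime instead of hard-coding it (asking for "lxml" when it isn't installed makes BeautifulSoup raise `bs4.FeatureNotFound`). A minimal sketch; `pick_parser` is a hypothetical helper, and it only checks that lxml is importable, not which version is installed:

```python
import importlib.util


def pick_parser():
    """Return "lxml" if the lxml package is importable, else the built-in parser."""
    # find_spec returns None when a top-level module can't be found,
    # without actually importing it.
    if importlib.util.find_spec("lxml") is not None:
        return "lxml"
    return "html.parser"


print(pick_parser())
```

You would then write `soup = BeautifulSoup(html, pick_parser())`.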
Depending on your scenario, this might be good enough. I found the problem annoying enough to warrant upgrading Python, and with virtualenv you can port your packages over fairly easily.
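Here is a rough sketch of that port, using the standard-library `venv` module (swapped in for the third-party virtualenv mentioned above) and `importlib.metadata` to build a `pip freeze`-style list. It assumes Python 3.8+, and `freeze_lines` and `make_env` are hypothetical names. `venv.create` builds the environment from whichever interpreter runs it, so run this with the newer Python:

```python
import os
import tempfile
import venv
from importlib import metadata


def freeze_lines():
    """List installed distributions as "name==version", similar to `pip freeze`."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


def make_env(env_dir, with_pip=True):
    """Create a virtual environment and return the path to its interpreter."""
    venv.create(env_dir, with_pip=with_pip)
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return os.path.join(env_dir, bin_dir, exe)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Snapshot the packages visible to the current interpreter.
        req = os.path.join(tmp, "requirements.txt")
        with open(req, "w") as fh:
            fh.write("\n".join(freeze_lines()) + "\n")
        # with_pip=False keeps the demo offline; use True for a real port.
        python = make_env(os.path.join(tmp, "env"), with_pip=False)
        print(os.path.exists(python), req)
```

For a real port, create the environment with `with_pip=True` and run its interpreter with `-m pip install -r requirements.txt`. The list covers everything visible to the interpreter that produced it, so you may want to trim it first.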
James Errico Nov 11 '14 at 3:16