bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

Question


 ...
     soup = BeautifulSoup(html, "lxml")
   File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 152, in __init__
     % ",".join(features))
 bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

The above is the output in my terminal. I am on Mac OS 10.7.x with Python 2.7.1, and I followed this tutorial to get Beautiful Soup and lxml, which installed successfully and work with a separate test file located here. In the Python script that causes this error, I included this line:

 from pageCrawler import comparePages

And in the pageCrawler file I included the following two lines:

 from bs4 import BeautifulSoup
 from urllib2 import urlopen

Any help in figuring out what the problem is and how to solve it will be greatly appreciated.
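A frequent cause of this symptom (not confirmed by the asker) is that lxml was installed for a different interpreter than the one running the script, for example Apple's bundled Python versus a separately installed one. The following is a minimal diagnostic sketch, assuming Python 3.6+ (importlib.util.find_spec does not exist in the asker's Python 2.7). The helper name report_parser_modules is hypothetical, and it only checks whether each module can be located; it does not import BeautifulSoup.

```python
import importlib.util
import sys


def report_parser_modules(modules=("bs4", "lxml", "html5lib", "html.parser")):
    """Return {module_name: True/False} for the interpreter running this code.

    Hypothetical helper: it only asks whether each module can be found on
    this interpreter's sys.path; nothing is actually imported from bs4.
    """
    found = {}
    for name in modules:
        try:
            found[name] = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # find_spec raises for dotted names whose parent package is missing
            found[name] = False
    return found


if __name__ == "__main__":
    # sys.executable shows which Python (and therefore which site-packages) is in use
    print("Interpreter:", sys.executable)
    for name, ok in report_parser_modules().items():
        print(f"{name:12} {'found' if ok else 'MISSING'}")
```

If lxml shows as MISSING here while a separate test file works, the two are probably being run by different interpreters.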


python python-2.7 lxml beautifulsoup

user3773048 Jun 25 '14 at 0:12


13 answers

James Errico · Answer 1 · 2014-11-11 03:16

I suspect this is related to the parser that BS uses to read the HTML. The parsers are documented here, but if you're like me (on OS X), you might be stuck with something that requires a bit of work:

You'll notice that the BS4 documentation page above says that, when no other parser is installed, BS4 uses Python's built-in HTML parser. Assuming you're on OS X, Apple's bundled Python is version 2.7.2, whose built-in parser is not very lenient about badly formatted markup. I ran into the same problem, so I upgraded my version of Python to get around it. Doing this in a virtualenv will minimize disruption to other projects.

If this sounds like a pain, you can switch to the lxml parser:

 pip install lxml

And then try:

 soup = BeautifulSoup(html, "lxml")

Depending on your scenario, this might be good enough. I found it annoying enough to warrant upgrading my version of Python. With virtualenv, you can port your packages quite easily.
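The Beautiful Soup documentation says that when no parser is named, it uses the best one installed, ranking lxml first, then html5lib, then Python's built-in html.parser. The sketch below mirrors that order explicitly, so a script can name its parser without crashing when lxml is absent. pick_parser is a hypothetical helper, not part of bs4, and it only checks that the module can be found (Python 3 assumed).

```python
import importlib.util

# Preference order as described in the Beautiful Soup docs:
# lxml first, then html5lib, then the standard library's html.parser.
_PREFERRED = ("lxml", "html5lib")


def pick_parser():
    """Return a parser name suitable for BeautifulSoup(markup, name)."""
    for name in _PREFERRED:
        if importlib.util.find_spec(name) is not None:
            return name
    # html.parser ships with Python itself, so it is always available
    return "html.parser"


if __name__ == "__main__":
    print(pick_parser())
```

Usage would be soup = BeautifulSoup(html, pick_parser()). Keep in mind that the parsers can build different trees from malformed HTML, so results may vary between machines that have different parsers installed.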

Tim Seed · Answer 2 · 2017-02-10 04:24

With a basic Python install plus bs4, you can process your markup with the html5lib parser (html5lib itself is a separate package, installed with pip install html5lib):

 soup = BeautifulSoup(html, "html5lib")

If you want to parse the document as XML (features="xml"), you need lxml:

 pip3 install lxml

 soup = BeautifulSoup(html, features="xml")

Ernst · Answer 3 · 2017-05-10 08:55

I prefer Python's built-in HTML parser, which needs no extra dependencies:

 soup = BeautifulSoup(s, "html.parser")
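BS4's "html.parser" option is built on the standard library's html.parser module, which is why it needs nothing extra installed. To illustrate that module on its own (this is not BS4's API), here is a minimal sketch that collects link targets. LinkCollector and collect_links are hypothetical names, and Python 3 is assumed.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags using only the standard library."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(markup):
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()  # flush any buffered data
    return parser.links


if __name__ == "__main__":
    print(collect_links('<p><a href="/one">1</a> <a href="/two">2</a></p>'))
    # prints ['/one', '/two']
```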

Bashar · Answer 4 · 2018-01-22 04:48

I am using Python 3.6 and had the same error as in the original post. After I ran this command:

 python3 -m pip install lxml

it solved my problem.
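Running pip through the interpreter (python3 -m pip) installs into that interpreter's own site-packages, which avoids the mismatch where a bare pip command belongs to a different Python. Below is a sketch of building the same command from inside a script, assuming Python 3. pip_install_command is a hypothetical helper; it only builds the argument list and does not run anything.

```python
import sys


def pip_install_command(*packages, upgrade=False):
    """Build a pip command bound to the interpreter running this script."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        cmd.append("-U")  # same as pip's --upgrade
    cmd.extend(packages)
    return cmd


if __name__ == "__main__":
    cmd = pip_install_command("lxml")
    print(" ".join(cmd))
    # To actually install: import subprocess; subprocess.check_call(cmd)
```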

Yogesh · Answer 5 · 2018-02-13 12:28

Instead of lxml, you can use html.parser:

 soup = BeautifulSoup(html, 'html.parser')

Projesh Bhoumik · Answer 6 · 2018-03-24 11:06

BeautifulSoup supports Python's built-in HTML parser out of the box. If you want to use a third-party Python parser such as lxml, you need to install it separately.

 soup_object = BeautifulSoup(markup, "html.parser")  # Python's built-in HTML parser

If you don't specify a parser at all, you will get a warning that no parser was explicitly specified:

 soup_object = BeautifulSoup(markup)  # Warning

To use an external parser, install it and then name it, for example:

 pip install lxml

 soup_object = BeautifulSoup(markup, 'lxml')  # C-based parser

An external parser such as lxml depends on C libraries as well as Python, which is a trade-off: it is fast, but harder to install than the pure-Python built-in parser.
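Because lxml is a C extension wrapping the libxml2 and libxslt libraries, it helps to confirm that it really imports in the interpreter you are using. Here is a small sketch that reports the versions lxml was built with, or None when the import fails. lxml_build_info is a hypothetical name, while LXML_VERSION, LIBXML_VERSION and LIBXML_COMPILED_VERSION are attributes of lxml.etree.

```python
def lxml_build_info():
    """Return lxml/libxml2 version tuples, or None if lxml cannot be imported."""
    try:
        # Imported inside the function on purpose, so a missing lxml is reported, not raised
        from lxml import etree
    except ImportError:
        # lxml (or its compiled C extension) is not available to this interpreter
        return None
    return {
        "lxml": etree.LXML_VERSION,
        "libxml2 (runtime)": etree.LIBXML_VERSION,
        "libxml2 (compiled against)": etree.LIBXML_COMPILED_VERSION,
    }


if __name__ == "__main__":
    info = lxml_build_info()
    if info is None:
        print("lxml is not importable from this interpreter")
    else:
        for name, version in info.items():
            print(name, ".".join(map(str, version)))
```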

Qiao Yang · Answer 7 · 2017-03-04 06:17

I ran into the same problem. The cause turned out to be a slightly outdated version of the six package:

 >>> import html5lib
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/usr/local/lib/python2.7/site-packages/html5lib/__init__.py", line 16, in <module>
     from .html5parser import HTMLParser, parse, parseFragment
   File "/usr/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 2, in <module>
     from six import with_metaclass, viewkeys, PY3
 ImportError: cannot import name viewkeys

Upgrading six solves the problem:

 sudo pip install six==1.10.0
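To check whether an installed six is older than the 1.10.0 used in this answer, here is a sketch using importlib.metadata (Python 3.8+, so not the Python 2.7 in the traceback). installed_version and version_tuple are hypothetical helpers. version_tuple deliberately handles only plain dotted versions; packaging.version.Version is the robust choice for full version parsing.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def version_tuple(text):
    """Turn a plain dotted version like '1.10.0' into (1, 10, 0).

    Simplified on purpose: pre-release tags such as '1.0rc1' are not handled.
    """
    return tuple(int(part) for part in text.split("."))


if __name__ == "__main__":
    six_version = installed_version("six")
    if six_version is None:
        print("six is not installed")
    elif version_tuple(six_version) < (1, 10, 0):
        print("six", six_version, "is older than 1.10.0; consider upgrading")
    else:
        print("six", six_version, "is not older than 1.10.0")
```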

Serajush Salekin · Answer 8 · 2017-12-28 14:28

The parser library is either not installed on your computer or cannot be found.

Try this command from cmd:

 pip install lxml

duhaime · Answer 9 · 2018-03-04 18:30

I solved this error by updating my lxml distribution:

 pip install -U lxml

blackholes · Answer 10 · 2018-11-17 01:36

 conda install lxml

worked for me in a virtual environment on Windows 10.

SAyantan GHosh · Answer 11 · 2018-11-21 09:44

In cmd:

 python -m pip install lxml

Then in your code / project:

 import lxml

Shubham Jadhav · Answer 12 · 2019-01-17 16:03

I still get the same error.

abhishekPakrashi · Answer 13 · 2018-04-02 13:28

Some references spell the parser name as 'html-parser'; use the second form instead of the first:

 soup_object = BeautifulSoup(markup, 'html-parser')  # not a recognized parser name
 soup_object = BeautifulSoup(markup, 'html.parser')  # correct
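A misspelled parser name produces the same FeatureNotFound error as a missing library. Here is a hypothetical sketch that catches such typos before calling BeautifulSoup, using difflib from the standard library. The list of names comes from the parser table in the Beautiful Soup documentation and may not be exhaustive, since BS4 can also accept other feature names.

```python
import difflib

# Parser names from the parser table in the Beautiful Soup documentation
# (assumed list; bs4 may accept additional feature names).
KNOWN_PARSERS = ("html.parser", "lxml", "lxml-xml", "xml", "html5lib")


def check_parser_name(name):
    """Return name if it is a known parser, else the closest known name (or None)."""
    if name in KNOWN_PARSERS:
        return name
    # get_close_matches returns the best matches above a 0.6 similarity cutoff
    matches = difflib.get_close_matches(name, KNOWN_PARSERS, n=1)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(check_parser_name("html-parser"))
```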
