bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

 ...
     soup = BeautifulSoup(html, "lxml")
   File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 152, in __init__
     % ",".join(features))
 bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

The above is what my terminal outputs. I am on Mac OS 10.7.x with Python 2.7.1, and I followed this tutorial to get Beautiful Soup and lxml, both of which installed successfully and work with a separate test file located here. In the Python script that causes this error, I included this line:

 from pageCrawler import comparePages

And in the pageCrawler file I included the following two lines:

 from bs4 import BeautifulSoup
 from urllib2 import urlopen

Any help in figuring out what the problem is and how to solve it will be greatly appreciated.

+157
python lxml beautifulsoup
Jun 25 '14 at 0:12
13 answers

I suspect this is related to the parser that BS4 uses to read the HTML. They document it here, but if you're like me (on OS X), you might be stuck with something that requires a bit of work:

You'll notice on the BS4 documentation page above that, by default, BS4 uses Python's built-in HTML parser. Assuming you're on OS X, the Apple-bundled version of Python is 2.7.2, which is not lenient about character formatting. I hit the same problem, so I upgraded my version of Python to work around it. Doing this in a virtualenv will minimize disruption to other projects.

If doing that sounds like a pain, you can switch to the lxml parser:

 pip install lxml 

And then try:

 soup = BeautifulSoup(html, "lxml") 

Depending on your scenario, that might be good enough. I found this annoying enough to warrant upgrading my version of Python. Using virtualenv, you can migrate your packages quite easily.
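If you are not sure which parser is available on a given machine, a small check can pick one before the soup is built. This is only a sketch using the standard library; `pick_parser` is a hypothetical helper name, not part of bs4, and it assumes that the package names `lxml` and `html5lib` match the parser names BS4 accepts.

```python
import importlib.util


def pick_parser(preferred=("lxml", "html5lib")):
    """Return the first name in `preferred` whose package can be found.

    Falls back to "html.parser", which ships with Python itself.
    (Hypothetical helper for illustration, not a bs4 API.)
    """
    for name in preferred:
        # find_spec returns None when the top-level package is not installed
        # for the interpreter that is running this code.
        if importlib.util.find_spec(name) is not None:
            return name
    return "html.parser"


parser = pick_parser()
print("Using parser:", parser)
# With bs4 installed, the result can be passed straight through:
#   soup = BeautifulSoup(html, parser)
```

Keep in mind that different parsers can build slightly different trees from the same broken HTML, so a silent fallback may not be what you want if your output has to be identical across machines.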

+170
Nov 11 '14 at 3:16

For basic, out-of-the-box Python with bs4 installed, you can process your XML with

 soup = BeautifulSoup(html, "html5lib") 

If, however, you want to use features="xml", you need to install lxml:

 pip3 install lxml

 soup = BeautifulSoup(html, features="xml")
+42
Feb 10 '17 at 4:24

I prefer Python's built-in HTML parser, which needs no extra dependencies installed:

 soup = BeautifulSoup(s, "html.parser")

+16
May 10 '17 at 8:55

I am using Python 3.6 and got the same error as in the original post. After I ran the command:

 python3 -m pip install lxml 

the problem was solved.
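Running pip through `python3 -m pip` matters because it installs into the same interpreter that will run your script. If lxml appears to be installed but the error persists, it may have gone into a different Python. A rough diagnostic sketch, using only the standard library (`describe_environment` is a hypothetical name chosen for this example):

```python
import importlib.util
import sys


def describe_environment(package="lxml"):
    """Report which interpreter is running and whether it can see `package`."""
    spec = importlib.util.find_spec(package)
    return {
        "executable": sys.executable,  # path of the running interpreter
        "version": "%d.%d.%d" % sys.version_info[:3],
        "package": package,
        # Where the package would be imported from, or None if not found.
        "location": spec.origin if spec is not None else None,
    }


info = describe_environment()
for key, value in info.items():
    print(key, "=", value)
```

If `location` prints as None, install the package with the interpreter shown under `executable`, for example by running that exact path followed by `-m pip install lxml`.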

+10
Jan 22 '18 at 4:48

Instead of lxml, use html.parser. You can use this piece of code:

 soup = BeautifulSoup(html, 'html.parser') 
+5
Feb 13 '18 at 12:28

BeautifulSoup supports Python's built-in HTML parser by default. If you want to use any other third-party parser, you need to install that external parser first (for example, lxml).

 soup_object = BeautifulSoup(markup, "html.parser")  # Python's built-in HTML parser

But if you do not specify a parser as a parameter, you will get a warning that no parser was specified.

 soup_object = BeautifulSoup(markup)  # Warning

To use any other external parser, you need to install it and then specify it, like this:

 pip install lxml

 soup_object = BeautifulSoup(markup, 'lxml')  # C-dependent parser

External parsers can depend on both C and Python code, which brings some advantages and some disadvantages.

+4
Mar 24 '18 at 11:06

I ran into the same problem. I found that the cause was a slightly outdated Python six package.

 >>> import html5lib
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/usr/local/lib/python2.7/site-packages/html5lib/__init__.py", line 16, in <module>
     from .html5parser import HTMLParser, parse, parseFragment
   File "/usr/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 2, in <module>
     from six import with_metaclass, viewkeys, PY3
 ImportError: cannot import name viewkeys

Upgrading your six package will solve the problem:

 sudo pip install six==1.10.0
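Before upgrading, it can help to see which versions are actually installed for the interpreter you are running. The sketch below assumes Python 3.8 or newer, where `importlib.metadata` is part of the standard library; `installed_version` is a hypothetical helper name, and the distribution names in the loop are just the ones mentioned in this thread.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as used on PyPI (note: bs4 is published as beautifulsoup4).
for name in ("six", "html5lib", "lxml", "beautifulsoup4"):
    print(name, installed_version(name))
```

A value of None means the distribution is not visible to this interpreter at all, which points to an installation problem and not a version problem.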
+3
Mar 04 '17 at 6:17

The parser library is either not installed on your computer or cannot be found.

Try this command from cmd:

pip install lxml

+3
Dec 28 '17 at 14:28

I solved this error by updating my lxml distribution:

pip install -U lxml

+2
Mar 04 '18 at 18:30
 conda install lxml 

worked for me from a virtual environment.
It was on Windows 10.

0
Nov 17 '18 at 1:36
  1. Run python -m pip install lxml (in cmd)

  2. Add import lxml to your code / project

0
Nov 21 '18 at 9:44

The same error still persists.

0
Jan 17 '19 at 16:03

Some references show the first form; use the second instead:

 soup_object = BeautifulSoup(markup, 'html-parser')  # wrong: hyphen
 soup_object = BeautifulSoup(markup, 'html.parser')  # correct: dot
-1
Apr 02 '18 at 13:28


