BeautifulSoup4 throws an error in Python 3.x

I am trying to write a web page scraper and I want to use BeautifulSoup for it. I installed BeautifulSoup 4.3.2, since the website says it is compatible with Python 3.x. I used

pip install beautifulsoup4

to install it. But when I ran

from bs4 import BeautifulSoup
import requests

url = input("Enter a URL (start with www): ")
link = "http://" + url
data = requests.get(link).content
soup = BeautifulSoup(data)

for link in soup.find_all('a'):
    print(link.get('href'))

I get an error:

Traceback (most recent call last):
  File "/Users/user/Desktop/project.py", line 1, in <module>
    from bs4 import BeautifulSoup
  File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/site-packages/bs4/__init__.py", line 30, in <module>
    from .builder import builder_registry, ParserRejectedMarkup
  File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/site-packages/bs4/builder/__init__.py", line 308, in <module>
    from .. import _htmlparser
ImportError: cannot import name _htmlparser
3 answers

I just installed Python 3.x on my end and tested the latest BS4 download. It does not work. However, a fix can be found here: https://github.com/il-vladislav/BeautifulSoup4 (credit to the GitHub user il-vladislav, whoever you are).

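Download the zip, replace the bs4 folder in your BeautifulSoup download with the one from the zip, and reinstall with python setup.py install. After that, it works on my end.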

Code:

from bs4 import BeautifulSoup
import requests

url = input("Enter a URL (start with www): ")
link = "http://" + url
data = requests.get(link).content
soup = BeautifulSoup(data)

for link in soup.find_all('a'):
    print(link.get('href'))

Result:

[screenshot of the output]

So, with the fix above, BS4 works on Python 3.x (as it does on 2).
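
After reinstalling, it can help to confirm which interpreter is running and which copy of bs4 it would pick up, since the traceback in the question points at a Python 3.1 site-packages directory. The following is only a rough sketch: `locate_module` is a hypothetical helper name, and it assumes Python 3.4 or newer, where `importlib.util.find_spec` is available.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file a top-level module would be loaded from, or None.

    Hypothetical helper for illustration; it only locates the module
    and does not import it.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


print("Interpreter:", sys.executable)
print("Python version:", sys.version.split()[0])
# None here means this interpreter cannot see a bs4 installation at all.
print("bs4 would be loaded from:", locate_module("bs4"))
```

If the path printed for bs4 belongs to a different Python version than the one you meant to use, pip probably installed the package for another interpreter.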


Judging by the traceback, the problem is here:

  File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/site-packages/bs4/builder/__init__.py", line 308, in <module>
  from .. import _htmlparser

Change line 308 of bs4/builder/__init__.py to:

  from . import _htmlparser

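As far as I can tell, the two dots make Python look for _htmlparser one level up, in bs4 itself. With a single dot it looks inside bs4/builder, and in 4.3.2 that is where _htmlparser.py lives.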
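
To see the difference between the one-dot and two-dot import without touching a real bs4 install, here is a small sketch that builds a throwaway package with the same layout. The package name `demo_pkg` and the variable names are made up for illustration; only the directory structure mirrors the one discussed above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway package: demo_pkg/builder/_htmlparser.py (hypothetical names).
root = Path(tempfile.mkdtemp())
builder = root / "demo_pkg" / "builder"
builder.mkdir(parents=True)
(root / "demo_pkg" / "__init__.py").write_text("")
(builder / "_htmlparser.py").write_text("NAME = 'builder._htmlparser'\n")

# One dot: look for _htmlparser next to this __init__.py, inside builder/.
(builder / "__init__.py").write_text("from . import _htmlparser\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_builder = importlib.import_module("demo_pkg.builder")
print(demo_builder._htmlparser.NAME)  # prints builder._htmlparser

# Two dots: look one level up, in demo_pkg/, where no _htmlparser exists.
(builder / "__init__.py").write_text("from .. import _htmlparser\n")
for name in [m for m in sys.modules if m.startswith("demo_pkg")]:
    del sys.modules[name]  # forget the first import so the file is re-read
importlib.invalidate_caches()

two_dot_failed = False
try:
    importlib.import_module("demo_pkg.builder")
except ImportError as exc:
    two_dot_failed = True
    print("ImportError:", exc)
```

The second import fails with the same kind of ImportError as in the question, which is consistent with the one-dot form being the right one for this layout.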


In bs4/builder/_htmlparser.py:

A) Remove HTMLParseError from the import, so that it reads:

from html.parser import HTMLParser

B) Define HTMLParseError yourself:

class HTMLParseError(Exception):
    """Exception raised for all parse errors."""

    def __init__(self, msg, position=(None, None)):
        assert msg
        self.msg = msg
        self.lineno = position[0]
        self.offset = position[1]

    def __str__(self):
        result = self.msg
        if self.lineno is not None:
            result = result + ", at line %d" % self.lineno
        if self.offset is not None:
            result = result + ", column %d" % (self.offset + 1)
        return result

This is probably not the best solution, since HTMLParseError will never actually be raised by the parser. But in that case the exception would simply go uncaught and unhandled anyway.
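
Steps A and B can also be combined into a fallback, so the same file works whether or not the standard library still provides the exception. This is a sketch of a common compatibility pattern, not what bs4 itself ships: it tries the standard-library class first and only defines a local stand-in when the import fails.

```python
from html.parser import HTMLParser

try:
    # Older Python 3 releases still export this name.
    from html.parser import HTMLParseError
except ImportError:
    # Local stand-in, same shape as the class shown above.
    class HTMLParseError(Exception):
        """Exception raised for all parse errors."""

        def __init__(self, msg, position=(None, None)):
            assert msg
            self.msg = msg
            self.lineno = position[0]
            self.offset = position[1]

        def __str__(self):
            result = self.msg
            if self.lineno is not None:
                result = result + ", at line %d" % self.lineno
            if self.offset is not None:
                result = result + ", column %d" % (self.offset + 1)
            return result


# Either way, existing "except HTMLParseError" clauses keep working.
error = HTMLParseError("bad tag", (3, 4))
print(error)  # prints: bad tag, at line 3, column 5
```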


Source: https://habr.com/ru/post/1662228/

