Alternatives to Beautiful Soup for Python 3.2

I need to write a web crawler to retrieve information from web pages. I did some research and found that Beautiful Soup looked excellent: I could parse the entire document, create DOM objects, iterate over them, extract attributes, and so on (similar to jQuery).

But I use Python 3.2, and there is no stable version for it (I think there is none at all; I only saw 3.1 mentioned on its home page).

So I need some good alternatives.

+4
3 answers

From the lxml page:

The latest version works with all versions of CPython from 2.4 to 3.2.

0

It seems that Beautiful Soup 3.2.0 was released almost a year ago. There is also HTMLParser: http://docs.python.org/library/htmlparser.html
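For illustration, here is a minimal sketch of the HTMLParser route on Python 3, where the module is named html.parser. The class name LinkCollector and the function collect_links are made up for this example; only HTMLParser, feed(), close() and handle_starttag() come from the standard library.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased;
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(html_text):
    """Feed the markup to the parser and return the collected hrefs."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

This is event-based rather than tree-based, so there is no DOM to iterate over afterwards; anything you want to keep has to be collected in the handler methods.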

+3

I think the latest version is 4.1.1; you can read about it in the BS4 documentation.

I have been using BS4 with PHP on my website for this purpose for a while now, with excellent results. I had to switch back to BSv3 because of a PHP/Python incompatibility problem, but that is unrelated to how well the BS4 script works on its own.

At first I used the built-in HTML parsing engine, but found it slow. After I installed the lxml parser on my web server, there was a huge increase in speed. The parsing results themselves were not noticeably better, but the speed increased sharply.
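As a rough sketch of what switching parsers looks like in BS4: the parser is chosen by the second argument to BeautifulSoup, and BS4 raises FeatureNotFound if the requested one is not installed. The function names make_soup and link_targets are made up for this example, and the guarded import is only there so the snippet still loads on a machine without beautifulsoup4.

```python
try:
    from bs4 import BeautifulSoup, FeatureNotFound
except ImportError:
    # beautifulsoup4 is a third-party package and may not be installed.
    BeautifulSoup = None


def make_soup(html_text):
    """Hypothetical helper: prefer lxml, fall back to the built-in parser."""
    if BeautifulSoup is None:
        raise RuntimeError("beautifulsoup4 is not installed")
    try:
        return BeautifulSoup(html_text, "lxml")
    except FeatureNotFound:
        # lxml is missing, so use the parser from the standard library.
        return BeautifulSoup(html_text, "html.parser")


def link_targets(html_text):
    """Return the href of every <a> tag that actually has one."""
    soup = make_soup(html_text)
    return [a["href"] for a in soup.find_all("a", href=True)]
```

The rest of the code stays the same whichever parser ends up being used, so it is cheap to try lxml and measure the difference on your own pages.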

Give it a go. I recommend it, and I tried many different options before settling on Beautiful Soup.

Good luck

+1

Source: https://habr.com/ru/post/1380095/
