Writing a forum scraper

I need to write a program to scrape forums.

Should I write it in Python using the Scrapy framework, or use PHP with cURL? Is there a PHP equivalent of Scrapy?

Thanks.

2 answers

I would choose Python because of its excellent libxml2 bindings, in particular lxml.html and pyquery. Scrapy has its own libxml2 bindings; I haven't looked at them closely, although reading the Scrapy documentation didn't leave me very impressed (I've written a lot of scrapers using just these parsers and hand-written code). With any of these you get a really good HTML parser that you can query with XPath, and with lxml.html and pyquery (which is also built on lxml) you can use CSS selectors as well.
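A minimal sketch of the lxml.html approach, assuming a hypothetical forum whose posts are `<div class="post">` elements containing an author `<span>` and a body `<div>` (the markup and class names are invented for illustration). It sticks to XPath; the CSS-selector route (`.cssselect()`) also needs the separate cssselect package, so it is left out here.

```python
import lxml.html

# Hypothetical markup standing in for one forum thread page.
HTML = """
<html><body>
  <div class="post" id="p1">
    <span class="author">alice</span>
    <div class="body">First post</div>
  </div>
  <div class="post" id="p2">
    <span class="author">bob</span>
    <div class="body">A reply</div>
  </div>
</body></html>
"""


def extract_posts(html_text):
    """Return (author, body) pairs for every post on the page."""
    doc = lxml.html.fromstring(html_text)
    posts = []
    # XPath: every <div> whose class attribute is exactly "post".
    for post in doc.xpath('//div[@class="post"]'):
        # string(...) returns the concatenated text of the first match.
        author = post.xpath('string(.//span[@class="author"])').strip()
        body = post.xpath('string(.//div[@class="body"])').strip()
        posts.append((author, body))
    return posts


if __name__ == "__main__":
    for author, body in extract_posts(HTML):
        print(f"{author}: {body}")
```

On a real forum you would swap in the page's actual class names, which you can find by inspecting its HTML.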

If it's a small job like scraping a single forum, I would skip the framework and just do it by hand; things like parallelization aren't really needed.
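A rough sketch of the "do it by hand" route: pages are fetched one after another with the standard library's urllib and parsed with lxml.html, with no framework and no parallelism. The `div.body` markup, the function names, and the one-second delay are assumptions for illustration, not requirements of any particular forum.

```python
import time
import urllib.request

import lxml.html


def fetch_page(url, timeout=10):
    """Download one page and return its decoded HTML."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def post_bodies(html_text):
    """Extract the text of each post body (hypothetical class name)."""
    doc = lxml.html.fromstring(html_text)
    return [el.text_content().strip() for el in doc.xpath('//div[@class="body"]')]


def scrape_thread(page_urls, fetch=fetch_page, delay=1.0):
    """Fetch pages sequentially and collect the post bodies from each."""
    results = []
    for url in page_urls:
        html_text = fetch(url)
        results.extend(post_bodies(html_text))
        time.sleep(delay)  # pause between requests to go easy on the server
    return results
```

You would call `scrape_thread(urls)` with the list of thread page URLs. Because `fetch` is a parameter, the parsing can be tested offline by passing a function that returns saved HTML instead of downloading it.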


PHP , . .

, . . . Python.

, . Harvestman, Scrapy .. , 80legs, .

. , , , , PHP. . , , http://wiki.python.org/moin/PythonVsPhp


Source: https://habr.com/ru/post/1748662/

