What is the best method or tool for scraping websites?

I need to crawl some websites (with the owners' permission) before I start writing my own. What is the best website scraping tool or approach that is fast (multi-threaded) and easy to learn?
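For reference, a minimal multi-threaded scraper can be built from Python's standard library alone. The sketch below downloads pages concurrently with `concurrent.futures.ThreadPoolExecutor` and extracts `<a href>` links with `html.parser.HTMLParser`. It assumes pages are UTF-8 and that a pool of `max_workers=8` threads is acceptable to the target sites; the names `LinkCollector`, `fetch`, `extract_links` and `scrape_all`, and the `User-Agent` string, are hypothetical.

```python
import concurrent.futures
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags; tolerates unclosed tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def fetch(url, timeout=10):
    """Download one page. Decoding as UTF-8 is an assumption; real pages vary."""
    req = urllib.request.Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_links(url, html):
    parser = LinkCollector(url)
    parser.feed(html)
    parser.close()
    return parser.links


def scrape_all(urls, fetcher=fetch, max_workers=8):
    """Fetch pages on a thread pool; return ({url: links}, {url: exception})."""
    pages, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetcher, url): url for url in urls}
        for fut in concurrent.futures.as_completed(futures):
            url = futures[fut]
            try:
                pages[url] = extract_links(url, fut.result())
            except Exception as exc:  # network errors, timeouts, bad URLs
                errors[url] = exc
    return pages, errors
```

Calling `scrape_all(["https://example.com/"])` returns links for each page that downloaded and the exception for each one that did not. Threads fit this job because most of the time is spent waiting on the network; keeping `max_workers` small and honoring each site's robots.txt keeps the crawl within what the owners have approved.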

4 answers

Take a look at this recent blog post by Lee Holmes. He wrote a pretty cool screen scraper using PowerShell and the HTML Agility Pack.

Consider using TestPlan. It has a headless browser mode (no display) for fast scraping, and its scripting language is simple, so the basics are quick to learn.

TagSoup, a SAX-compliant parser written in Java, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short.

Details here: http://mercury.ccil.org/~cowan/XML/tagsoup/
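TagSoup itself is a Java library. As a rough stand-in that needs only Python's standard library, `html.parser.HTMLParser` is also callback-driven (SAX-like: you react to start tags, end tags and text as they stream past) and does not require well-formed markup. Unlike TagSoup, it does not rebalance tags into a clean element stream, so the sketch below closes cells and rows itself when the `</td>` and `</tr>` end tags are omitted, as real-world pages often do. The class `CellExtractor` and function `table_rows` are hypothetical names, and nested tables are not handled.

```python
from html.parser import HTMLParser


class CellExtractor(HTMLParser):
    """Event-driven extraction of table rows from sloppy HTML.

    A new <td>/<th> or <tr> implicitly closes the previous cell or row,
    mirroring how browsers treat omitted end tags.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.rows = []
        self._row = None   # list of cell strings for the open row
        self._cell = None  # text fragments for the open cell

    def _end_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()
            self._row = []
        elif tag in ("td", "th"):
            self._end_cell()
            if self._row is None:  # cell with no <tr>: open a row implicitly
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._end_row()  # flush a row still open at end of input


def table_rows(html):
    parser = CellExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows
```

For example, `table_rows("<table><tr><td>Name<td>Price<tr><td>Tea &amp; milk<td>3</table>")` yields `[["Name", "Price"], ["Tea & milk", "3"]]` even though no cell or row is explicitly closed.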

Have you looked at ScraperWiki (https://scraperwiki.com/)?

Source: https://habr.com/ru/post/1303488/