I need to crawl (with permission) some websites. Before I start writing my own, what is the best web scraping tool or approach that is fast (multi-threaded) and easy to learn?
Take a look at Lee Holmes's latest blog post. He wrote a pretty cool screen scraper using PowerShell and the HTML Agility Pack.
Consider using TestPlan. It has a headless browser mode (no display) for speed. The scripting language is very simple, and the basics are quick to learn.
TagSoup, a SAX-compliant parser written in Java, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short.
Details here: http://mercury.ccil.org/~cowan/XML/tagsoup/
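To illustrate the SAX-style, error-tolerant parsing idea, here is a minimal sketch. It does not use TagSoup (which is a Java library); it swaps in Python's standard-library `html.parser` as a stand-in, plus a thread pool since the question asks for multi-threading. The names `LinkCollector`, `extract_links` and `extract_links_from_many` are made up for this example, and fetching the pages is assumed to happen elsewhere.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> tags via callbacks."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Called once per opening tag; tag and attribute names arrive
        # lowercased, and unclosed or badly nested tags do not stop parsing.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the href values found in one HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def extract_links_from_many(pages, max_workers=4):
    """Parse several already-fetched HTML strings on a thread pool.

    Assumption: 'pages' is a list of HTML strings obtained elsewhere.
    Results come back in the same order as the input.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(extract_links, pages))
```

In a real crawler the threads pay off mostly during downloading, which is I/O-bound; the same `pool.map` pattern could wrap a function that fetches a page and then parses it.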
Have you looked at ScraperWiki? https://scraperwiki.com/
Source: https://habr.com/ru/post/1303488/