Webpage cleaning (HTML) using C#

This is just a general question. I am currently cleaning up web pages using regular expressions, but a suitable regex is sometimes difficult to define. Is XSL / XPath a viable alternative to regex in C#?

In addition, I would like to know whether there are better methods for cleaning a web page than the two mentioned above. Thank you.

1 answer

You can take a look at SgmlReader or the Html Agility Pack, both of which are HTML parsing libraries for .NET.
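As a rough illustration of the parser-based approach, here is a minimal sketch that uses the Html Agility Pack (NuGet package `HtmlAgilityPack`) to load HTML, drop `script` and `style` elements selected with XPath, and return the remaining text. The class name `HtmlCleaner`, the method name `ExtractText`, and the choice of which elements to strip are assumptions made for this example, not something prescribed by the library.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // third-party NuGet package "HtmlAgilityPack"

// Hypothetical helper class, named for illustration only.
public static class HtmlCleaner
{
    // Returns the text content of an HTML string with scripts and styles removed.
    public static string ExtractText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // the parser tolerates malformed, real-world markup

        // XPath query for the elements to discard.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var unwanted = doc.DocumentNode.SelectNodes("//script | //style");
        if (unwanted != null)
        {
            // Copy to a list first so removal cannot disturb the enumeration.
            foreach (var node in unwanted.ToList())
            {
                node.Remove();
            }
        }

        // InnerText concatenates the remaining text nodes;
        // DeEntitize decodes entities such as &amp;.
        string text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);

        // Collapse the whitespace runs left behind by the removed markup.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

Calling `HtmlCleaner.ExtractText(pageSource)` on a downloaded page would then give plain text without relying on hand-written regular expressions for the markup itself; the single regex here only normalizes whitespace.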


Source: https://habr.com/ru/post/1340062/

