C# regex to retrieve div content

I have seen some related questions and tried their answers, but they do not work for me. I want to get the content of the div whose id is thumbs, but regex.Success returns false :(

Match regex = Regex.Match(html, @"<div[^>]*id=""thumbs"">(.+?)</div>"); 
3 answers

Regex is not a good choice for parsing HTML files.

HTML is not strict, and its format is not regular.

Use HtmlAgilityPack instead.


Why use a parser?

Consider your regex. There are countless inputs that will break it:

  • Your regex won't work if there are nested divs
  • Real-world HTML is often malformed, so a div may be missing its end tag (only XHTML is strict about this)
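To make the nested-div point concrete, here is a minimal sketch that runs the pattern from the question against a small sample. The markup and the names NestedDivDemo and Capture are made up for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NestedDivDemo
{
    // Hypothetical sample markup: the thumbs div contains one nested div.
    public const string Html =
        "<div id=\"thumbs\"><div class=\"item\">A</div><span>B</span></div>";

    // Applies the pattern from the question and returns what group 1 captured.
    public static string Capture(string html)
    {
        Match m = Regex.Match(html, @"<div[^>]*id=""thumbs"">(.+?)</div>");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        // The lazy (.+?) stops at the first </div>, which closes the inner div,
        // so the <span> that also belongs to the thumbs div is lost.
        Console.WriteLine(Capture(Html)); // prints: <div class="item">A
    }
}
```

The match succeeds, but the capture is cut off at the inner closing tag, which is the kind of silent error a parser avoids.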

You can use this code to get it using HtmlAgilityPack

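 using System.Linq;
 using HtmlAgilityPack;

 HtmlDocument doc = new HtmlDocument();
 doc.Load(yourStream);

 // This XPath selects every div whose id is "thumbs".
 // Note: SelectNodes returns null when nothing matches, so check before chaining.
 var itemList = doc.DocumentNode
     .SelectNodes("//div[@id='thumbs']")
     .Select(p => p.InnerText)
     .ToList();
 // itemList now contains the text content of every div whose id is "thumbs".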

No, I don't think he needs backslash escapes. He has @ before the pattern string. I think this is the right pattern:

 <div[^>]*id="thumbs">(.+?)</div> 

So inside the verbatim string each double quote is written twice ("").
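As a quick check of the quoting rule, this small sketch compares the verbatim form with the regular escaped form of the same pattern. The class and constant names are illustrative only.

```csharp
using System;

public static class VerbatimQuoteDemo
{
    // Verbatim string: a literal " is written as "", and backslashes are not escapes.
    public const string Verbatim = @"<div[^>]*id=""thumbs"">(.+?)</div>";

    // Regular string: a literal " is written as \".
    public const string Regular = "<div[^>]*id=\"thumbs\">(.+?)</div>";

    public static void Main()
    {
        Console.WriteLine(Verbatim);            // the pattern the regex engine receives
        Console.WriteLine(Verbatim == Regular); // True
    }
}
```

Both literals produce the same string, so the quoting in the question is not the reason the match fails.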


Try the following:

 Regex r = new Regex(
     @"(?<text>(<div\s*?id=(\""|&quot;|&\#34;)"
     + @"thumb(\""|&quot;|&\#34;).*?>)(?>.*?</div>|.*?<div "
     + @"(?>depth)|.*?</div> (?>-depth))*)(?(depth)(?!)).*?</div>",
     RegexOptions.Singleline);
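The RegexOptions.Singleline flag used above matters on its own: by default . does not match a newline in .NET. That is one possible reason the pattern from the question reports Success as false, assuming the div content spans several lines (the question does not show the input, so this is a guess). A minimal sketch with made-up sample markup:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Hypothetical sample: the div content spans two lines.
    public const string Html = "<div id=\"thumbs\">first\nsecond</div>";

    // The pattern from the question, unchanged.
    public const string Pattern = @"<div[^>]*id=""thumbs"">(.+?)</div>";

    // Reports whether the pattern matches the sample under the given options.
    public static bool Matches(RegexOptions options)
    {
        return Regex.IsMatch(Html, Pattern, options);
    }

    public static void Main()
    {
        Console.WriteLine(Matches(RegexOptions.None));       // False: . stops at \n
        Console.WriteLine(Matches(RegexOptions.Singleline)); // True: . also matches \n
    }
}
```

Even with the flag set, the nested-div and malformed-markup problems remain, so this only explains the false result and does not make the regex a reliable parser.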

Source: https://habr.com/ru/post/948710/

