C# Regex: get div content

Question

I saw some related questions and tried their answers, but they do not work. I want to get the content of the div with the id thumbs, but regex.Success returns false :(

Match regex = Regex.Match(html, @"<div[^>]*id=""thumbs"">(.+?)</div>");

+6

c# regex

Bart wesselink Jul 04 '13 at 12:37

3 answers

No, I don't think he needs backslashes. He has @ before the pattern, so it is a verbatim string. I think the resulting pattern is right:

 <div[^>]*id="thumbs">(.+?)</div>

So the doubled double quotes are correct.
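To make the quoting point concrete, here is a minimal sketch. The input strings are made up, since the question does not show the actual HTML. It also illustrates one possible reason (an assumption, not something confirmed in the question) why the match fails even though the quoting is fine: in .NET, `.` does not match a newline unless RegexOptions.Singleline is set, so a div whose content spans several lines is not matched. The sketch uses top-level statements, which need C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

// In a verbatim string, "" is one literal double quote, so the regex
// engine sees: <div[^>]*id="thumbs">(.+?)</div>
string pattern = @"<div[^>]*id=""thumbs"">(.+?)</div>";

// Hypothetical inputs, not taken from the question.
string oneLine = "<div id=\"thumbs\">a b c</div>";
string multiLine = "<div id=\"thumbs\">\n  a b c\n</div>";

// Content on a single line: the pattern matches.
bool matchesOneLine = Regex.IsMatch(oneLine, pattern);

// Content starting with a newline: '.' cannot match '\n' by default.
bool matchesMultiLine = Regex.IsMatch(multiLine, pattern);

// With Singleline, '.' matches every character, including '\n'.
bool matchesWithSingleline =
    Regex.IsMatch(multiLine, pattern, RegexOptions.Singleline);

Console.WriteLine(matchesOneLine);        // True
Console.WriteLine(matchesMultiLine);      // False
Console.WriteLine(matchesWithSingleline); // True
```

If the real page has attributes after the id, or single quotes around it, the pattern would fail for a different reason, so this is only one candidate explanation.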

+1

Velja Radenkovic Jul 04 '13 at 12:46

Try the following:

 Regex r = new Regex(
     @"(?<text>(<div\s*?id=(\""|&quot;|&\#34;)thumbs(\""|&quot;|&\#34;).*?>)"
     + @"(?>.*?</div>|.*?<div (?<depth>)|.*?</div> (?<-depth>))*)"
     + @"(?(depth)(?!)).*?</div>",
     RegexOptions.Singleline);

0

Zaheer ahmed Jul 04 '13 at 12:46

Anirudha · Accepted Answer · 2013-07-04T12:45:27+0000

Regex is not a good choice for parsing HTML files.

HTML is not strict, and its format is not regular.

Use HtmlAgilityPack instead.

Why use a parser?

Consider your regex. There are countless cases that can break it:

Your regex won't work if there are nested divs.
Some divs do not have an end tag (except in XHTML)!
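The nested-div case can be reproduced with a short sketch. The HTML string below is a made-up example, not the asker's page; the pattern is the one from the question. It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: the thumbs div contains a nested div.
string nestedHtml =
    "<div id=\"thumbs\"><div class=\"item\">one</div><p>two</p></div>";

Match nestedMatch =
    Regex.Match(nestedHtml, @"<div[^>]*id=""thumbs"">(.+?)</div>");

// The lazy (.+?) stops at the first </div>, which closes the inner div,
// so everything after it (<p>two</p>) is missing from the capture.
string captured = nestedMatch.Groups[1].Value;

Console.WriteLine(nestedMatch.Success); // True
Console.WriteLine(captured);            // <div class="item">one
```

The match succeeds, which makes the problem easy to miss: the result is silently truncated rather than reported as a failure.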

You can use this code to get it using HtmlAgilityPack

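 // Needs the HtmlAgilityPack package, plus: using System.Linq; using HtmlAgilityPack;
 HtmlDocument doc = new HtmlDocument();
 doc.Load(yourStream);

 // This XPath selects every div whose id is thumbs.
 // Note: SelectNodes returns null when nothing matches.
 var itemList = doc.DocumentNode.SelectNodes("//div[@id='thumbs']")
     .Select(p => p.InnerText)
     .ToList();
 // itemList now contains the text content of every div with the id thumbs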
