Regex is not a good choice for parsing HTML.
HTML is not strict, and its structure is not regular: elements can nest to any depth, and malformed markup is common.
Use HtmlAgilityPack.
Why use a parser?
Consider your regex: there are countless inputs that will break it. For example:
- Your regex won't work if there are nested divs.
- In real-world HTML, some divs are never closed. Browsers tolerate this, but XHTML does not.
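To make the nested-div problem concrete, here is a small sketch using .NET's `System.Text.RegularExpressions`. The markup, the `NestedDivDemo` class and the `Capture` helper are hypothetical stand-ins for a typical "grab the contents of the thumbs div" regex, not code from the original question. Neither the lazy nor the greedy version returns the content we actually want:

```csharp
using System;
using System.Text.RegularExpressions;

static class NestedDivDemo
{
    // Hypothetical markup: "thumbs" holds two nested divs and is followed by a sibling div.
    public const string Html =
        "<div id=\"thumbs\"><div>a</div><div>b</div></div><div id=\"footer\">x</div>";

    // Returns what the pattern captures between the opening "thumbs" tag and a "</div>".
    // lazy = true uses (.*?), which stops at the FIRST "</div>" after the opening tag;
    // lazy = false uses (.*), which runs to the LAST "</div>" in the input.
    public static string Capture(string html, bool lazy)
    {
        string body = lazy ? "(.*?)" : "(.*)";
        var re = new Regex("<div id=\"thumbs\">" + body + "</div>", RegexOptions.Singleline);
        Match m = re.Match(html);
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        // The content we actually want is "<div>a</div><div>b</div>".
        Console.WriteLine(Capture(Html, lazy: true));  // <div>a
        Console.WriteLine(Capture(Html, lazy: false)); // <div>a</div><div>b</div></div><div id="footer">x
    }
}
```

The lazy pattern stops inside the first child, and the greedy one swallows the sibling div. A plain pattern like this has no way to count how many `<div>` tags are still open, which is exactly the bookkeeping a parser does for you.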
With HtmlAgilityPack, you can select the div like this:
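```csharp
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load(yourStream); // yourStream: a Stream containing the page's HTML
// Every <div> whose id is "thumbs"; SelectNodes returns null if nothing matches
var itemList = doc.DocumentNode.SelectNodes("//div[@id='thumbs']");
```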