How to get an HTML div element's inner text by id using regex in C#

I download the full HTML of a page using WebClient, but I need to extract a specific div from it using a regular expression.

For example:

<body>
<div id="main">
     <div id="left" style="float:left">this is a <b>left</b> side:<div style='color:red'> 1 </div>
     </div>
     <div id="right" style="float:left"> main side</div>
</div>
</body>

If I need the div called "main", the function should return:

<div id="left" style="float:left">this is a <b>left</b> side:<div style='color:red'> 1 </div>
     </div>
     <div id="right" style="float:left"> main side</div>

If I need the div called "left", the function should return:

this is a <b>left</b> side:<div style='color:red'> 1 </div>

If I need the div called "right", the function should return:

 main side

How can I do this?

+3
2 answers

Why do people insist on trying to use regex to parse HTML? You can probably make it work if you rule out a number of edge cases, but just use the HTML Agility Pack and you're done:

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(...); // or Load
string main = doc.DocumentNode.SelectSingleNode("//div[@id='main']").InnerHtml;

(Note: this assumes the input is HTML, not XHTML; if it is XHTML, XmlDocument or XDocument would also work.)
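For the XHTML case mentioned above, here is a minimal sketch using XDocument from System.Xml.Linq. It assumes the markup is well-formed XML with no default namespace declared (a document with xmlns="http://www.w3.org/1999/xhtml" would need an XNamespace-qualified element name). The class name XhtmlDivReader and the method GetInnerXmlById are hypothetical names chosen for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XhtmlDivReader
{
    // Returns the inner markup of the first <div> whose id attribute equals `id`,
    // or null when no such element exists. Assumes well-formed, namespace-free XHTML.
    public static string GetInnerXmlById(string xhtml, string id)
    {
        // PreserveWhitespace keeps whitespace-only text nodes between child elements.
        XDocument doc = XDocument.Parse(xhtml, LoadOptions.PreserveWhitespace);

        XElement div = doc.Descendants("div")
                          .FirstOrDefault(e => (string)e.Attribute("id") == id);
        if (div == null) return null;

        // Serialize each child node without re-indenting, then join them.
        return string.Concat(
            div.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }

    public static void Main()
    {
        string xhtml =
            "<body><div id=\"main\">" +
            "<div id=\"left\">this is a <b>left</b> side</div>" +
            "<div id=\"right\"> main side</div>" +
            "</div></body>";

        Console.WriteLine(GetInnerXmlById(xhtml, "left"));
        Console.WriteLine(GetInnerXmlById(xhtml, "right"));
    }
}
```

Because the XML serializer rewrites the markup, details such as attribute quote style may differ from the input. Real-world HTML is rarely well-formed XML, which is why the HTML Agility Pack is the safer default.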

+4
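string divname = "somename";
// Match the opening tag carrying the id, then lazily capture up to the first </div>.
Match m = Regex.Match(htmlContent, "<div[^>]*id=[\"']" + Regex.Escape(divname) + "[\"'][^>]*>(.*?)</div>", RegexOptions.Singleline);
string content = m.Groups[1].Value;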

Note, however, that this does not work if the div contains nested divs.
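A lazy capture up to the next closing div tag stops at the first one it meets, so a div that contains other divs gets cut short. One way around this without an HTML parser is to find the opening tag with a regex and then count nesting depth over the following div tags. The sketch below is an illustration under stated assumptions, not a robust parser: it assumes quoted id attributes and balanced div tags, and it does not handle self-closing divs, comments, or script content. The names DivExtractor and GetInnerHtmlById are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DivExtractor
{
    // Matches either an opening <div ...> tag or a closing </div> tag.
    // Group 1 is "/" for a closing tag and empty for an opening tag.
    private static readonly Regex DivTag =
        new Regex(@"<(/?)div\b[^>]*>", RegexOptions.IgnoreCase);

    // Returns the inner HTML of the first div with the given id,
    // or null when the div is missing or its closing tag cannot be found.
    public static string GetInnerHtmlById(string html, string id)
    {
        // Locate the opening tag that carries the requested id (single or double quotes).
        Match open = Regex.Match(
            html,
            @"<div\b[^>]*\bid\s*=\s*[""']" + Regex.Escape(id) + @"[""'][^>]*>",
            RegexOptions.IgnoreCase);
        if (!open.Success) return null;

        int start = open.Index + open.Length; // first character after the opening tag
        int depth = 1;                        // currently inside one open div

        // Walk over every later div tag and track the nesting depth.
        for (Match m = DivTag.Match(html, start); m.Success; m = m.NextMatch())
        {
            depth += m.Groups[1].Length == 0 ? 1 : -1;
            if (depth == 0)
                return html.Substring(start, m.Index - start);
        }
        return null; // unbalanced markup: no matching </div>
    }

    public static void Main()
    {
        string html =
            "<body><div id=\"main\">" +
            "<div id=\"left\" style=\"float:left\">this is a <b>left</b> side:" +
            "<div style='color:red'> 1 </div></div>" +
            "<div id=\"right\" style=\"float:left\"> main side</div>" +
            "</div></body>";

        Console.WriteLine(GetInnerHtmlById(html, "main"));
        Console.WriteLine(GetInnerHtmlById(html, "left"));
        Console.WriteLine(GetInnerHtmlById(html, "right"));
    }
}
```

Unlike the XML approach, this returns the original substring untouched, so quoting and whitespace are preserved exactly. It still breaks on markup that a real parser would tolerate, so treat it as a fallback for small, predictable pages.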

+2

Source: https://habr.com/ru/post/1717805/

