Get HTML values from web response

I am trying to parse an HTML response into pairs of values and then insert them into SQL. I can get both values, but because the code is wrapped in a foreach statement, I get them twice.

Here is my HTML response:

<div align="CENTER" class='dataTitle'>Host State Breakdowns:</div> <p align='center'> <a href='trends.cgi?host=hostname&includesoftstates=no&assumeinitialstates=yes&initialassumedhoststate=0&backtrack=4'><img src='trends.cgi?createimage&host=hostname&includesoftstates=no&initialassumedhoststate=0&backtrack=4' border="1" alt='Host State Trends' title='Host State Trends' width='500' height='20'></a><br> </p> <div align="CENTER"> <table border="0" class='data'> <tr><th class='data'>State</th><th class='data'>Type / Reason</th><th class='data'>Time</th><th class='data'>% Total Time</th><th class='data'>% Known Time</th></tr> <tr class='dataEven'><td class='hostUP' rowspan="3">UP</td><td class='dataEven'>Unscheduled</td><td class='dataEven'>0d 10h 5m 19s</td><td class='dataEven'>100.000%</td><td class='dataEven'>100.000%</td></tr> <tr class='dataEven'><td class='dataEven'>Scheduled</td><td class='dataEven'>0d 0h 0m 0s</td><td class='dataEven'>0.000%</td><td class='dataEven'>0.000%</td></tr> <tr class='hostUNREACHABLE'><td class='hostUNREACHABLE'>Total</td><td class='hostUNREACHABLE'>0d 0h 0m 0s</td><td class='hostUNREACHABLE'>0.000%</td><td class='hostUNREACHABLE'>0.000%</td></tr> <tr class='dataOdd'><td class='dataOdd' rowspan="3">Undetermined</td><td class='dataOdd'>Nagios Not Running</td><td class='dataOdd'>0d 0h 0m 0s</td><td class='dataOdd'>0.000%</td><td class='dataOdd'></td></tr> <tr class='dataOdd'><td class='dataOdd'>Insufficient Data</td><td class='dataOdd'>0d 0h 0m 0s</td><td class='dataOdd'>0.000%</td><td class='dataOdd'></td></tr> <tr class='dataOdd'><td class='dataOdd'>Total</td><td class='dataOdd'>0d 0h 0m 0s</td><td class='dataOdd'>0.000%</td><td class='dataOdd'></td></tr> <tr><td colspan="3"></td></tr> <tr class='dataEven'><td class='dataEven'>All</td><td class='dataEven'>Total</td><td class='dataEven'>0d 10h 5m 19s</td><td class='dataEven'>100.000%</td><td class='dataEven'>100.000%</td></tr> </table> </div> <br><br> <div align="CENTER" 
class='dataTitle'>State Breakdowns For Host Services:</div> <div align="CENTER"> <table border="0" class='data'> <tr><th class='data'>Service</th><th class='data'>% Time OK</th><th class='data'>% Time Warning</th><th class='data'>% Time Unknown</th><th class='data'>% Time Critical</th><th class='data'>% Time Undetermined</th></tr> <tr class='dataOdd'><td class='dataOdd'><a href='avail.cgi?host=hostname&service=servicename&t1=1478498400&t2=1478534719&backtrack=4&assumestateretention=yes&assumeinitialstates=yes&assumestatesduringnotrunning=yes&initialassumedhoststate=0&initialassumedservicestate=0&show_log_entries&showscheduleddowntime=yes&rpttimeperiod=24x7'>servicename</a></td><td class='serviceOK'>100.000% (100.000%)</td><td class='serviceWARNING'>0.000% (0.000%)</td><td class='serviceUNKNOWN'>0.000% (0.000%)</td><td class='serviceCRITICAL'>0.000% (0.000%)</td><td class='dataOdd'>0.000%</td></tr> <tr class='dataEven'><td class='dataEven'><a href='avail.cgi?host=hostname&service=servicename2&t1=1478498400&t2=1478534719&backtrack=4&assumestateretention=yes&assumeinitialstates=yes&assumestatesduringnotrunning=yes&initialassumedhoststate=0&initialassumedservicestate=0&show_log_entries&showscheduleddowntime=yes&rpttimeperiod=24x7'>servicename2</a></td><td class='serviceOK'>100.000% (100.000%)</td><td class='serviceWARNING'>0.000% (0.000%)</td><td class='serviceUNKNOWN'>0.000% (0.000%)</td><td class='serviceCRITICAL'>0.000% (0.000%)</td><td class='dataEven'>0.000%</td></tr> </table> </div> 

Here is my code:

    var response = (HttpWebResponse)request.GetResponse();
    var stream = response.GetResponseStream();
    HtmlDocument doc = new HtmlDocument();
    doc.Load(stream);
    foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//table[@class]"))
    {
        foreach (HtmlNode node2 in node.SelectNodes("//td[@class = 'serviceOK']"))
        {
            var value = node2.InnerText;
        }
        foreach (HtmlNode node3 in node.SelectNodes("//a[contains(@href, 'avail.cgi')]"))
        {
            var name = node3.InnerText;
        }
    }

name holds the service name (servicename) and value holds the text of the serviceOK cell, but both are repeated because of the outer foreach.

My results are as follows:

    100.000% (100.000%)
    100.000% (100.000%)
    servicename
    servicename2
    100.000% (100.000%)
    100.000% (100.000%)
    servicename
    servicename2

Is there a way, first, to pair each name with its value and, second, to show each pair only once?

1 answer

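Your outer foreach visits every table that matches, and the XPath expressions in the two inner foreach statements start with //, so each of them searches the entire document rather than only the current table.
Since there are two table elements matching your XPath expression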

 "//table[@class]" 

you get every result twice. If seven table elements matched the expression, you would get the results seven times.
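To make the scoping issue concrete, here is a minimal sketch, assuming HtmlAgilityPack is referenced. The two-table HTML string and the class name XPathScopeDemo are invented for illustration and are not taken from your page. It compares an absolute expression (//td) with a relative one (.//td), both called on the first table only.

```csharp
using System;
using HtmlAgilityPack;

class XPathScopeDemo
{
    static void Main()
    {
        // Illustrative markup only: the serviceOK cell lives in the SECOND table.
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<table class='data'><tr><td class='hostUP'>UP</td></tr></table>" +
            "<table class='data'><tr><td class='serviceOK'>100.000%</td></tr></table>");

        HtmlNode firstTable = doc.DocumentNode.SelectSingleNode("//table[@class]");

        // "//" starts at the document root, even though it is called on firstTable,
        // so the cell from the second table is still found.
        HtmlNodeCollection absolute = firstTable.SelectNodes("//td[@class='serviceOK']");

        // ".//" starts at firstTable. SelectNodes returns null when nothing matches.
        HtmlNodeCollection relative = firstTable.SelectNodes(".//td[@class='serviceOK']");

        Console.WriteLine(absolute?.Count ?? 0); // 1
        Console.WriteLine(relative?.Count ?? 0); // 0
    }
}
```

So prefixing the inner expressions with a dot would already stop the duplication, although the names and values would still come out in two separate loops.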

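You want to find all table cells (td) with the class "serviceOK" that are inside a table row (tr) inside a table. Once you have such an HtmlNode, you can simply go to its previous sibling, which contains the service name. This works here because the response has no whitespace between the two td elements; otherwise the previous sibling would be a text node.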

    var response = (HttpWebResponse)request.GetResponse();
    var stream = response.GetResponseStream();
    HtmlDocument doc = new HtmlDocument();
    doc.Load(stream);
    foreach (HtmlNode serviceOkNode in doc.DocumentNode.SelectNodes("//table[@class]/tr/td[@class = 'serviceOK']"))
    {
        HtmlNode serviceNameNode = serviceOkNode.PreviousSibling;
        var value = serviceOkNode.InnerText;
        var name = serviceNameNode.InnerText;
    }
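
If you would rather not depend on PreviousSibling, a possible variant is to select the rows and then query each row with relative XPath. This is only a sketch: the class ServiceAvailability and the method Extract are hypothetical names, and it assumes the markup looks like the response in the question (a link to avail.cgi and a serviceOK cell in the same tr). It also guards against SelectNodes returning null when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class ServiceAvailability
{
    // Hypothetical helper: returns one (name, value) pair per service row.
    public static List<KeyValuePair<string, string>> Extract(HtmlDocument doc)
    {
        var results = new List<KeyValuePair<string, string>>();

        // Only rows that actually contain a serviceOK cell.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes(
            "//table[@class]/tr[td[@class='serviceOK']]");
        if (rows == null)
        {
            return results; // no matching rows in this response
        }

        foreach (HtmlNode row in rows)
        {
            // "./" keeps both lookups inside the current row.
            HtmlNode nameNode = row.SelectSingleNode("./td/a[contains(@href, 'avail.cgi')]");
            HtmlNode valueNode = row.SelectSingleNode("./td[@class='serviceOK']");
            if (nameNode == null || valueNode == null)
            {
                continue; // skip rows that do not have both parts
            }

            results.Add(new KeyValuePair<string, string>(
                nameNode.InnerText.Trim(),
                valueNode.InnerText.Trim()));
        }

        return results;
    }
}
```

Calling ServiceAvailability.Extract(doc) after doc.Load(stream) should give you each service name next to its value, ready to pass to your SQL insert as parameters.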

Source: https://habr.com/ru/post/1259416/

