I am trying to write a script that scrapes a website to get the latest news updates. Unfortunately, I have run into a small problem that I can't fix because of my limited knowledge of the DOM.
The page I'm trying to scrape is structured as follows:
<table>
    <tr class="color1">
        <td>Author</td>
        <td>Content <a href="#">in HTML</a></td>
        <td>Date</td>
    </tr>
</table>
I can get the fields I need, except for the content. With $td->nodeValue I get the content as plain text, whereas I want it as HTML (it contains tags, blockquote elements, etc.).
Here is the code I have:
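try {
    // Fetch the page; @ suppresses warnings if the read fails.
    $html = @file_get_contents("test.php");
    checkIfFileExists($html);

    $dom = new DOMDocument();
    // @ suppresses warnings about malformed HTML.
    @$dom->loadHTML($html);

    $trNodes = $dom->getElementsByTagName("tr");
    foreach ($trNodes as $tr) {
        // Only process rows with class "color1" or "color2".
        if ($tr->getAttribute("class") == "color1" || $tr->getAttribute("class") == "color2") {
            $tdNodes = $tr->childNodes;
            foreach ($tdNodes as $td) {
                // nodeValue returns plain text, so the markup inside the cell is lost.
                echo $td->nodeValue . "<br />\n";
            }
            echo "<br /><br /><br /><br /><br />\n";
        }
    }
} catch (Exception $e) {
    echo $e->getMessage();
}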
I would prefer not to resort to a third-party library, but any answer is welcome, library or not.
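For illustration, here is a minimal sketch of one direction that might work with the built-in DOM extension alone: serializing each child of a cell with DOMDocument::saveHTML() instead of reading nodeValue. It assumes PHP 5.3.6 or later (where saveHTML() accepts a node argument) and PHP 7 for the type declarations. The innerHTML() helper is a hypothetical name, not something from the script above.

```php
<?php
// Hypothetical helper (not part of the original script): builds the inner
// HTML of a node by serializing each of its child nodes in turn.
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument requires PHP 5.3.6 or later.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Sample markup copied from the page structure shown in the question.
$dom = new DOMDocument();
$dom->loadHTML('<table><tr class="color1"><td>Author</td><td>Content <a href="#">in HTML</a></td><td>Date</td></tr></table>');

foreach ($dom->getElementsByTagName('td') as $td) {
    // Prints the markup of each cell rather than its text-only value.
    echo innerHTML($td), "\n";
}
```

In the loop from the question, this would mean echoing innerHTML($td) in place of $td->nodeValue. Serializing the children rather than the cell itself leaves out the surrounding <td> tags.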
Thanks in advance.