PHP SimpleXML: how to load an HTML file?

When I try to load an HTML file as XML with simplexml_load_string, I get a lot of errors and warnings about the HTML, and the load fails. Is there a way to load an HTML file correctly using SimpleXML?

The HTML file may contain unnecessary whitespace and possibly other errors that I would like SimpleXML to ignore.

+3
4 answers

I would suggest using PHP Simple HTML DOM. I have used it myself for everything from scraping pages to manipulating HTML template files. It is very simple and quite powerful, and it should meet your requirements.

Here are a couple of examples:

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all images
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}

// Find all links
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}
+3

Use DomDocument::loadHtmlFile together with simplexml_import_dom to load HTML pages that are not well-formed into SimpleXML.
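
As a rough sketch of how the two functions named above fit together, the example below parses a string instead of a remote file so that it runs without network access. The sample markup and the variable names are made up for illustration; DOMDocument::loadHTML is the string counterpart of loadHtmlFile.

```php
<?php
// Hypothetical sample markup: the unclosed <p> and <br> tags would break an XML parser.
$markup = '<html><body><p>First<br><a href="/one">One</a>'
        . '<p>Second <a href="/two">Two</a></body></html>';

// Collect libxml diagnostics internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($markup);   // the HTML parser tolerates the unclosed tags
libxml_clear_errors();     // discard whatever the parser collected

// Convert the DOM tree into a SimpleXMLElement rooted at <html>.
$sxml = simplexml_import_dom($dom);

// Collect the href attribute of every link in the document.
$hrefs = [];
foreach ($sxml->xpath('//a') as $a) {
    $hrefs[] = (string) $a['href'];
}

print_r($hrefs);
```

The point of the detour through DOMDocument is that its HTML parser repairs the markup first, so SimpleXML only ever sees a well-formed tree.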

+20

Check the man page: one of the options listed there (for example, LIBXML_NOERROR) may help you. Keep in mind, though, that HTML is not necessarily valid XML, so parsing it as XML may not work.
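
A small sketch of what this looks like in practice, assuming the options meant are the libxml constants accepted as the third argument of simplexml_load_string. Both input strings are invented for the example. The flags silence the diagnostics, but they do not make the XML parser accept markup that is not well-formed.

```php
<?php
// Hypothetical inputs: one well-formed document, one with an unclosed <br>.
$wellFormed = '<html><body><p>Hello</p></body></html>';
$malformed  = '<html><body><p>Hello<br></p></body></html>';

// Suppress libxml error and warning reports for these calls.
$options = LIBXML_NOERROR | LIBXML_NOWARNING;

$ok  = simplexml_load_string($wellFormed, 'SimpleXMLElement', $options);
$bad = simplexml_load_string($malformed, 'SimpleXMLElement', $options);

// The well-formed input parses; the malformed one still fails, only quietly.
var_dump($ok instanceof SimpleXMLElement);
var_dump($bad === false);
```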

0

Here is some quick code to load an external HTML page and then parse it with SimpleXML.

    //suppresses errors generated by poorly-formed xml
    libxml_use_internal_errors(true);

    //create the html object
    $html = new DOMDocument();

    //load the external html file
    $html->loadHtmlFile('http://blahwhatever.com/');

    //import the HTML object into simple xml
    $shtml = simplexml_import_dom($html);

    //print the result
    echo "<pre>";
    print_r($shtml);
    echo "</pre>";
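
Because libxml_use_internal_errors(true) only hides the parser's complaints, it may be worth looking at what was collected and then emptying the buffer. The following is a minimal sketch with made-up markup; which messages appear, if any, depends on the installed libxml version.

```php
<?php
// Keep parser diagnostics in an internal buffer instead of raising warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
// Hypothetical markup containing a stray closing tag.
$dom->loadHTML('<html><body><p>Text</span></p></body></html>');

// Each entry is a LibXMLError object with level, code, line and message fields.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    echo trim($error->message), ' (line ', $error->line, ")\n";
}

// Empty the buffer so errors do not accumulate across several documents.
libxml_clear_errors();
```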
0

Source: https://habr.com/ru/post/1753136/

