How to get innerHTML by class name or ID using PHP

Hi, I am loading content from an external URL, something like this:

$html=get_data($external_url);

where get_data() is a function that fetches the content using cURL.
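For reference, a get_data() along these lines would do the fetching. This is only a sketch: the question does not show the real function, so its body, the chosen cURL options and the 10-second timeout are assumptions.

```php
<?php
// Hypothetical get_data(): the question only says it uses cURL.
function get_data(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_TIMEOUT        => 10,   // assumed limit: give up after 10 seconds
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    // In PHP 8+ the cURL handle is released automatically when $ch goes out of scope.
    return $body;
}
```

Usage would be `$html = get_data($external_url);` exactly as in the question.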

Now I want to get the inner HTML of different HTML elements such as h1, div, p and span by their class or id.

For example, suppose the content from the external URL ($html) looks like this:

<html>
<title></title>
<body>
    <h1 class="title">I am title</h1>
    <div id="content">
        i am the content.
    </div>
</body>

Now I want to get the inner HTML of the element with class="title". Similarly, I want to get the inner HTML of the element with id="content".

How can I do this in PHP? I don't know much about DOM or XML. Please help.

2 answers

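Use DOMDocument::saveHTML(). Since PHP 5.3.6 you can pass it a node, and it returns the HTML of that node only. To get the inner HTML of an element, save each of its child nodes and concatenate the results: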

function getHtml($nodes) {
  $result = '';
  foreach ($nodes as $node) {
    $result .= $node->ownerDocument->saveHtml($node);
  }
  return $result;
}
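To see the difference between outer and inner HTML, the sketch below repeats getHtml() so it runs on its own, then calls it once with the element itself and once with the element's child nodes. The sample markup is made up for illustration.

```php
<?php
// getHtml() as in the answer: concatenates saveHtml() of every node passed in.
function getHtml($nodes) {
  $result = '';
  foreach ($nodes as $node) {
    $result .= $node->ownerDocument->saveHtml($node);
  }
  return $result;
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><h1 class="title">I am <em>title</em></h1></body></html>');
$h1 = $dom->getElementsByTagName('h1')->item(0);

// Passing the element itself gives its outer HTML (including the <h1> tag)
$outer = getHtml([$h1]);
// Passing its child nodes gives the inner HTML
$inner = getHtml($h1->childNodes);

echo $outer, "\n", $inner, "\n";
```

With the default formatOutput (false), $outer is `<h1 class="title">I am <em>title</em></h1>` and $inner is `I am <em>title</em>`, which is why the XPath expressions below select node() children rather than the element.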

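To select the nodes, use XPath. XPath expressions are quite powerful, so the path can be built up step by step.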

Any element node:

//*

Any element node with the id "content":

//*[@id="content"]

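Only the first match within each parent element, in case someone used the same id more than once (to get strictly the first match in the whole document, wrap the path in parentheses: (//*[@id="content"])[1]):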

//*[@id="content"][1]

Its child nodes. node() matches element, text and comment nodes alike:

//*[@id="content"][1]/node()

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXpath($dom);

echo getHtml($xpath->evaluate('//*[@id="content"][1]/node()'));
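The sketch below counts how many nodes each step of the path selects. The sample HTML is invented and deliberately repeats id="content" in two different parents, which shows that [1] applies per parent: //*[@id="content"][1] still returns two elements here, while the parenthesized form returns only one.

```php
<?php
// Invented sample: two elements share id="content" but have different parents.
$html = '<html><body>'
    . '<div id="content">first <i>one</i><!-- note --></div>'
    . '<div><p id="content">second</p></div>'
    . '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // the duplicate id would otherwise raise a parser warning
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$expressions = [
    '//*',                              // every element node
    '//*[@id="content"]',               // every element with id="content"
    '//*[@id="content"][1]',            // first match within each parent
    '(//*[@id="content"])[1]',          // first match in the whole document
    '//*[@id="content"][1]/node()',     // child nodes (text, elements, comments)
    '(//*[@id="content"])[1]/node()',   // child nodes of the single first match
];

$counts = [];
foreach ($expressions as $expression) {
    $counts[$expression] = $xpath->evaluate($expression)->length;
    printf("%-34s %d node(s)\n", $expression, $counts[$expression]);
}
```

For the question's own HTML both forms select the same element, so the answer's expression works there; the difference only shows up with duplicated ids in different places.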

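Classes are a bit more difficult. The class attribute is a list of class names separated by whitespace, so comparing the whole attribute value is not enough. XPath 1.0 has no function for token lists, but it has normalize-space(), which trims the string and collapses runs of whitespace into single spaces. If you then add a space at the start and at the end, you get a string like " one two three ", and you can check whether it contains " one " (including the surrounding spaces). In XPath: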

Normalize the whitespace:

normalize-space(@class)

Add a space at the start and at the end:

concat(" ", normalize-space(@class), " ")

Check whether the result contains the class name, also padded with spaces:

contains(concat(" ", normalize-space(@class), " "), " title ")

Used as a condition in the location path, selecting the child nodes of the first match:

//*[contains(concat(" ", normalize-space(@class), " "), " title ")][1]/node()
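To show why the padding matters, this sketch compares a plain contains(@class, "title") with the padded version on some invented markup: the plain substring test also matches class="subtitle", while the padded test only matches whole class names, even with extra whitespace in the attribute.

```php
<?php
// Invented sample markup for illustration.
$html = '<html><body>'
    . '<h1 class="title">A</h1>'
    . '<h2 class="subtitle">B</h2>'
    . '<p class="intro  title">C</p>'
    . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Naive substring test: "subtitle" contains "title", so it matches too
$naive = '//*[contains(@class, "title")]';
// Token test: only elements that have "title" as a whole class name
$token = '//*[contains(concat(" ", normalize-space(@class), " "), " title ")]';

// Collect the text of every matching element, in document order
$collect = function (string $expression) use ($xpath): array {
    $texts = [];
    foreach ($xpath->evaluate($expression) as $node) {
        $texts[] = $node->textContent;
    }
    return $texts;
};

$naiveMatches = $collect($naive);
$tokenMatches = $collect($token);
echo implode(', ', $naiveMatches), "\n"; // A, B, C
echo implode(', ', $tokenMatches), "\n"; // A, C
```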

Complete example:

$html = <<<'HTML'
<html>
<title></title>
<body>
    <h1 class="title">I am title</h1>
    <div id="content">
        i am the <b>content</b>.
    </div>
</body>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXpath($dom);

function getHtml($nodes) {
  $result = '';
  foreach ($nodes as $node) {
    $result .= $node->ownerDocument->saveHtml($node);
  }
  return $result;
}

// first node with the id
var_dump(
  getHtml(
    $xpath->evaluate('//*[@id="content"][1]/node()')
  )
);

// first node with the class
var_dump(
  getHtml(
    $xpath->evaluate(
      '//*[contains(concat(" ", normalize-space(@class), " "), " title ")][1]/node()'
    )
  )
);

// alternative - handling multiple nodes with the same class in a loop
$nodes = $xpath->evaluate(
  '//*[contains(concat(" ", normalize-space(@class), " "), " title ")]'
);
foreach ($nodes as $node) {
  var_dump(getHtml($xpath->evaluate('node()', $node)));
}

Demo: https://eval.in/118248

Output:

string(40) "
        i am the <b>content</b>.
    "
string(10) "I am title"
string(10) "I am title"

You can do this with DOMDocument:

$dom_doc = new DOMDocument();
$dom_doc->loadHTML($returned_external_html);
// You can search for any tag (<table>, <img>, <p>, ...). This returns a DOMNodeList, so take one item from it:
$element = $dom_doc->getElementsByTagName('table')->item(0);
// Or, if you know the id of the element you are looking for, use this. It returns a DOMElement (or null if nothing matches):
$element = $dom_doc->getElementById('specific_id');
// To get the inner HTML of the element, serialize each of its child nodes:
$innerHTML = '';
if ($element !== null) {
    foreach ($element->childNodes as $child) {
        $innerHTML .= $dom_doc->saveHTML($child); // saveXML($child) also works, but produces XML-style markup such as <br/>
    }
}
echo $innerHTML; // contains the inner HTML of the element


Documentation:
DOMDocument::getElementsByTagName
DOMDocument::getElementById
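The same idea can be wrapped in a small helper that also copes with elements that don't exist. The function name innerHtml() and the sample markup are hypothetical; only getElementsByTagName(), getElementById(), childNodes and saveHTML() come from the DOM extension.

```php
<?php
// Hypothetical helper: inner HTML of an element, or null if the element is missing.
function innerHtml(?DOMNode $element): ?string
{
    if ($element === null) {
        return null;
    }
    $html = '';
    foreach ($element->childNodes as $child) {
        // saveHTML() with a node argument serializes just that node
        $html .= $element->ownerDocument->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><table><tr><td>cell</td></tr></table>'
    . '<div id="specific_id">Hello <b>world</b></div></body></html>');

// By tag name: getElementsByTagName() returns a DOMNodeList, so take an item
$firstTable = innerHtml($dom->getElementsByTagName('table')->item(0));
// By id: getElementById() returns a DOMElement, or null when nothing matches
$byId = innerHtml($dom->getElementById('specific_id'));
$missing = innerHtml($dom->getElementById('no_such_id'));

var_dump($byId, $missing);
```

Returning null instead of an empty string lets the caller tell "element not found" apart from "element is empty".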


Source: https://habr.com/ru/post/1762585/

