XPath - select everything after a specific tag

I am trying to extract the HTML that sits between one h1 tag and the next h1 tag, and repeat that for every heading.

For example, here is the HTML:

    <h1>Heading</h1>
    <p>Paragraph</p>
    <ul>
        <li>List item</li>
        <li>List item</li>
    </ul>
    <p>Paragraph</p>
    <h1>Heading 2</h1>
    <ul>
        <li>List item</li>
        <li>List item</li>
    </ul>
    <p>Paragraph<img /></p>

And from this I am trying to create this array:

    array(
        0 => '<p>Paragraph</p><ul><li>List item</li><li>List item</li></ul><p>Paragraph</p>',
        1 => '<ul><li>List item</li><li>List item</li></ul><p>Paragraph<img /></p>'
    )

What XPath query would select all the content after one h1 tag up to the next h1 tag, and so on for each heading?

Any help or advice is appreciated.

UPDATE:

Ultimately, I am trying to use PHP to build an array in this format:

    array(
        'headings' => array(
            1 => '<h1>Heading</h1>',
            2 => '<h1>Heading 2</h1>'
        ),
        'content' => array(
            1 => '<p>Paragraph</p><ul><li>List item</li><li>List item</li></ul><p>Paragraph</p>',
            2 => '<ul><li>List item</li><li>List item</li></ul><p>Paragraph<img /></p>'
        )
    )
2 answers

I did it like this :)

    $html = '<h1>Heading</h1><p>Paragraph</p><ul><li>List item</li><li>List item</li></ul><p>Paragraph</p><h1>Heading 2</h1><ul><li>List item</li><li>List item</li></ul><p>Paragraph<img /></p>';

    $dom_document = new DOMDocument();
    // Set before loading; changing it afterwards has no effect on the parsed document
    $dom_document->preserveWhiteSpace = false;
    $dom_document->loadHTML($html);

    // Use DOMXpath to navigate the HTML through the DOM
    $dom_xpath = new DOMXpath($dom_document);
    $elements = $dom_xpath->query("/html/body/*");

    $array = array('headings' => array(), 'content' => array());

    if ($elements !== false) {
        $i = 0;
        foreach ($elements as $element) {
            if ($element->nodeName == 'h1') {
                // Each h1 starts a new section
                $i++;
                $array['headings'][$i] = $dom_document->saveHtml($element);
                continue;
            }
            // Initialise the entry first so that .= does not raise an undefined index notice
            if (!isset($array['content'][$i])) {
                $array['content'][$i] = '';
            }
            $array['content'][$i] .= $dom_document->saveHtml($element);
        }
    }

    var_dump($array);

NOTE: if you are using PHP 5.2 (saveHTML() accepts a node argument only since PHP 5.3.6), replace:

 $array['headings'][$i] = $dom_document->saveHtml($element); 

and

 $array['content'][$i] .= $dom_document->saveHtml($element); 

with:

    $array['headings'][$i] = $dom_document->saveXml($element);
    $array['content'][$i] .= $dom_document->saveXml($element);
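
Since the question asked for an XPath query, here is a rough sketch of how the grouping could be pushed into XPath itself instead of the loop. It assumes every h1 is a direct child of body, as in the sample. The names $doc, $xpath, $result and $k are only illustrative. For the k-th h1, the relative query picks the following siblings that are not h1 themselves and that have exactly k h1 elements before them. The trim() calls are there because the HTML serializer may add line breaks around block elements.

```php
<?php
$html = '<h1>Heading</h1><p>Paragraph</p><ul><li>List item</li><li>List item</li></ul><p>Paragraph</p>'
      . '<h1>Heading 2</h1><ul><li>List item</li><li>List item</li></ul><p>Paragraph<img /></p>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$result = array('headings' => array(), 'content' => array());

$k = 0; // 1-based position of the current h1 among its siblings
foreach ($xpath->query('/html/body/h1') as $h1) {
    $k++;
    $result['headings'][$k] = trim($doc->saveHTML($h1));

    // Siblings after this h1 that are not h1 and are preceded by exactly $k h1 elements
    $query = sprintf('following-sibling::*[not(self::h1)][count(preceding-sibling::h1) = %d]', $k);

    $result['content'][$k] = '';
    foreach ($xpath->query($query, $h1) as $node) {
        $result['content'][$k] .= trim($doc->saveHTML($node));
    }
}

print_r($result);
```

Note that saveHTML() serializes as HTML, so the exact markup (for example how the empty img element is written) can differ slightly from the input string.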

Here is a quick way to do it.

Assuming your HTML is stored in $code:

    $code = <<<'CODE'
    <h1>Heading</h1>
    <p>Paragraph</p>
    <ul>
        <li>List item</li>
        <li>List item</li>
    </ul>
    <p>Paragraph</p>
    <h1>Heading 2</h1>
    <ul>
        <li>List item</li>
        <li>List item</li>
    </ul>
    <p>Paragraph<img /></p>
    CODE;

Solution:

    // Content array...
    $content = array_map(
        function ($element) {
            return preg_replace('/\>\s+\</', '><', $element);
        },
        preg_split('/\<h1\>[^\<]*\<\/h1\>/', $code)
    );
    array_shift($content);

    // Headings array...
    preg_match_all('/\<h1\>[^\<]*\<\/h1\>/', $code, $matches);
    $headings = $matches[0];

    // Result
    $result = array(
        'headings' => $headings,
        'content' => $content,
    );

    print_r($result);

Output:

    Array
    (
        [headings] => Array
            (
                [0] => <h1>Heading</h1>
                [1] => <h1>Heading 2</h1>
            )

        [content] => Array
            (
                [0] => <p>Paragraph</p><ul><li>List item</li><li>List item</li></ul><p>Paragraph</p>
                [1] => <ul><li>List item</li><li>List item</li></ul><p>Paragraph<img /></p>
            )

    )
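
The pattern above is applied twice, once to split and once to collect the headings. As a possible variation (a sketch only, not part of the answer above), a capturing group together with PREG_SPLIT_DELIM_CAPTURE lets preg_split() return headings and content in one pass. It uses a shortened sample string, and like the original pattern it assumes h1 tags without attributes or nested markup. The names $parts and $pair are illustrative.

```php
<?php
$code = '<h1>Heading</h1> <p>Paragraph</p> <h1>Heading 2</h1> <ul> <li>List item</li> </ul>';

// The capturing group makes preg_split() keep each matched heading in the result,
// so the list alternates: text before, heading, content, heading, content, ...
$parts = preg_split('/(<h1>[^<]*<\/h1>)/', $code, -1, PREG_SPLIT_DELIM_CAPTURE);
array_shift($parts); // drop whatever precedes the first <h1>

$result = array('headings' => array(), 'content' => array());
foreach (array_chunk($parts, 2) as $pair) {
    $result['headings'][] = $pair[0];
    $content = isset($pair[1]) ? $pair[1] : '';
    // Collapse whitespace between tags, then strip it from both ends
    $result['content'][] = trim(preg_replace('/>\s+</', '><', $content));
}

print_r($result);
```

For markup that may contain attributes or nested tags inside h1, a DOM-based approach is likely to be more robust than a regular expression.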

Source: https://habr.com/ru/post/1440555/

