PHP XPath - grouping a flat HTML structure

I am trying to parse fairly flat HTML and group everything from one h1 tag up to the next. For example, I have the following HTML:

<h1> Heading 1 </h1>
<p> Paragraph 1.1 </p>
<p> Paragraph 1.2 </p>
<p> Paragraph 1.3 </p>
<h1> Heading 2 </h1>
<p> Paragraph 2.1 </p>
<p> Paragraph 2.2 </p>
<h1> Heading 3 </h1>
<p> Paragraph 3.1 </p>
<p> Paragraph 3.2 </p>
<p> Paragraph 3.3 </p>

Basically I want it to look like this:

<div id='1'>
    <h1> Heading 1 </h1>
    <p> Paragraph 1.1 </p>
    <p> Paragraph 1.2 </p>
    <p> Paragraph 1.3 </p>
</div>
<div id='2'>
    <h1> Heading 2 </h1>
    <p> Paragraph 2.1 </p>
    <p> Paragraph 2.2 </p>
</div>
<div id='3'>
    <h1> Heading 3 </h1>
    <p> Paragraph 3.1 </p>
    <p> Paragraph 3.2 </p>
    <p> Paragraph 3.3 </p>
</div>

It's probably not even worth posting the code I have so far, as it has turned into a mess. Basically, I tried running an XPath query for '//h1', creating new div elements as parent nodes, moving each h1 DOM node into its div, and then walking through the following nodes until I hit another h1 tag. As mentioned, it got messy.

Can someone point me in a better direction here?

1 answer

Iterate over all nodes that are on the same level (in my example I created a wrapper node called platau for this). Whenever you run into an <h1>, insert a div before it and keep a reference to that div.

For the <h1> and every other node: if the reference exists, remove the node from its parent and append it as a child of the referenced div.
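The wrapper node matters because loadXML() needs a single root element, and a flat fragment has none. Here is a minimal sketch of that setup, assuming the fragment is well-formed XML. The names $fragment and $names are illustrative and not part of the original answer:

```php
<?php
// Hypothetical flat fragment; note the space between the two tags.
$fragment = '<h1>Heading 1</h1> <p>Paragraph 1.1</p>';

// loadXML() requires exactly one root element, so wrap the fragment.
$doc = new DOMDocument();
$doc->loadXML('<platau>' . $fragment . '</platau>');

// node() selects every child of the wrapper, text nodes included.
$xp = new DOMXPath($doc);

$names = [];
foreach ($xp->query('/platau/node()') as $node) {
    $names[] = $node->nodeName;
}

echo implode(' ', $names), "\n"; // prints "h1 #text p"
```

The whitespace between the tags shows up as its own text node, so a tag-name check is needed before treating a node as an element.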

Example:

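$doc = new DOMDocument();
// $xml is the flat markup wrapped in a <platau> root element
$doc->loadXML($xml);
$xp = new DOMXPath($doc);

$current = NULL;
$id = 0;

foreach ($xp->query('/platau/node()') as $i => $sort) {
    // An <h1> starts a new group: insert a div before it and remember it
    if (isset($sort->tagName) && $sort->tagName === 'h1') {
        $current = $doc->createElement('div');
        $current->setAttribute('id', ++$id);
        $current = $sort->parentNode->insertBefore($current, $sort);
    }

    // Nodes before the first <h1> are left where they are
    if (!$current) continue;

    // Move the node (including the <h1> itself) into the current div
    $sort->parentNode->removeChild($sort);
    $current->appendChild($sort);
}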
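To try the loop above end to end, it could be packaged roughly as follows. This is a sketch under assumptions: the function name groupByHeading() is hypothetical, the input must be well-formed XML, and the wrapper is dropped again when serializing.

```php
<?php
// Illustrative helper (not from the original answer): wraps each <h1> and the
// nodes that follow it, up to the next <h1>, in a numbered <div>.
function groupByHeading(string $fragment): string
{
    $doc = new DOMDocument();
    // 'platau' is the wrapper name used in the answer; any root name works.
    $doc->loadXML('<platau>' . $fragment . '</platau>');
    $xp = new DOMXPath($doc);

    $current = null;
    $id = 0;
    foreach ($xp->query('/platau/node()') as $node) {
        if ($node->nodeName === 'h1') {
            $current = $doc->createElement('div');
            $current->setAttribute('id', (string) ++$id);
            $node->parentNode->insertBefore($current, $node);
        }
        if ($current === null) {
            continue; // content before the first <h1> stays in place
        }
        // appendChild() moves a node that is already in the document.
        $current->appendChild($node);
    }

    // Serialize only the wrapper's children, dropping the wrapper itself.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveXML($child);
    }
    return $out;
}

$result = groupByHeading(
    '<h1>Heading 1</h1><p>Paragraph 1.1</p><h1>Heading 2</h1><p>Paragraph 2.1</p>'
);
echo $result, "\n";
```

For this input the result should be two div elements with ids 1 and 2, each holding one h1 and one p. Real-world HTML that is not valid XML would need loadHTML() instead, which builds its own html/body wrapper, so the XPath expression would have to change accordingly.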



Source: https://habr.com/ru/post/1379915/

