I am trying to parse fairly flat HTML and group each h1 tag together with everything that follows it, up to the next h1. For example, I have the following HTML:
<h1> Heading 1 </h1>
<p> Paragraph 1.1 </p>
<p> Paragraph 1.2 </p>
<p> Paragraph 1.3 </p>
<h1> Heading 2 </h1>
<p> Paragraph 2.1 </p>
<p> Paragraph 2.2 </p>
<h1> Heading 3 </h1>
<p> Paragraph 3.1 </p>
<p> Paragraph 3.2 </p>
<p> Paragraph 3.3 </p>
Basically I want it to look like this:
<div id='1'>
    <h1> Heading 1 </h1>
    <p> Paragraph 1.1 </p>
    <p> Paragraph 1.2 </p>
    <p> Paragraph 1.3 </p>
</div>
<div id='2'>
    <h1> Heading 2 </h1>
    <p> Paragraph 2.1 </p>
    <p> Paragraph 2.2 </p>
</div>
<div id='3'>
    <h1> Heading 3 </h1>
    <p> Paragraph 3.1 </p>
    <p> Paragraph 3.2 </p>
    <p> Paragraph 3.3 </p>
</div>
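A minimal sketch of the target transformation, assuming the fragment becomes well-formed XML once wrapped in a single root element (true for the sample above; arbitrary real-world HTML may need a tolerant HTML parser). It uses Python's standard-library `xml.etree.ElementTree` purely as a stand-in, since the question does not say which language or DOM library is in use, and `group_by_h1` is a hypothetical helper name. Instead of moving nodes around in place, it walks the top-level elements once and builds a new tree, opening a new div every time it meets an h1.

```python
import xml.etree.ElementTree as ET


def group_by_h1(fragment: str) -> ET.Element:
    """Hypothetical helper: return a new <root> whose children are
    <div id="N"> sections, each holding one <h1> plus the siblings that
    follow it up to the next <h1>.

    Assumes the fragment is well-formed XML once wrapped in <root>.
    """
    source = ET.fromstring(f"<root>{fragment}</root>")
    out = ET.Element("root")
    section = None  # the <div> currently being filled
    count = 0       # number of sections created so far
    for child in list(source):
        if child.tag == "h1":
            count += 1
            section = ET.SubElement(out, "div", id=str(count))
        if section is None:
            # Anything before the first <h1> stays outside any section.
            out.append(child)
        else:
            section.append(child)
    return out


html = ("<h1> Heading 1 </h1> <p> Paragraph 1.1 </p> <p> Paragraph 1.2 </p> "
        "<p> Paragraph 1.3 </p> <h1> Heading 2 </h1> <p> Paragraph 2.1 </p> "
        "<p> Paragraph 2.2 </p> <h1> Heading 3 </h1> <p> Paragraph 3.1 </p> "
        "<p> Paragraph 3.2 </p> <p> Paragraph 3.3 </p>")
result = group_by_h1(html)
# Serialize the sections back to a string (the wrapper <root> is dropped).
print("".join(ET.tostring(div, encoding="unicode") for div in result))
```

Building a fresh tree avoids the bookkeeping of mutating the document while iterating over it; the whitespace between tags travels with each element (ElementTree stores it in `tail`), so the serialized sections keep the original spacing.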
It's probably not worth posting the code I have so far, as it turned into a mess. Basically, I ran an XPath query for '//h1', created a new div as the parent node for each match, copied the h1 DOM node into the first div, and then stepped through the following nodes until I reached the next h1 tag. As I said, it became messy.
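The approach described above (find every h1, insert a div in its place, then move the h1 and its following siblings into that div until the next h1) can work if each node's next sibling is captured *before* the node is moved. Below is a minimal sketch of that idea using Python's standard-library `xml.dom.minidom`, assuming the same well-formed fragment. minidom has no XPath support, so `getElementsByTagName("h1")` stands in for the `//h1` query; `wrap_sections` and `is_h1` are hypothetical names.

```python
from xml.dom import minidom


def is_h1(node) -> bool:
    """True if the node is an <h1> element (text nodes never match)."""
    return node.nodeType == node.ELEMENT_NODE and node.tagName == "h1"


def wrap_sections(fragment: str):
    """Hypothetical helper: wrap each <h1> and the siblings that follow it
    (up to the next <h1>) in a <div id="N">, mutating the DOM in place."""
    doc = minidom.parseString(f"<root>{fragment}</root>")
    root = doc.documentElement
    # Stand-in for the XPath query '//h1'; take a snapshot before moving nodes.
    headings = list(root.getElementsByTagName("h1"))
    for n, h1 in enumerate(headings, start=1):
        div = doc.createElement("div")
        div.setAttribute("id", str(n))
        root.insertBefore(div, h1)      # the new div takes the heading's place
        node = h1.nextSibling           # remember where the section continues
        div.appendChild(h1)             # appendChild detaches h1 from root first
        while node is not None and not is_h1(node):
            following = node.nextSibling  # capture before the move resets it
            div.appendChild(node)
            node = following
    return root


html = ("<h1> Heading 1 </h1> <p> Paragraph 1.1 </p> <p> Paragraph 1.2 </p> "
        "<p> Paragraph 1.3 </p> <h1> Heading 2 </h1> <p> Paragraph 2.1 </p> "
        "<p> Paragraph 2.2 </p> <h1> Heading 3 </h1> <p> Paragraph 3.1 </p> "
        "<p> Paragraph 3.2 </p> <p> Paragraph 3.3 </p>")
root = wrap_sections(html)
print(root.toxml())
```

The key detail is that moving a node also resets its `nextSibling`, so the loop must read the next sibling first; losing track of that is a common reason this kind of in-place loop gets messy.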
Can someone point me in a better direction here?