How to wrap HTML text without <p> tags using Nokogiri?

I need to parse an HTML document and split it into several new files. The problem is that some text nodes are not wrapped in "<p>" tags; instead, each paragraph ends with a "<br>" tag.

I want to wrap this text with <p> tags using Nokogiri:

    <div id="f15"><b>Footnote 15</b>: Catullus iii, 12.</div>
    <div class="pgmonospaced pgheader"><br/>
    <br/>
    End of the Project abc<br/>
    <br/>
    *** END OF THIS PROJECT XYZ ***<br/>
    <br/>
    ***** This file should be named new file.html... *****<br/>
    <br/></div>
1 answer

After searching some forums and doing some debugging locally, I found the following solution to my problem.

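    require 'nokogiri'

    # Nokogiri::HTML expects markup (a String or IO), not a file path, so read the file first.
    html_doc = Nokogiri::HTML(File.read('path/to/html_file'))

    # Select every text node that is a sibling of a <br> and wrap it in <p>.
    html_doc.search("//br/preceding-sibling::text()|//br/following-sibling::text()").each do |node|
      next if node.text.strip.empty? # skip whitespace-only nodes between consecutive <br> tags
      node.replace(Nokogiri.make("<p>#{node.to_html}</p>"))
    end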

Source: https://habr.com/ru/post/1391989/

