
How to parse consecutive tags using Nokogiri?

I have the following HTML:

    <div id="first">
      <dt>Label1</dt>
      <dd>Value1</dd>
      <dt>Label2</dt>
      <dd>Value2</dd>
      ...
    </div>

My code is not working.

    doc.css("first").each do |item|
      label = item.css("dt")
      value = item.css("dd")
    end

It returns all the <dt> tags and then all the <dd> tags, but I need the output as "label: value" pairs.

3 answers

First of all, your HTML should have the <dt> and <dd> elements inside a <dl>:

    <div id="first">
      <dl>
        <dt>Label1</dt>
        <dd>Value1</dd>
        <dt>Label2</dt>
        <dd>Value2</dd>
        ...
      </dl>
    </div>

but that doesn't change how you parse it. Find the <dt> elements and iterate over them; for each <dt>, use next_element to get the <dd> that follows it. Something like this:

    doc = Nokogiri::HTML('<div id="first"><dl>...')
    doc.css('#first').search('dt').each do |node|
      puts "#{node.text}: #{node.next_element.text}"
    end

This should work as long as the structure matches your example.


Assuming some <dt> may have multiple <dd>, you want to find all the <dt> elements and then, for each one, collect the <dd> elements that follow it up to the next <dt>. This is pretty easy to do in pure Ruby, but more fun to do using only XPath. ;)

Given this setup:

    require 'nokogiri'
    html = '<dl id="first">
      <dt>Label1</dt><dd>Value1</dd>
      <dt>Label2</dt><dd>Value2</dd>
      <dt>Label3</dt><dd>Value3a</dd><dd>Value3b</dd>
      <dt>Label4</dt><dd>Value4</dd>
    </dl>'
    doc = Nokogiri.HTML(html)

Using no XPath:

    doc.css('dt').each do |dt|
      dds = []
      n = dt.next_element
      begin
        dds << n
        n = n.next_element
      end while n && n.name=='dd'
      p [dt.text,dds.map(&:text)]
    end
    #=> ["Label1", ["Value1"]]
    #=> ["Label2", ["Value2"]]
    #=> ["Label3", ["Value3a", "Value3b"]]
    #=> ["Label4", ["Value4"]]

Using a little XPath:

    doc.css('dt').each do |dt|
      dds = dt.xpath('following-sibling::*').chunk{ |n| n.name }.first.last
      p [dt.text,dds.map(&:text)]
    end
    #=> ["Label1", ["Value1"]]
    #=> ["Label2", ["Value2"]]
    #=> ["Label3", ["Value3a", "Value3b"]]
    #=> ["Label4", ["Value4"]]

Using lotsa XPath:

    doc.css('dt').each do |dt|
      ct = dt.xpath('count(following-sibling::dt)')
      dds = dt.xpath("following-sibling::dd[count(following-sibling::dt)=#{ct}]")
      p [dt.text,dds.map(&:text)]
    end
    #=> ["Label1", ["Value1"]]
    #=> ["Label2", ["Value2"]]
    #=> ["Label3", ["Value3a", "Value3b"]]
    #=> ["Label4", ["Value4"]]

Compared with the other answers, this is a less efficient way to do the same thing.

    require 'nokogiri'
    a = Nokogiri::HTML('<div id="first"><dt>Label1</dt><dd>Value1</dd><dt>Label2</dt><dd>Value2</dd></div>')
    dt = []
    dd = []
    a.css("#first").each do |item|
      item.css("dt").each {|t| dt << t.text}
      item.css("dd").each {|t| dd << t.text}
    end
    dt.each_index do |i|
      puts dt[i] + ': ' + dd[i]
    end

In a CSS selector, you refer to an ID by putting the # symbol before it. For a class, use the . symbol.


Source: https://habr.com/ru/post/1386004/

