Parsing HTML with Nokogiri in Ruby
You can use plain Ruby indexing to pare a large result set down to specific items:
```ruby
page.css('div.one')[1,2]  # Two items starting at index 1 (the 2nd item)
page.css('div.one')[1..2] # Items with indices 1 through 2, inclusive
```
Because Ruby indexing starts at zero, take care to pick the indices of the elements you actually want.
Alternatively, you can use CSS selectors to find the nth item:
```ruby
# Second and third items from the set, jQuery-style
page.css('div.one:eq(2),div.one:eq(3)')
# div.one elements that are the second or third child of their parent, CSS3-style
page.css('div.one:nth-child(2),div.one:nth-child(3)')
```
Or you can use XPath to return specific matches:
```ruby
# div.one elements that are the second or third div.one child of their parent
page.xpath("//div[@class='one'][position()=2 or position()=3]")
# Second and third items in the whole result set
page.xpath("(//div[@class='one'])[position()=2 or position()=3]")
```
In both CSS and XPath, note that:
- Numbering starts at 1, not 0.
- You can use `at_css` and `at_xpath` instead of `css` and `xpath` to get back the first matching element rather than a NodeSet.

```ruby
# A NodeSet with a single element in it:
page.css('div.one:eq(2)')
# The second div.one element itself
page.at_css('div.one:eq(2)')
```
Finally, if you are selecting a single item by index with XPath, you can use a shorter syntax:
```ruby
# First div.one (in document order) that is the second div.one child of its parent
page.at_xpath('//div[@class="one"][2]')
# Second div.one in the entire document
page.at_xpath('(//div[@class="one"])[2]')
```