:has CSS pseudo-class in Nokogiri

Question

I am looking for the :has pseudo-class in Nokogiri. It should work just like jQuery's :has selector.

For instance:

 <li><h1><a href="dfd">ex1</a></h1><span class="string">sdfsdf</span></li>
 <li><h1><a href="dsfsdf">ex2</a></h1><span class="string"></span></li>
 <li><h1><a href="sdfd">ex3</a></h1></li>

The CSS selector should return only the first link, the one whose <li> contains a non-empty span.string.

In jQuery, this selector works well:

 $('li:has(span.string:not(:empty))>h1>a')

but not in Nokogiri:

 Nokogiri::HTML(html_source).css('li:has(span.string:not(:empty))>h1>a')

:not and :empty work well, but :has does not.

Is there any documentation for CSS selectors in Nokogiri?
Maybe someone can write a custom :has pseudo-class? Here is an example of how to write a :regexp selector.
If need be, I can use XPath. How would I write the XPath for li:has(span.string:not(:empty))>h1>a?

+1

jquery css ruby ruby-on-rails nokogiri

rogal111 Aug 1 '12 at 13:19

4 answers

Nokogiri doesn't have a :has selector. Here is documentation on what it does support: http://ruby.bastardsbook.com/chapters/html-parsing/#h-2-2

+1

Austin Aug 1 '12 at 13:39

Ok, I found a solution that might be useful to someone.

A custom pseudo-class, :custom_has:

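 class MyCustomSelectors
   # Keep only the nodes that contain at least one match for `selector`.
   # Note: present? comes from ActiveSupport (Rails); in plain Ruby use .any? instead.
   def custom_has(node_set, selector)
     node_set.find_all { |node| node.css(selector).present? }
   end
 end

 # usage:
 doc.css('li:custom_has(span.string:not(:empty))>h1>a', MyCustomSelectors.new)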

Why did I name it :custom_has rather than simply :has? Because :has is already defined. The Nokogiri repository has tests for the :has selector, but they do not work. I reported this problem to the author.

+1

rogal111 Aug 1 '12 at 13:59

Nokogiri allows you to chain .css() and .xpath() calls on the same object. So any time you would like to use :has, just end the current .css() call and add .xpath("..") (the parent selector). You can even resume your selection with another .css() call, picking up where the .xpath() call left off.

Example:

Here is some HTML from Wikipedia:

 <tr>
   <th scope="row" style="text-align:left;">
     Origin
   </th>
   <td>
     <a href="/wiki/Edinburgh" title="Edinburgh">Edinburgh</a>
     <a href="/wiki/Scotland" title="Scotland">Scotland</a>
   </td>
 </tr>
 <tr>
   <th scope="row" style="text-align:left;">
     <a href="/wiki/Music_genre" title="Music genre">Genres</a>
   </th>
   <td>
     <a href="/wiki/Electronica" title="Electronica">Electronica</a>
     <a href="/wiki/Intelligent_dance_music" title="Intelligent dance music">IDM</a>
     <a href="/wiki/Ambient_music" title="Ambient music">ambient</a>
     <a href="/wiki/Downtempo" title="Downtempo">downtempo</a>
     <a href="/wiki/Trip_hop" title="Trip hop">trip hop</a>
   </td>
 </tr>
 <tr>
   <th scope="row" style="text-align:left;">
     <a href="/wiki/Record_label" title="Record label">Labels</a>
   </th>
   <td>
     <a href="/wiki/Warp_(record_label)" title="Warp (record label)">Warp</a>
     <a href="/wiki/Skam_Records" title="Skam Records">Skam</a>
     <a href="/wiki/Music70" title="Music70">Music70</a>
   </td>
 </tr>

Suppose you want to select all the <a> elements inside the first <td> that follows the <th> containing the link with href="/wiki/Music_genre".

 @artistPage.css("table th > a[href='/wiki/Music_genre']").xpath("..").css("+ td a")

This returns every <a> in the genres cell.

Now, for good measure, let's grab the inner text of each of these <a> elements and put it in an array.

 @genreLinks = @artistPage.css("table th > a[href='/wiki/Music_genre']").xpath("..").css("+ td a")
 @genres = []
 @genreLinks.each do |genreLink|
   @genres.push(genreLink.text)
 end

0

musophob Oct 14 '13 at 18:41

Phrogz · Accepted Answer · 2012-08-01T17:36:48+0000

The problem with Nokogiri's current implementation of :has() is that it creates XPath that requires the contents to be a direct child, not any descendant:

 puts Nokogiri::CSS.xpath_for( "a:has(b)" )
 #=> "//a[b]"
 #=> Should output "//a[.//b]" to be correct

For this XPath to match what jQuery does, it must allow the span to be any descendant element, not just a child. For instance:

 require 'nokogiri'
 d = Nokogiri.XML('<r><a/><a><b><c/></b></a></r>')

 d.at_css('a:has(b)')
 #=> #<Nokogiri::XML::Element:0x14dd608 name="a" children=[#<Nokogiri::XML::Element:0x14dd3e0 name="b" children=[#<Nokogiri::XML::Element:0x14dd20c name="c">]>]>

 d.at_css('a:has(c)')
 #=> nil

 d.at_xpath('//a[.//c]')
 #=> #<Nokogiri::XML::Element:0x14dd608 name="a" children=[#<Nokogiri::XML::Element:0x14dd3e0 name="b" children=[#<Nokogiri::XML::Element:0x14dd20c name="c">]>]>

For your specific case, here is the full "broken" XPath:

 puts Nokogiri::CSS.xpath_for( "li:has(span.string:not(:empty)) > h1 > a" )
 #=> //li[span[contains(concat(' ', @class, ' '), ' string ') and not(not(node()))]]/h1/a

And here it is fixed:

 # Adding just the .//
 //li[.//span[contains(concat(' ', @class, ' '), ' string ') and not(not(node()))]]/h1/a

 # Simplified to assume only one CSS class is present on the span
 //li[.//span[@class='string' and not(not(node()))]]/h1/a

 # Assuming that `not(:empty)` really meant "Has some text in it"
 //li[.//span[@class='string' and text()]]/h1/a

 # ..or maybe you really wanted "Has some text anywhere underneath"
 //li[.//span[@class='string' and .//text()]]/h1/a

 # ..or maybe you really wanted "Has at least one element child"
 //li[.//span[@class='string' and *]]/h1/a
