(I hope this is not a breach of etiquette: I posted this on RailsForum, but I didn't get much of a response there.)
Has anyone else had problems with Mechanize not recognizing anchor tags when using CSS selectors?
The HTML looks like this (a snippet, with whitespace removed for clarity):
<td class='calendarCell' align='left'>
  <a href="http://www.mysite.org/index.php/site/ActivitiesCalendar/2010/02/10/">10</a>
  <p style="margin-bottom:15px; line-height:14px; text-align:left;">
    <span class="sidenavHeadType"> Current Events</span><br />
    <b><a href="http://www.mysite.org/index.php/site/Clubs/banks_and_the_fed" class="a2">Banks and the Fed</a></b>
    <br /> 10:30am- 11:45am
  </p>
I am trying to scrape data about these events. Everything works except for getting the anchor inside the <p>. There is clearly an <a> tag inside the <b>, and I need to follow that link to get more information about the event.
In my rake task, I have:
agent.page.search(".calendarCell,.calendarToday").each do |item|
  day = item.at("a").text
  item.search("p").each do |e|
    anchor = e.at("a")
    puts anchor
    puts e.inner_html
  end
end
Interestingly, item.at("a") always returns an anchor, but e.at("a") returns nil. And when I call inner_html on the p element, the anchor is missing entirely. Example output:
nil
<span class="sidenavHeadType"> Photo Club</span><br><b>Indexing Slide Collections</b> <br> 2:00pm- 3:00pm
However, when I run the same scrape directly with Nokogiri:
doc.css(".calendarCell,.calendarToday").each do |item|
  day = item.at_css("a").text
  item.css("p").each do |e|
    link = e.at_css("a")[:href]
    puts e.inner_html
  end
end
it recognizes the <a> inside the <p>, and it will return the href, etc.:
<span class="sidenavHeadType"> Bridge Party</span><br><b><a href="http://www.mysite.org/index.php/site/Clubs/party_bridge_51209" class="a2">Party Bridge</a></b> <br> 7:00pm- 9:00pm
Mechanize is supposed to use Nokogiri, so I'm wondering whether I have a bad version or whether this affects others as well.
Thanks for any insight.