I am scraping XML in R using xpathSApply (from the XML package) and am having difficulty retrieving attributes.
First, the corresponding XML fragment:
<div class="offer-name"> <a href="http://www.somesite.com" itemprop="name">Fancy Product</a> </div>
I successfully pulled out "Fancy Product" (i.e. the element's value?) using:
Products <- xpathSApply(parsedHTML, "//div[@class='offer-name']", xmlValue)
It took some time (I'm a n00b), but the documentation is good and there are some answered questions here that I was able to use. What I can't figure out is how to pull out http://www.somesite.com (the attribute?). I assumed it comes down to changing the third argument from xmlValue to xmlGetAttr, but I could be completely off.
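To make the xmlGetAttr idea concrete, here is a minimal sketch rather than a confirmed solution. It assumes the href lives on the <a> child, so the XPath has to go one level below the div. The inline `fragment` string and the use of htmlParse(..., asText = TRUE) are only there to make the example self-contained; they are not part of my actual code.

```r
library(XML)

# Hypothetical stand-in for the real page: just the fragment from above
fragment <- '<div class="offer-name"> <a href="http://www.somesite.com" itemprop="name">Fancy Product</a> </div>'
parsedHTML <- htmlParse(fragment, asText = TRUE)

# Element text, as in the call that already works
Products <- xpathSApply(parsedHTML, "//div[@class='offer-name']", xmlValue)

# Attribute value: select the <a> node, then pass the attribute name
# through to xmlGetAttr as an extra argument
Links <- xpathSApply(parsedHTML, "//div[@class='offer-name']/a", xmlGetAttr, "href")

# Alternative: select the attribute node directly in the XPath expression
Links2 <- xpathSApply(parsedHTML, "//div[@class='offer-name']/a/@href")
```

The key point in this sketch is that the div itself has no href, so calling xmlGetAttr on the div node would come back empty; the path has to end at the a element (or at its @href).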
FYI: (1) there are 2 more parent <div> elements above the fragment I pasted, and (2) here is the abbreviated full-ish code (which I believe is not relevant, but is included for completeness):
library(XML)
library(httr)
content2 = paste(readLines(file.choose()), collapse = "\n")