I am trying to extract data from the RSS feed at http://live.reuters.com/Event/rss.aspx?id=70335. Specifically, I want to extract the title and publication date of each message, using the code given in the question "Parse the RSS feed using the XML package R".
Here is the code:
    library(XML)
    library(RCurl)

    ### Extracting data from Reuters
    xml.url <- "http://live.reuters.com/Event/rss.aspx?id=70335"
    script <- getURL(xml.url)
    doc <- xmlParse(script)
    titles <- xpathSApply(doc, '//item/title', xmlValue)
    pubdates <- xpathSApply(doc, '//item/pubDate', xmlValue)

    # Build the data frame directly (avoids the intermediate character matrix from cbind)
    reuters_data <- data.frame(titles, pubdates, stringsAsFactors = FALSE)

    # Exporting as a csv
    write.csv(reuters_data, file = "reuters_post.csv")
The code does almost what I want. The problem is that it only retrieves the first 45 messages, whereas I know there are closer to 1000 posts. Is this related to the rss.aspx format? Is there a workaround that would let me get all the messages in the feed, not just the first 45? Any help would be greatly appreciated, since I am new to data cleansing.
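One way to narrow down where the 45-message limit comes from is to count the `<item>` elements in the parsed document and compare that with the number of titles extracted. The sketch below runs the same XPath queries on a small inline sample feed, which is a hypothetical stand-in for the real Reuters response (nothing is fetched here). It shows that `//item/title` returns one value per `<item>` present in the document.

```r
library(XML)

# Hypothetical sample feed standing in for the response from
# http://live.reuters.com/Event/rss.aspx?id=70335 (not fetched here).
sample_rss <- '<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item><title>First post</title><pubDate>Mon, 01 Jul 2013 10:00:00 GMT</pubDate></item>
    <item><title>Second post</title><pubDate>Mon, 01 Jul 2013 10:05:00 GMT</pubDate></item>
    <item><title>Third post</title><pubDate>Mon, 01 Jul 2013 10:10:00 GMT</pubDate></item>
  </channel>
</rss>'

doc <- xmlParse(sample_rss, asText = TRUE)

# Count the <item> elements actually present in the parsed document.
n_items <- length(getNodeSet(doc, "//item"))

# The same XPath queries as in the question: one value per <item>.
titles   <- xpathSApply(doc, "//item/title", xmlValue)
pubdates <- xpathSApply(doc, "//item/pubDate", xmlValue)

cat("items in feed:", n_items, "\n")
cat("titles extracted:", length(titles), "\n")
```

With the real feed, you would parse `getURL(xml.url)` instead of `sample_rss`. If `n_items` also comes out as 45, the parsing code is returning every item it received, and the cap is on the server side of the feed rather than in the R code.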
Thank you, Thomas