Extract additional posts from RSS feed to CSV using R

I am trying to extract data from the RSS feed at http://live.reuters.com/Event/rss.aspx?id=70335. Essentially, I want to extract the title and date of each post, using the code from this question: Parse the RSS feed using the XML package in R

The code itself:

    library(XML)
    library(RCurl)

    ### Extracting data from Reuters
    xml.url <- "http://live.reuters.com/Event/rss.aspx?id=70335"
    script <- getURL(xml.url)
    doc <- xmlParse(script)
    titles <- xpathSApply(doc, '//item/title', xmlValue)
    pubdates <- xpathSApply(doc, '//item/pubDate', xmlValue)
    reuters <- cbind(titles, pubdates)
    reuters_data <- data.frame(reuters)

    # Exporting as a CSV
    write.csv(reuters_data, file = "reuters_post.csv")

The code does almost what I want. However, it only retrieves the first 45 posts, and I know the feed contains closer to 1000. Is this related to the rss.aspx format? Is there a workaround so that I can get all the posts in the RSS feed, not just the first 45? Any help would be greatly appreciated, since I am new to data cleansing.

Thank you,
Thomas

1 answer

This runs into a general limitation of RSS / Atom feeds: they do not allow historical items to be retrieved; see How do I get all the old items on an RSS feed?
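To see that the limit comes from the server's response rather than from the XPath query, here is a minimal sketch that parses a small inline feed instead of the live URL. The sample feed and its two items are hypothetical stand-ins for the Reuters response; only `xmlParse`, `getNodeSet` and `xpathSApply` from the XML package are used.

```r
library(XML)

# Hypothetical two-item feed standing in for the real Reuters response
sample_rss <- '<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Sample feed</title>
  <item><title>First post</title><pubDate>Mon, 01 Apr 2013 10:00:00 GMT</pubDate></item>
  <item><title>Second post</title><pubDate>Mon, 01 Apr 2013 10:05:00 GMT</pubDate></item>
</channel></rss>'

# Parse the string directly instead of downloading it
doc <- xmlParse(sample_rss, asText = TRUE)

# '//item' matches every <item> element present in the document
n_items <- length(getNodeSet(doc, "//item"))
titles <- xpathSApply(doc, "//item/title", xmlValue)
pubdates <- xpathSApply(doc, "//item/pubDate", xmlValue)

# Build the data frame directly; stringsAsFactors = FALSE keeps plain strings
reuters_data <- data.frame(titles, pubdates, stringsAsFactors = FALSE)
```

Running the same count on the document downloaded from the live URL should show how many items the server actually sent. If that number is 45, the remaining posts are simply not in the response, and no change to the XPath expressions will recover them.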

However, we can use the unofficial Google Reader API (see the GoogleReaderAPI wiki).

    library(RCurl)
    library(RJSONIO)

    N <- 100  # Number of items to fetch
    url <- paste("http://www.google.com/reader/api/0/stream/contents/feed/http://live.reuters.com/Event/rss.aspx%3Fid=70335?n=", N, sep="")
    json <- getURL(url)                               # Fetches data
    list <- fromJSON(json)                            # JSON to list
    df <- as.data.frame(do.call(rbind, list$items))   # list to data.frame
    title <- unlist(df$title)                         # Title
    datetime <- as.POSIXlt(unlist(df$published), origin="1970-01-01", tz="GMT")  # Publication date
    reuters <- data.frame(title, datetime)            # Output data.frame
    write.csv(reuters, file = "reuters_post.csv")     # Writes CSV
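The conversion step in the code above can be tried without network access. The following is a minimal base-R sketch, assuming each parsed item carries a `title` string and a `published` value in Unix seconds, as the code above implies; the two sample items and the temporary output file are hypothetical.

```r
# Hypothetical parsed items, shaped like the elements of list$items above
items <- list(
  list(title = "First post",  published = 1364810400),
  list(title = "Second post", published = 1364810700)
)

# Pull one field out of every item, checking the type as we go
title <- vapply(items, function(x) x$title, character(1))
published <- vapply(items, function(x) x$published, numeric(1))

# Unix seconds to date-time; POSIXct is used here because it stores
# a single number per value, which suits a data.frame column
datetime <- as.POSIXct(published, origin = "1970-01-01", tz = "GMT")

reuters <- data.frame(title, datetime, stringsAsFactors = FALSE)

# row.names = FALSE leaves out the extra column of row numbers
out_file <- tempfile(fileext = ".csv")
write.csv(reuters, file = out_file, row.names = FALSE)
```

Using `vapply` instead of `do.call(rbind, ...)` followed by `unlist` makes the sketch fail loudly if an item lacks a field or has an unexpected type, which can help when the response shape is not documented.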

Source: https://habr.com/ru/post/1447590/

