RCurl or XML Challenge: Read Pastebin in R

Question


Flex your RCurl / XML muscles: the shortest code wins. Parse this paste into R: http://pastebin.com/CDzYXNbG

The parsed result must be:

structure(list(Treatment = structure(c(2L, 2L, 1L, 1L), .Label = c("C", "T"), class = "factor"), Gender = c("M", "F", "M", "F"), Response = c(56L, 58L, 6L, 63L)), .Names = c("Treatment", "Gender", "Response"), row.names = c(NA, -4L), class = "data.frame")
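For reference, the text above is `dput()` output, so every answer ultimately has to get this string into R and evaluate it. A minimal base-R sketch of that final step, with the string pasted in directly instead of downloaded (no network access assumed):

```r
# The target object from the question, held as text the way it would
# arrive from the paste.
txt <- 'structure(list(Treatment = structure(c(2L, 2L, 1L, 1L), .Label = c("C", "T"), class = "factor"), Gender = c("M", "F", "M", "F"), Response = c(56L, 58L, 6L, 63L)), .Names = c("Treatment", "Gender", "Response"), row.names = c(NA, -4L), class = "data.frame")'

# parse() turns the text into an unevaluated expression; eval() runs it,
# rebuilding the data frame.
df <- eval(parse(text = txt))
str(df)
```

This `eval(parse(...))` pair is the common core of every answer below; they differ only in how they fetch the text.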

Good luck

Note: dummy data kindly provided by this question: Adding space between columns in ggplot2


Brandon Bertelsen May 22 '11 at 7:00


4 answers

RCurl is not needed for my code, because the XML package's parsing functions accept a URL as their file argument.

Please run

 library(XML)

before the examples below.

Code 1 (one-liner):

 eval(parse(text=htmlTreeParse("http://pastebin.com/CDzYXNbG",handlers=(function(){qt <- NULL;list(textarea=function(node,...){qt<<-gsub("[\r\n]", "", unclass(node$children$text)$value);node},.qt=function()qt)})())$.qt()))

Code 2 is shorter, but I don't think it is the shortest possible.

 htmlTreeParse("http://pastebin.com/CDzYXNbG",h=list(textarea=function(n)z<<-gsub("[\r\n]","",unclass(n$c$t)$v)));eval(parse(text=z))

Since this question is a kind of game, try decoding this code yourself.
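For anyone who wants a hint: Code 2 appears to rely on three base-R conveniences. The sketch below reproduces them on a hypothetical plain list standing in for the XML text node (so the original's `unclass()` call is not needed) and a hypothetical function `parse_with()` standing in for `htmlTreeParse()`:

```r
# Hypothetical stand-in for an XML text node: a plain nested list.
n <- list(children = list(text = list(value = "a\r\nb")))

# 1. `$` on a list partially matches names, so n$c$t$v is n$children$text$value.
n$c$t$v

# 2. Arguments are partially matched too: `h =` fills the `handlers` formal.
#    parse_with() is a hypothetical stand-in for htmlTreeParse().
parse_with <- function(file, handlers) handlers$textarea(file)

# 3. `<<-` assigns outside the handler's own frame (here, the global
#    environment), which is how the extracted text escapes as `z`.
parse_with(n, h = list(textarea = function(n) z <<- gsub("[\r\n]", "", n$c$t$v)))
z
```

Partial matching keeps golfed code short, but it silently breaks if a later version of a function adds another argument or element sharing the same prefix.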

UPDATE

After seeing @JD Long's excellent solution, here is the shortest code:

 eval(parse(file(sub("m/","m/raw.php?i=","http://pastebin.com/CDzYXNbG"))))

Now the question is how to build the desired URL string in the shortest code ;-p
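The URL trick above works because `sub()` replaces only the first match, and the first `"m/"` in the address is the end of `.com/`. A quick offline check (the `raw.php?i=` form is the 2011 Pastebin scheme used throughout this thread and may not be what the site serves today):

```r
u <- "http://pastebin.com/CDzYXNbG"

# sub() rewrites only the first "m/", turning ".com/ID" into ".com/raw.php?i=ID".
raw_url <- sub("m/", "m/raw.php?i=", u)
raw_url
```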

Updated again. This is somewhat shorter.

 source(sub("m/","m/raw.php?i=","http://pastebin.com/CDzYXNbG"))$va
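The trailing `$va` works because `source()` invisibly returns a list with components `value` and `visible`, and `$` on a list partially matches names. A small offline sketch, using a hypothetical temporary file in place of the Pastebin URL:

```r
# Hypothetical local file standing in for the downloaded paste.
tmp <- tempfile(fileext = ".R")
writeLines("c(56L, 58L, 6L, 63L)", tmp)

# source() evaluates the file and invisibly returns list(value, visible).
res <- source(tmp)
names(res)

# `$va` partially matches `$value`, the result of the last expression.
res$va

unlink(tmp)
```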


kohske May 22, '11 at 13:20


You guys are making it too hard:

eval(parse(file("http://pastebin.com/raw.php?i=CDzYXNbG")))

OK, so I cheated. But starting from the original URL, you can reach the same result:

eval(parse(file(paste("http://pastebin.com/raw.php?i=", strsplit("http://pastebin.com/CDzYXNbG", "/")[[1]][4], sep=""))))
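Splitting on `"/"` puts the paste ID in the fourth element. As an aside not taken from the thread, base R's `basename()` extracts the same trailing component and is shorter still; a sketch:

```r
u <- "http://pastebin.com/CDzYXNbG"

# "http://pastebin.com/ID" splits into "http:", "", "pastebin.com", "ID".
parts <- strsplit(u, "/")[[1]]
id <- parts[4]

# basename() returns everything after the last "/", i.e. the same ID.
identical(id, basename(u))

raw_url <- paste0("http://pastebin.com/raw.php?i=", id)
raw_url
```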

Which still puts me ahead :)


JD Long May 22 '11 at 21:38


I'm not quite sure what you are trying to achieve here, but maybe this does what you ask (no fancy packages, just a regular expression):

 fullText <- paste(readLines("http://pastebin.com/CDzYXNbG"), collapse = "\n")
 regexp <- "<textarea[^>]*id=\"paste_code\"[^>]*>(.*)</textarea>"
 txtarpos <- regexpr(regexp, fullText)
 txtarstrt <- txtarpos[1]
 txtarlen <- unlist(attributes(txtarpos)["match.length"])
 txtarstp <- txtarstrt + txtarlen
 txtarpart <- substr(fullText, txtarpos[1], txtarstp)
 retval <- gsub("\n", "", gsub("&quot;", "\"", gsub(regexp, "\\1", txtarpart), fixed = TRUE), fixed = TRUE)
 cat(retval)

I'm sure this can be improved, but it does what I think you asked for. Even if it doesn't: thanks for giving me a reason to brush up on my regular expressions!
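The same regex idea can be tried offline. This sketch uses a hypothetical, simplified HTML snippet mimicking the `paste_code` textarea the regular expression expects (the real 2011 Pastebin markup is assumed, not reproduced). It swaps the manual `substr()` bookkeeping for base R's `regmatches()` and strips newlines before matching rather than after:

```r
# Hypothetical, simplified page: only the part the regex cares about.
html <- '<html><body>\n<textarea id="paste_code" rows="10">c(&quot;M&quot;,\n&quot;F&quot;)</textarea>\n</body></html>'
re <- "<textarea[^>]*id=\"paste_code\"[^>]*>(.*)</textarea>"

# Flatten the page first so the match never has to cross a newline.
flat <- gsub("\n", "", html, fixed = TRUE)

# Pull out the whole <textarea> element, then keep only its contents.
m <- regmatches(flat, regexpr(re, flat))
code <- sub(re, "\\1", m)

# Undo the HTML entity escaping before handing the text to R's parser.
code <- gsub("&quot;", "\"", code, fixed = TRUE)
code
eval(parse(text = code))
```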


Nick Sabbe May 22 '11 at 12:35


cameron.bracken · Accepted Answer · 2011-05-22T17:28:46+0000

Same idea as kohske's, but a little shorter and, I think, clearer:

 library(XML)
 eval(parse(text=gsub('\r\n','\n',xpathApply(htmlTreeParse('http://pastebin.com/CDzYXNbG',useInternal=T),'//textarea',xmlValue))))
