How can I extract information from an XML page using R?

I am trying to get all the information from this page: http://ws.parlament.ch/affairs/19110758/?format=xml

First, I download the file to destfile and parse it with xmlParse(destfile).

library(XML)

destfile <- tempfile()
download.file(url = "http://ws.parlament.ch/affairs/19110758/?format=xml", destfile = destfile)
file <- xmlParse(destfile)

Now I want to extract all the information I need. For example, heading and identification number. I tried something like this:

title <- xpathSApply(file, "//h2", xmlValue)

But this only gives me an error: unable to find an inherited method for function 'saveXML' for signature '"XMLDocument"'

The next thing I tried is:

library(plyr)

test <- ldply(xmlToList(file), function(x) { data.frame(x[!names(x) == "id"]) })

This gives me a data.frame with some information, but I am losing fields such as id (most importantly).

I would like to get a data.frame with one row per case containing all the information for that case, for example id, updated, additionalIndexing, affairType, etc.
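For reference, that one-row shape can be built from xmlToList() without dropping id. A minimal offline sketch, using a mock record since the real feed's structure is assumed rather than known:

```r
library(XML)

# Mock affair record standing in for the real feed (field names assumed)
doc <- xmlParse('<affair><id>19110758</id><updated>2010-12-09</updated></affair>',
                asText = TRUE)

# Flatten the nested list into a named vector, then into a one-row data.frame
lst     <- xmlToList(doc)
flat    <- unlist(lst)
one_row <- as.data.frame(t(flat), stringsAsFactors = FALSE)
# one_row keeps every field, including id
```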

It works with this (example for id):

infofile <- xmlRoot(file)

nodes <- getNodeSet(file, "//affair/id")
id <- as.numeric(lapply(nodes, function(x) xmlSApply(x, xmlValue)))
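The same getNodeSet() approach generalizes to all fields at once by selecting every leaf element under the affair node. A sketch under the assumption that the record holds simple leaf elements; the mock XML stands in for the real feed, whose exact structure may differ:

```r
library(XML)

# Mock affair record (structure assumed, not the real feed)
doc <- xmlParse('<affair><id>19110758</id><updated>2010-12-09</updated>
                 <affairType><code>MOT</code></affairType></affair>',
                asText = TRUE)

# Select every leaf element (no child elements) under <affair>
leaves <- getNodeSet(doc, "//affair//*[not(*)]")
values <- sapply(leaves, xmlValue)
names(values) <- sapply(leaves, xmlName)

# One row per case, one column per field
case_df <- as.data.frame(t(values), stringsAsFactors = FALSE)
```

Nested fields appear under their leaf name only (code here, not affairType/code), so records with repeated element names would need disambiguation.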

Using the XML, RCurl, and httr packages:

library(XML)
library(RCurl)
library(httr)

srcXML <- getURL("http://ws.parlament.ch/affairs/19110758/?format=xml",
                 useragent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)",
                 verbose = TRUE)

myXMLFile <- xmlTreeParse(substr(srcXML, 4, nchar(srcXML)), asText = TRUE)

The server refuses the default R user-agent, so the request sends a browser-like one (GET() from httr with user_agent() works as well). The response starts with a byte-order mark, which substr() strips off before xmlTreeParse() parses the text.
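Why substr() starts at position 4: the response opens with the three-byte UTF-8 byte-order mark (EF BB BF), which the XML parser rejects. An offline sketch with a mock payload rather than the real response:

```r
library(XML)

# Simulate a response prefixed with a UTF-8 BOM (EF BB BF)
raw_resp <- c(as.raw(c(0xEF, 0xBB, 0xBF)),
              charToRaw("<affair><id>19110758</id></affair>"))

# Drop the three BOM bytes, then parse the remaining text
clean <- rawToChar(raw_resp[-(1:3)])
doc   <- xmlParse(clean, asText = TRUE)
xmlValue(getNodeSet(doc, "//affair/id")[[1]])
# [1] "19110758"
```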


The file can be parsed as HTML instead of XML; htmlParse is more forgiving and gets past the error. Use htmlParse:

destfile <- tempfile() # make this example copy-pasteable
download.file(url = "http://ws.parlament.ch/affairs/19110758/?format=xml", destfile = destfile)
file <- htmlParse(destfile)
title <- xpathSApply(file, '//h2')
xmlValue(title[[1]])
# [1] "Heilmittelwesen. Gesetzgebung"
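htmlParse() succeeds here because the HTML parser tolerates leading junk such as a byte-order mark, and the same XPath queries work on the document it returns. An offline sketch with a mock BOM-prefixed payload (real feed structure assumed):

```r
library(XML)

# Mock payload: UTF-8 BOM followed by an XML fragment
payload <- paste0(rawToChar(as.raw(c(0xEF, 0xBB, 0xBF))),
                  "<affair><id>19110758</id></affair>")

doc <- htmlParse(payload, asText = TRUE)
xpathSApply(doc, "//id", xmlValue)
```

Note that htmlParse() lower-cases element names, so XPath expressions must use lower-case tags.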

Source: https://habr.com/ru/post/1533942/

