Load XML "rows" into an R data frame

I have some data in the form:

    <people>
      <person first="Mary" last="Jane" sex="F" />
      <person first="Susan" last="Smith" sex="F" height="168" />
      <person last="Black" first="Joseph" sex="M" />
      <person first="Jessica" last="Jones" sex="F" />
    </people>

I need a data frame that looks like this:

        first  last sex height
    1    Mary  Jane   F     NA
    2   Susan Smith   F    168
    3  Joseph Black   M     NA
    4 Jessica Jones   F     NA

I got this far:

    library(XML)
    xpeople <- xmlRoot(xmlParse(xml))
    lst <- xmlApply(xpeople, xmlAttrs)
    names(lst) <- 1:length(lst)

But I can't for the life of me figure out how to get the list into a data frame. I can make the list "square" (i.e., fill in the gaps) and then convert it to a data frame:

    lst <- xmlApply(xpeople, function(node) {
      attrs = xmlAttrs(node)
      if (!("height" %in% names(attrs))) {
        attrs[["height"]] <- NA
      }
      attrs
    })
    df = as.data.frame(lst)

But I have the following problems:

  • the data frame is transposed (one column per person instead of one row per person)
  • first and last are factors, not chr
  • height is a factor, not a number
  • the first and last names are swapped for Joseph Black (not a big problem, since my data is usually consistent, but annoying nonetheless)

How can I get the data frame in the correct form?

1 answer
    txt <- '<people>
      <person first="Mary" last="Jane" sex="F" />
      <person first="Susan" last="Smith" sex="F" height="168" />
      <person last="Black" first="Joseph" sex="M" />
      <person first="Jessica" last="Jones" sex="F" />
    </people>'

    library(XML)         # for xmlTreeParse
    library(data.table)  # for rbindlist(...)

    xml <- xmlTreeParse(txt, asText=TRUE, useInternalNodes = TRUE)
    rbindlist(lapply(xml["//person"], function(x) as.list(xmlAttrs(x))), fill=TRUE)
    #      first  last sex height
    # 1:    Mary  Jane   F     NA
    # 2:   Susan Smith   F    168
    # 3:  Joseph Black   M     NA
    # 4: Jessica Jones   F     NA

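You need as.list(xmlAttrs(...)) instead of xmlAttrs(...), because xmlAttrs(...) returns a named character vector and rbindlist(...) wants each element to be a list, not a vector. With fill=TRUE, columns are matched by name and any attribute missing from a row is filled with NA.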
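If you would rather not depend on data.table, the same fill-by-name idea can be sketched in base R. This is only an illustration under some assumptions: the attribute vectors are written out by hand here in place of the output of xmlAttrs(...), and the column list `cols` is a hypothetical name that you would define yourself. Indexing a named vector by name puts the values in a fixed order and yields NA for missing names, which deals with both the swapped first/last and the missing height.

```r
# Hand-written stand-ins for what xmlAttrs() returns for each <person> node
# (named character vectors; note the different attribute order in row 3).
attrs <- list(
  c(first = "Mary",    last = "Jane",  sex = "F"),
  c(first = "Susan",   last = "Smith", sex = "F", height = "168"),
  c(last  = "Black",   first = "Joseph", sex = "M"),
  c(first = "Jessica", last = "Jones", sex = "F")
)

# Hypothetical: the columns we want, in the order we want them.
cols <- c("first", "last", "sex", "height")

# Indexing by name reorders each vector and gives NA for absent attributes.
# vapply() returns one column per person, so transpose to one row per person.
m <- t(vapply(attrs, function(a) unname(a[cols]), character(length(cols))))
colnames(m) <- cols

# Keep strings as character (not factor), then convert height explicitly.
df <- as.data.frame(m, stringsAsFactors = FALSE)
df$height <- as.numeric(df$height)

str(df)
```

The explicit as.numeric(...) step is needed in either approach, since XML attribute values always arrive as strings.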


Source: https://habr.com/ru/post/1232629/

