You can do this with the xmlToList function from the XML package. Because some of your nodes carry attributes that others lack, you will also need the rbind.fill function from the plyr package, which fills the missing columns with NA.
The code below converts your XML into a list, turns each node's character vector into a one-row data frame, and then stacks all of these data frames.
    library(XML)   # package names are case-sensitive: XML, not xml
    library(plyr)

    # one-row data frame per node, stacked with NA for missing attributes
    out <- do.call("rbind.fill",
                   lapply(xmlToList(users), function(x)
                     as.data.frame(as.list(x), stringsAsFactors = FALSE)))
    head(out)

      Id Reputation            CreationDate  DisplayName          LastAccessDate             Location     AboutMe Views UpVotes DownVotes
    1 -1          1 2010-07-19T06:55:26.860    Community 2010-07-19T06:55:26.860   on the server farm   some text     0    4382       771
    2  2        101 2010-07-19T14:01:36.697 Geoff Dalgas 2012-09-13T17:41:48.300        Corvallis, OR some text 2     7       3         0
    3  3        101 2010-07-19T15:34:50.507 Jarrod Dixon 2013-01-15T03:28:47.657         New York, NY some text 3     9      19         0
    4  4        101 2010-07-19T19:03:27.400       Emmett 2013-04-16T16:51:04.780         New York, NY some text 4     3       0         0
    5  5       6182 2010-07-19T19:03:57.227        Shane 2013-02-05T11:23:09.587         New York, NY some text 5   605     659         5
    6  6        442 2010-07-19T19:04:07.647       Harlan 2013-05-09T13:11:29.027 District of Columbia some text 6    30      42         0
                             EmailHash                    WebsiteUrl  Age
    1 a007be5a61f6aa8f3e85ae2fc18dd66e                          <NA> <NA>
    2 b437f461b3fd27387c5d8ab47a293d35      http://stackoverflow.com   36
    3 2dfa19bf5dc5826c1fe54c2c049a1ff1      http://stackoverflow.com   34
    4 129bc58fc3f1e3853cdd3cefc75fe1a0  http://minesweeperonline.com   27
    5 0cee97ffd90277bf4ac753331d50af60       http://www.statalgo.com   34
    6 9f1a68b9e623be5da422b44e733fa8bc http://www.harlan.harris.name   40
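If you want to try the approach without the full file, here is a minimal sketch on a tiny inline document. The XML string and the names xml_text, doc and out_demo are made up for illustration; I am assuming your file has the same shape (a root element whose row nodes hold everything in attributes).

```r
library(XML)
library(plyr)

# Hypothetical two-row document; the second row has an attribute the first lacks
xml_text <- paste0(
  "<users>",
  "<row Id=\"-1\" DisplayName=\"Community\"/>",
  "<row Id=\"2\" DisplayName=\"Geoff Dalgas\" Age=\"36\"/>",
  "</users>"
)
doc <- xmlParse(xml_text, asText = TRUE)

# Each row node becomes a named character vector, then a one-row data frame
out_demo <- do.call("rbind.fill",
                    lapply(xmlToList(doc), function(x)
                      as.data.frame(as.list(x), stringsAsFactors = FALSE)))
out_demo
```

The first row should come back with NA in the Age column, which is the reason for using rbind.fill instead of plain rbind.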
EDIT
The resulting data frame will consist entirely of character vectors. If you want to convert these vectors to dates, numbers, and so on, you can do it column by column, write a function that assigns classes to columns by name, or write a function that tries to infer the correct class from the data. The following is an example of the last option:
    giveClasses <- function(df, threshold = 0.1) {
      df_classes <- sapply(df, class)

      # flag columns by the share of values that match each pattern
      df_alpha <- sapply(df, function(x) {
        mean(grepl("[[:alpha:]]", x)) >= threshold}) &
        df_classes == "character"
      df_digits <- sapply(df, function(x) mean(grepl("\\d", x))) >= threshold &
        df_classes == "character" & !df_alpha
      df_percent <- sapply(df, function(x) mean(grepl("%", x))) >= threshold &
        df_classes == "character" & !df_alpha & df_digits
      # percent columns are converted separately, so drop them from the integers
      df_digits[df_percent] <- FALSE
      df_decimal <- sapply(df, function(x) mean(grepl("\\.", x))) >= threshold &
        df_classes == "character" & !df_percent & df_digits & !df_alpha
      df_dates <- sapply(df, function(x) {
        mean(grepl(
          "^\\d{2,4}[[:punct:]]\\d{2}[[:punct:]]\\d{2,4}$", x)) >= threshold}) &
        df_classes == "character"
      df_datetime <- sapply(df, function(x) {
        mean(grepl(
          "^\\d{2,4}[[:punct:]]\\d{2}[[:punct:]]\\d{2,4}\\D\\d{2}:\\d{2}(:\\d{2})?(\\.\\d{1,})?$",
          x)) >= threshold}) &
        df_classes == "character"
      # logical: exactly two distinct non-missing values, each starting with y or n
      df_logical <- sapply(df, function(x) {
        y <- unique(na.omit(x))
        length(y) == 2 &
          mean(grepl("^n", y, ignore.case = TRUE) |
               grepl("^y", y, ignore.case = TRUE)) == 1
      })
      df_digits[df_dates | df_datetime] <- FALSE

      # convert character data to appropriate classes
      df[,df_percent] <- lapply(df[,df_percent, drop = FALSE], function(x) {
        as.numeric(gsub("[^[:digit:].]", "", x)) / 100})
      df[,df_logical] <- lapply(df[,df_logical, drop = FALSE], function(x) {
        x[grep("^y", x, ignore.case = TRUE)] <- TRUE
        x[grep("^n", x, ignore.case = TRUE)] <- FALSE
        as.logical(x)
      })
      df[,df_decimal] <- lapply(df[,df_decimal, drop = FALSE], function(x) {
        as.numeric(gsub("[^[:digit:].]", "", x))})
      # note: this strips every non-digit, so a leading minus sign is lost
      df[,df_digits] <- lapply(df[,df_digits, drop = FALSE], function(x) {
        as.integer(gsub("[^[:digit:]]", "", x))})
      df[,df_dates] <- lapply(df[,df_dates, drop = FALSE], function(x) {
        as.Date(x)})
      df[,df_datetime] <- lapply(df[,df_datetime, drop = FALSE], function(x) {
        strptime(x, '%Y-%m-%dT%H:%M:%OS')})

      # trim and collapse whitespace in whatever is still character
      df_ischaracter <- sapply(df, function(x) any(class(x) == "character"))
      df[,df_ischaracter] <-
        lapply(df[,df_ischaracter, drop = FALSE], function(x) {
          x <- gsub("^\\s+|\\s+$|(?<=\\s)\\s+", "", x, perl = TRUE)})
      df
    }
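The above function assigns a class to a column when at least a threshold share of its values (10% by default) match the pattern for that class. A column in which 10% or more of the values contain letters is never treated as numeric, so a numeric column must be more than 90% free of letters; anything that matches no pattern stays character. The function also checks for patterns that do not occur in your sample dataset, such as percentages and yes/no values, because I just copied the code from another project I'm working on. So: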
    str(giveClasses(out))
    'data.frame':   17 obs. of  13 variables:
     $ Id            : int  1 2 3 4 5 6 7 8 10 11 ...
     $ Reputation    : int  1 101 101 101 6182 442 329 6104 121 136 ...
     $ CreationDate  : POSIXlt, format: "2010-07-19 06:55:26" "2010-07-19 14:01:36" "2010-07-19 15:34:50" "2010-07-19 19:03:27" ...
     $ DisplayName   : chr  "Community" "Geoff Dalgas" "Jarrod Dixon" "Emmett" ...
     $ LastAccessDate: POSIXlt, format: "2010-07-19 06:55:26" "2012-09-13 17:41:48" "2013-01-15 03:28:47" "2013-04-16 16:51:04" ...
     $ Location      : chr  "on the server farm" "Corvallis, OR" "New York, NY" "New York, NY" ...
     $ AboutMe       : chr  "some text" "some text 2" "some text 3" "some text 4" ...
     $ Views         : int  0 7 9 3 605 30 21 399 8 2 ...
     $ UpVotes       : int  4382 3 19 0 659 42 14 576 2 10 ...
     $ DownVotes     : int  771 0 0 0 5 0 0 18 0 0 ...
     $ EmailHash     : chr  "a007be5a61f6aa8f3e85ae2fc18dd66e" "b437f461b3fd27387c5d8ab47a293d35" "2dfa19bf5dc5826c1fe54c2c049a1ff1" "129bc58fc3f1e3853cdd3cefc75fe1a0" ...
     $ WebsiteUrl    : chr  NA "http://stackoverflow.com" "http://stackoverflow.com" "http://minesweeperonline.com" ...
     $ Age           : int  NA 36 34 27 34 40 27 35 43 39 ...
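For comparison, the other route mentioned in the edit (a function that assigns classes to columns by name) might look something like the sketch below. The helper set_classes, the spec list and the small stand-in data frame out_small are hypothetical names of mine, and the UTC time zone is an assumption, so adjust it to whatever your data uses.

```r
# Hypothetical helper: apply a converter to each column named in `spec`.
# Columns that are not listed are left untouched.
set_classes <- function(df, spec) {
  for (col in intersect(names(spec), names(df))) {
    df[[col]] <- spec[[col]](df[[col]])
  }
  df
}

# Small stand-in for `out`, using a few values from the output above
out_small <- data.frame(
  Id           = c("-1", "2"),
  CreationDate = c("2010-07-19T06:55:26.860", "2010-07-19T14:01:36.697"),
  DisplayName  = c("Community", "Geoff Dalgas"),
  Age          = c(NA, "36"),
  stringsAsFactors = FALSE
)

# Column name -> conversion function (time zone is an assumption)
spec <- list(
  Id           = as.integer,
  Age          = as.integer,
  CreationDate = function(x) {
    as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")
  }
)

typed <- set_classes(out_small, spec)
str(typed)
```

This is more typing than inferring the classes, but nothing is guessed: a negative Id keeps its sign, and a column you did not list stays character.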