Split a comma-separated column into rows by group

I have some data that look something like this:

    test.frame <- read.table(text = "name amounts
    JEAN 318.5,45
    GREGORY 1518.5,67,8
    WALTER 518.5
    LARRY 518.5,55,1
    HARRY 318.5,32", header = TRUE, sep = "")

I would like it to look more like this:

       name amount
       JEAN  318.5
       JEAN     45
    GREGORY 1518.5
    GREGORY     67
    GREGORY      8
     WALTER  518.5
      LARRY  518.5
      LARRY     55
      LARRY      1
      HARRY  318.5
      HARRY     32

There seems to be an easy way to break up the "amounts" column, but I haven't come up with it. I would be happy with an "RTFM, see the page for this particular command" answer. Which command am I looking for?

5 answers
    (test.frame <- read.table(text = "name amounts
    JEAN 318.5,45
    GREGORY 1518.5,67,8
    WALTER 518.5
    LARRY 518.5,55,1
    HARRY 318.5,32", header = TRUE, sep = ""))
    #      name      amounts
    # 1    JEAN     318.5,45
    # 2 GREGORY  1518.5,67,8
    # 3  WALTER        518.5
    # 4   LARRY   518.5,55,1
    # 5   HARRY     318.5,32

    tmp <- setNames(strsplit(as.character(test.frame$amounts), split = ','),
                    test.frame$name)
    data.frame(name = rep(names(tmp), sapply(tmp, length)),
               amounts = unlist(tmp), row.names = NULL)
    #       name amounts
    # 1     JEAN   318.5
    # 2     JEAN      45
    # 3  GREGORY  1518.5
    # 4  GREGORY      67
    # 5  GREGORY       8
    # 6   WALTER   518.5
    # 7    LARRY   518.5
    # 8    LARRY      55
    # 9    LARRY       1
    # 10   HARRY   318.5
    # 11   HARRY      32
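Note that strsplit() returns text, so the amounts column in the result above is not numeric yet. A minimal sketch of the same base R idea with the conversion added is below; the helper name split_to_rows and its arguments are made up for illustration and are not part of the answer.

```r
# Hypothetical helper (not from the original answer): split a delimited
# column into one row per value and convert the pieces to numbers.
split_to_rows <- function(df, id_col, value_col, sep = ",") {
  # One character vector of pieces per input row
  parts <- strsplit(as.character(df[[value_col]]), split = sep, fixed = TRUE)
  out <- data.frame(
    rep(df[[id_col]], lengths(parts)),  # repeat each id once per piece
    as.numeric(unlist(parts)),          # flatten the pieces and convert
    stringsAsFactors = FALSE
  )
  names(out) <- c(id_col, value_col)
  out
}

test.frame <- read.table(text = "name amounts
JEAN 318.5,45
GREGORY 1518.5,67,8
WALTER 518.5
LARRY 518.5,55,1
HARRY 318.5,32", header = TRUE, sep = "", stringsAsFactors = FALSE)

long <- split_to_rows(test.frame, "name", "amounts")
str(long)
```

Using lengths(parts) instead of sapply(tmp, length) is only a convenience; it assumes R 3.2.0 or later.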

Possibly the fastest way is to use data.table:

    library(data.table)
    setDT(test.frame)[, lapply(.SD, function(x) unlist(strsplit(as.character(x), ','))),
                      .SDcols = "amounts", by = name]
    ##        name amounts
    ##  1:    JEAN   318.5
    ##  2:    JEAN      45
    ##  3: GREGORY  1518.5
    ##  4: GREGORY      67
    ##  5: GREGORY       8
    ##  6:  WALTER   518.5
    ##  7:   LARRY   518.5
    ##  8:   LARRY      55
    ##  9:   LARRY       1
    ## 10:   HARRY   318.5
    ## 11:   HARRY      32

A generalization of David Arenburg's solution would be to use the cSplit function. Get it from the GitHub Gist (https://gist.github.com/mrdwab/11380733) or load it using "devtools":

    # library(devtools)
    # source_gist(11380733)

The "long" format is what you are looking for:

    cSplit(test.frame, "amounts", ",", "long")
    #        name amounts
    #  1:    JEAN   318.5
    #  2:    JEAN      45
    #  3: GREGORY  1518.5
    #  4: GREGORY      67
    #  5: GREGORY       8
    #  6:  WALTER   518.5
    #  7:   LARRY   518.5
    #  8:   LARRY      55
    #  9:   LARRY       1
    # 10:   HARRY   318.5
    # 11:   HARRY      32

But the function can also create wide output formats:

    cSplit(test.frame, "amounts", ",", "wide")
    #       name amounts_1 amounts_2 amounts_3
    # 1:    JEAN     318.5        45        NA
    # 2: GREGORY    1518.5        67         8
    # 3:  WALTER     518.5        NA        NA
    # 4:   LARRY     518.5        55         1
    # 5:   HARRY     318.5        32        NA

One advantage of this function is that it can split multiple columns at once.
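A rough sketch of splitting two columns in one call is below. It assumes the cSplit() shipped in the CRAN package splitstackshape behaves like the gist version, and the labels column is invented for the example.

```r
# Assumption: cSplit() from the splitstackshape package stands in for the gist.
library(splitstackshape)

# Hypothetical data: two comma-separated columns with the same number
# of items in each row.
multi <- data.frame(
  name    = c("JEAN", "WALTER"),
  amounts = c("318.5,45", "518.5"),
  labels  = c("a,b", "c"),
  stringsAsFactors = FALSE
)

# Name both columns in splitCols; "long" stacks the pieces into rows.
multi_long <- cSplit(multi, c("amounts", "labels"), sep = ",", direction = "long")
print(multi_long)
```

The item counts are kept equal per row here on purpose, so the example does not depend on how uneven columns are padded.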


This is not a super-standard format, but here is one way to convert your data. First, I would use stringsAsFactors=FALSE with your read.table to make sure everything is a character variable, not a factor. Alternatively, you can call as.character() on these columns.

First I split the values in amounts on the comma, then I combine them with the name column:

 md <- do.call(rbind, Map(cbind, test.frame$name, strsplit(test.frame$amounts, ","))) 

Then I paste everything back together and send it to read.table to do the type conversion:

 read.table(text=apply(md,1,paste, collapse="\t"), sep="\t", col.names=names(test.frame)) 

Alternatively, you can simply make a data.frame from the md matrix and do the class conversions yourself:

 data.frame(names=md[,1], amount=as.numeric(md[,2])) 

Here is the plyr solution:

    Split.Amounts <- function(x) {
      amounts <- unlist(strsplit(as.character(x$amounts), ","))
      return(data.frame(name = x$name, amounts = amounts, stringsAsFactors = FALSE))
    }

    library(plyr)
    ddply(test.frame, .(name), Split.Amounts)

Using dplyr:

    library(dplyr)
    test.frame %>%
      group_by(name) %>%
      do(Split.Amounts(.))
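For readers already in the dplyr ecosystem, a related option that none of the answers above use is tidyr::separate_rows(). The sketch below assumes the tidyr package is installed; it is a swapped-in alternative, not the method from this answer.

```r
# Assumption: tidyr is installed; separate_rows() is not used in the answers above.
library(tidyr)

test.frame <- read.table(text = "name amounts
JEAN 318.5,45
GREGORY 1518.5,67,8
WALTER 518.5
LARRY 518.5,55,1
HARRY 318.5,32", header = TRUE, sep = "", stringsAsFactors = FALSE)

# Split amounts on the comma into one row per value;
# convert = TRUE asks tidyr to convert the resulting text to numbers.
long <- separate_rows(test.frame, amounts, sep = ",", convert = TRUE)
str(long)
```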

Source: https://habr.com/ru/post/970915/

