How to get a summary of each unique identifier

I would like to extract some summary statistics for multiple values in multiple columns. My data is as follows:

    id  pace  type  value    abundance
    51  (T)   (JC)  (L)      0
    51  (T)   (JC)  (L)      0
    51  (T)   (JC)  (H)      0
    52  (T)   (JC)  (H)      0
    52  (R)   (JC)  (H)      0
    53  (T)   (JC)  (L)      1
    53  (T)   (JC)  (H)      1
    53  (R)   (JC)  (H)      1
    53  (R)   (JC)  (H)      1
    53  (R)   (JC)  (H)      1
    54  (T)   (BC)  <blank>  0
    54  (T)   (BC)  <blank>  0
    54  (T)   (BC)  <blank>  0

and I am hoping for something like this:

    id  ptype  (T)  (R)  (L)  (H)  abundance
    51  (JC)   3    0    2    1    0
    52  (JC)   1    1    0    2    0
    53  (JC)   2    3    1    4    1
    54  (BC)   3    0    0    0    0

I started writing code:

    for (i in levels(df$id)) {
      extract.event <- df[df$id == i, ]           # To identify each section
      ppace <- table(extract.event$pace)          # count table of pace
      ptype <- extract.event$type[1]              # extract the first line to be the type
      nvalues <- table(extract.event$value)       # count table of value
      nabundance <- min(extract.event$abundance)  # minimum of abundance
      d <- cbind(ppace, ptype, forbeh, nvalues, nabundance)

but I am running into problems when merging the pieces, especially when nabundance displays an empty table. I would prefer not to extract each name one by one, since there are so many names in the data frame. Any ideas? I thought this might be a job for the plyr package, but I am still not sure ...

Thanks,

Grace

1 answer

I had to retype your data.frame by hand (in future, please paste the output of dput(), because retyping someone else's data is tedious), but here is my attempt. I assume that you are looking for something along the lines of the aggregate function:

    df <- data.frame(
      id = as.factor(c(51,51,51,52,52,53,53,53,53,53,54,54,54)),
      pace = c("(T)","(T)","(T)","(T)","(R)","(T)","(T)","(R)","(R)","(R)","(T)","(T)","(T)"),
      type = c("(JC)","(JC)","(JC)","(JC)","(JC)","(JC)","(JC)","(JC)","(JC)","(JC)","(BC)","(BC)","(BC)"),
      value = c("(L)","(L)","(H)","(H)","(H)","(L)","(H)","(H)","(H)","(H)","<blank>","<blank>","<blank>"),
      abundance = c(0,0,0,0,0,1,1,1,1,1,0,0,0))

    smallnames <- colnames(do.call("cbind", as.list(aggregate(
      cbind(value, pace, abundance) ~ id + type,
      data = lapply(df, as.character), table))))
    smallnames

    [1] "id"      "type"    "(H)"     "(L)"     "<blank>" "(R)"     "(T)"     "0"
    [9] "1"

    df.new <- do.call("data.frame", as.list(aggregate(
      cbind(value, pace, abundance) ~ id + type,
      data = lapply(df, as.character), table)))
    colnames(df.new) <- smallnames
    df.new$abundance <- df.new$`1`
    df.new

      id type (H) (L) <blank> (R) (T) 0 1 abundance
    1 54 (BC)   0   0       3   0   3 3 0         0
    2 51 (JC)   1   2       0   0   3 3 0         0
    3 52 (JC)   2   0       0   1   1 2 0         0
    4 53 (JC)   4   1       0   3   2 0 5         5

    df.final <- df.new[, -which(colnames(df.new) %in% c("<blank>","0","1"))]
    df.final

      id type (H) (L) (R) (T) abundance
    1 54 (BC)   0   0   0   3         0
    2 51 (JC)   1   2   0   3         0
    3 52 (JC)   2   0   1   1         0
    4 53 (JC)   4   1   3   2         5
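Note that the abundance column above holds the number of rows where abundance is 1 (5 for id 53), whereas the desired table in the question shows the minimum per id (1 for id 53). If the minimum is what is wanted, a base R sketch along the following lines might get closer. It rests on a few assumptions that are not in the original: `<blank>` is treated as a missing value (`NA`), `ptype` is taken from the first row of each id as in the asker's loop, and the names `pace_counts`, `value_counts`, `per_id` and `summary_df` are made up for illustration.

```r
# Sample data retyped from the question; "<blank>" is assumed to mean NA.
df <- data.frame(
  id = factor(c(51, 51, 51, 52, 52, 53, 53, 53, 53, 53, 54, 54, 54)),
  pace = c("(T)", "(T)", "(T)", "(T)", "(R)", "(T)", "(T)",
           "(R)", "(R)", "(R)", "(T)", "(T)", "(T)"),
  type = c(rep("(JC)", 10), rep("(BC)", 3)),
  value = c("(L)", "(L)", "(H)", "(H)", "(H)", "(L)", "(H)",
            "(H)", "(H)", "(H)", NA, NA, NA),
  abundance = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Two-way count tables: one row per id, one column per observed level.
# table() drops NA by default, so an id with only missing values gets zeros.
pace_counts  <- as.data.frame.matrix(table(df$id, df$pace))
value_counts <- as.data.frame.matrix(table(df$id, df$value))

# One value per id: the first type and the minimum abundance.
per_id <- data.frame(
  id = levels(df$id),
  ptype = as.vector(tapply(df$type, df$id, function(x) x[1])),
  abundance = as.vector(tapply(df$abundance, df$id, min)),
  stringsAsFactors = FALSE
)

# Glue the pieces together; cbind() on data frames keeps names like "(T)".
summary_df <- cbind(per_id[c("id", "ptype")], pace_counts, value_counts,
                    per_id["abundance"])
rownames(summary_df) <- NULL
summary_df
```

No level names are typed out by hand here, so the same code should carry over to data with many more pace or value levels; the count columns simply come out in alphabetical order rather than the order shown in the question.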

Let me know if this is what you are looking for, or if you have a problem with it.
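As for sharing data with dput(), a minimal sketch of what that looks like is below. The two-row `df_small` is a hypothetical stand-in, not the asker's data.

```r
# Hypothetical two-row stand-in for the real data frame.
df_small <- data.frame(
  id = factor(c(51, 52)),
  pace = c("(T)", "(R)"),
  stringsAsFactors = FALSE
)

# dput() prints R code that rebuilds the object; paste that text into the question.
dput(df_small)

# Round trip: evaluating the printed text gives back an equivalent object.
rebuilt <- eval(parse(text = capture.output(dput(df_small))))
```

For a large data frame, something like `dput(head(df, 20))` usually keeps the pasted text to a manageable size.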


Source: https://habr.com/ru/post/1267206/

