Create a variable that captures the most common groups

Question

Define:

 df1 <- data.frame(id = c(rep(1, 3), rep(2, 3)), v1 = as.character(c("a", "b", "b", rep("c", 3))))

such that

 > df1
   id v1
 1  1  a
 2  1  b
 3  1  b
 4  2  c
 5  2  c
 6  2  c

I want to create a third variable, freq, that contains the most frequent observation in v1 within each id, such that

 > df2
   id v1 freq
 1  1  a    b
 2  1  b    b
 3  1  b    b
 4  2  c    c
 5  2  c    c
 6  2  c    c

+6

r count data-manipulation frequency data-management

Fred Jun 28 '11 at 9:38

3 answers

 mode <- function(x) names(table(x))[which.max(table(x))]
 df1$freq <- ave(df1$v1, df1$id, FUN = mode)
 > df1
   id v1 freq
 1  1  a    b
 2  1  b    b
 3  1  b    b
 4  2  c    c
 5  2  c    c
 6  2  c    c

+1

42- Jun 28 '11 at 10:04

Another way is to use tidyverse functions:

grouping first with group_by(), and counting the occurrences of the second variable with tally()
sorting by number of occurrences with arrange()
summarizing and selecting the first row with summarize() and first()

Thus:

 df1 %>% group_by(id, v1) %>% tally() %>% arrange(id, desc(n)) %>% summarize(freq = first(v1))

This gives you only a mapping from id to freq (which I find cleaner):

 # A tibble: 2 x 2
      id freq
   <dbl> <fctr>
 1     1 b
 2     2 c

Then you can left_join the original data frame with this table.
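The join step could look roughly like the sketch below. It assumes dplyr is installed, builds df1 with character (not factor) columns, and stores the lookup table under the hypothetical name freq_tbl; the result is called df2 to match the question.

```r
library(dplyr)

# Same data as in the question, kept as character rather than factor
df1 <- data.frame(
  id = c(rep(1, 3), rep(2, 3)),
  v1 = c("a", "b", "b", rep("c", 3)),
  stringsAsFactors = FALSE
)

# Lookup table (freq_tbl is an illustrative name): one row per id
# holding the most frequent v1 value for that id
freq_tbl <- df1 %>%
  group_by(id, v1) %>%
  tally() %>%
  arrange(id, desc(n)) %>%
  summarize(freq = first(v1))

# Attach the per-id mode to every original row
df2 <- left_join(df1, freq_tbl, by = "id")
```

Joining on id keeps the rows of df1 in their original order and only adds the freq column.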

+1

slhck Nov 14 '17 at 11:39

joran · Accepted Answer · 2011-06-28T21:51:40+0000

You can do this with ddply and a custom function to select the most common value:

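 library(plyr)  # provides ddply() and .()
 myFun <- function(x) {
   tbl <- table(x$v1)                                    # counts of each v1 value in this id group
   x$freq <- rep(names(tbl)[which.max(tbl)], nrow(x))    # repeat the most common value on every row
   x
 }
 ddply(df1, .(id), .fun = myFun)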

Note that which.max returns the first occurrence of the maximum value in the case of ties. See ?which.is.max in the nnet package for an option that breaks ties at random.
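To illustrate the tie behaviour, here is a small base-R sketch. The vector x and the helper random_mode are made up for illustration; random_mode is one possible way to break ties at random without nnet, not the nnet implementation.

```r
# "a" and "b" both occur twice, so they are tied for most frequent
x <- c("a", "a", "b", "b", "c")
tbl <- table(x)

# which.max returns the first maximum, so the tie always resolves to "a"
first_mode <- names(tbl)[which.max(tbl)]

# Hypothetical helper: pick one of the tied values at random
random_mode <- function(x) {
  tbl <- table(x)
  ties <- names(tbl)[as.vector(tbl) == max(tbl)]
  # index with sample.int so a single tied value is returned unchanged
  ties[sample.int(length(ties), 1)]
}

random_mode(x)  # either "a" or "b"
```

Passing random_mode as FUN to ave, or using it inside myFun, would give each id group a randomly chosen mode when there is a tie.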
