Create a variable that holds the most frequent value in each group

Given:

    df1 <- data.frame(id = c(rep(1, 3), rep(2, 3)),
                      v1 = as.character(c("a", "b", "b", rep("c", 3))))

which gives:

    > df1
      id v1
    1  1  a
    2  1  b
    3  1  b
    4  2  c
    5  2  c
    6  2  c

I want to create a third variable, freq, that contains the most frequent value of v1 within each id, such that:

    > df2
      id v1 freq
    1  1  a    b
    2  1  b    b
    3  1  b    b
    4  2  c    c
    5  2  c    c
    6  2  c    c
3 answers

You can do this with ddply (from the plyr package) and a helper function that selects the most common value:

    library(plyr)

    myFun <- function(x) {
      tbl <- table(x$v1)
      # repeat the most frequent v1 value for every row of this id group
      x$freq <- rep(names(tbl)[which.max(tbl)], nrow(x))
      x
    }
    ddply(df1, .(id), .fun = myFun)

Note that which.max returns the first occurrence of the maximum in case of ties. See which.is.max in the nnet package for a version that breaks ties at random.
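To make the tie behavior concrete, here is a small base-R sketch. The helper `which_max_random` is hypothetical (it is neither from the answer nor from nnet); it simply picks uniformly among all tied maxima, similar in spirit to nnet's which.is.max.

```r
# Counts with a tie: "a" and "b" both appear twice, "c" once
tbl <- table(c("a", "b", "b", "a", "c"))

# which.max() always returns the FIRST maximal position when there are ties
names(tbl)[which.max(tbl)]  # returns "a"

# Hypothetical helper: choose uniformly at random among all tied maxima
which_max_random <- function(x) {
  ties <- which(x == max(x))
  # index through sample.int() so a single tie is not misread by sample()
  ties[sample.int(length(ties), 1L)]
}

set.seed(1)
names(tbl)[which_max_random(tbl)]  # either "a" or "b"
```

Swapping `which_max_random` for `which.max` inside `myFun` would make the chosen value for tied groups vary between runs, so set a seed if you need reproducible output.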

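    # most frequent value of x (the first one in case of ties);
    # note that this name masks base::mode()
    mode <- function(x) names(table(x))[which.max(table(x))]
    df1$freq <- ave(df1$v1, df1$id, FUN = mode)

    > df1
      id v1 freq
    1  1  a    b
    2  1  b    b
    3  1  b    b
    4  2  c    c
    5  2  c    c
    6  2  c    c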

Another way is to use tidyverse functions:

  • group by both variables with group_by() and count the occurrences of each v1 value with tally()
  • sort by number of occurrences using arrange()
  • summarize each id, keeping the first (most frequent) value, using summarize() and first()

Thus:

    library(dplyr)

    df1 %>%
      group_by(id, v1) %>%
      tally() %>%
      arrange(id, desc(n)) %>%
      summarize(freq = first(v1))

This gives you only the mapping from each id to its most frequent value (which I find cleaner):

    # A tibble: 2 x 2
         id   freq
      <dbl> <fctr>
    1     1      b
    2     2      c

You can then left_join() the original data frame with this table.
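A minimal sketch of that join step, assuming dplyr is installed. It recreates the question's df1 with `stringsAsFactors = FALSE` so the block runs on its own (before R 4.0.0 the default turned v1 into a factor, which is why the output above shows `<fctr>`); the intermediate name `modes` is hypothetical.

```r
library(dplyr)

df1 <- data.frame(id = c(rep(1, 3), rep(2, 3)),
                  v1 = c("a", "b", "b", rep("c", 3)),
                  stringsAsFactors = FALSE)

# One row per id holding its most frequent v1 value (the pipeline above)
modes <- df1 %>%
  group_by(id, v1) %>%
  tally() %>%
  arrange(id, desc(n)) %>%
  summarize(freq = first(v1))

# Attach freq back to every row of the original data frame, matching on id
df2 <- left_join(df1, modes, by = "id")
df2
```

Because `modes` has exactly one row per id, the join keeps the six original rows and just adds the freq column.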


Source: https://habr.com/ru/post/891595/

