Calculate group summary and return value in data frame

df <- data.frame(
id = c('A1','A2','A4','A2','A1','A4','A3','A2','A1','A3'),
value = c(4,3,1,3,4,6,6,1,8,4))

I want to get the maximum value within each id group. I tried the following, but got an error saying that the replacement has 4 rows while the data has 10. I understand why that happens, but I don't know how to fix it:

df$max.by.id <- aggregate(value ~ id, df, max)  

This is how I eventually got it to work:

max.by.id <- aggregate(value ~ id, df, max)  
names(max.by.id) <- c("id", "max")
df2 <- merge(df,max.by.id, by.x = "id", by.y = "id")
df2
#   id value max
#1  A1     4   8
#2  A1     4   8
#3  A1     8   8
#4  A2     3   3
#5  A2     3   3
#6  A2     1   3
#7  A3     6   6
#8  A3     4   6
#9  A4     1   6
#10 A4     6   6

Is there a better way? Thank you in advance.

3 answers

ave() is a function for this task:

df$max.by.id <- ave(df$value, df$id, FUN=max) 

Example:

df <- data.frame(
  id = c('A1','A2','A4','A2','A1','A4','A3','A2','A1','A3'),
  value = c(4,3,1,3,4,6,6,1,8,4))

df$max.by.id <- ave(df$value, df$id, FUN=max) 

The result of ave() has the same length as the original vector of values (which is also the length of the grouping variables). Each value in the result is placed at the position of the corresponding row, according to the grouping variables. See the documentation of ave() for more information.
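
To make the length difference concrete, here is a minimal base R sketch using the data from the question. It shows why the original assignment failed (aggregate() returns one row per group) and how ave() avoids that. The names res, agg, and max.via.match are illustrative only, and the match() lookup is just one possible alternative, not something proposed in the answers.

```r
df <- data.frame(
  id = c('A1','A2','A4','A2','A1','A4','A3','A2','A1','A3'),
  value = c(4,3,1,3,4,6,6,1,8,4))

# ave() returns one value per input row, aligned with the original order
res <- ave(df$value, df$id, FUN = max)
length(res) == nrow(df)   # TRUE, so it can be assigned as a column directly

# aggregate() returns one row per group, hence the "replacement has 4 rows" error
agg <- aggregate(value ~ id, df, max)
nrow(agg)                 # 4

# Illustrative alternative: expand the per-group result back to row level
# by looking up each row's id in the aggregated table
df$max.via.match <- agg$value[match(df$id, agg$id)]
```

Both res and df$max.via.match hold the group maximum repeated for every row of the group, so no merge() and no re-sorting of the data is needed.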


With data.table, you can add the maximum per id as a new column by reference (grouped by id):

library(data.table)
setDT(df)[, max.by.id := max(value), by=id]
df
#    id value max.by.id
# 1: A1     4         8
# 2: A2     3         3
# 3: A4     1         6
# 4: A2     3         3
# 5: A1     4         8
# 6: A4     6         6
# 7: A3     6         6
# 8: A2     1         3
# 9: A1     8         8
#10: A3     4         6

tapply(df$value, df$id, max)
# A1 A2 A3 A4 
#  8  3  6  6 

library(plyr)
ddply(df, .(id), function(df){max(df$value)})
#   id V1
# 1 A1  8
# 2 A2  3
# 3 A3  6
# 4 A4  6

library(dplyr)
df %>% group_by(id) %>% arrange(desc(value)) %>% do(head(., 1))
# Source: local data frame [4 x 2]
# Groups: id [4]

#       id value
#   (fctr) (dbl)
# 1     A1     8
# 2     A2     3
# 3     A3     6
# 4     A4     6
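
In recent dplyr releases do() is superseded, so the same result can be sketched with summarise() or slice_max(). This assumes dplyr 1.0.0 or later is installed; the names max_summary and top_rows are illustrative and not part of the original answer.

```r
library(dplyr)

df <- data.frame(
  id = c('A1','A2','A4','A2','A1','A4','A3','A2','A1','A3'),
  value = c(4,3,1,3,4,6,6,1,8,4))

# One row per group, holding only the id and the group maximum
max_summary <- df %>%
  group_by(id) %>%
  summarise(max.val = max(value))

# Alternatively, keep the whole row with the largest value in each group;
# with_ties = FALSE returns a single row even if the maximum is repeated
top_rows <- df %>%
  group_by(id) %>%
  slice_max(value, n = 1, with_ties = FALSE) %>%
  ungroup()
```

summarise() is the more direct choice when only the maximum is needed. slice_max() is useful when the data frame has other columns that should be carried along with the maximal row.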

UPDATE: If you need to keep the original values alongside the group maximum, use the following code.

library(plyr)
ddply(df, .(id), function(df){
  df$max.val = max(df$value)
  return(df)
})

library(dplyr)
df %>% group_by(id) %>% mutate(max.val=max(value))
# Source: local data frame [10 x 3]
# Groups: id [4]

#        id value max.val
#    (fctr) (dbl)   (dbl)
# 1      A1     4       8
# 2      A2     3       3
# 3      A4     1       6
# 4      A2     3       3
# 5      A1     4       8
# 6      A4     6       6
# 7      A3     6       6
# 8      A2     1       3
# 9      A1     8       8
# 10     A3     4       6

Source: https://habr.com/ru/post/1620574/

