Calculate the difference in observation counts between groups, depending on which group contains the maximum value

I have the following two data frames in a list. For each data frame, I would like to calculate the difference between the number of observations of a group (indicated by a “type”) that contains the maximum value (“value”) and the number of observations of another group.

So, for df1 this will be 3 - 6 = -3, since type B contains the maximum value of 7, and there are 3 observations of type B and 6 observations of type A.

value <- c(1, 2, 3, 4, 5, 6, 1, 2, 7)
type  <- c("A", "A", "A", "A", "A", "A", "B", "B", "B")
df1   <- data.frame(value, type)

value <- c(1, 2, 3, 4, 6, 1, 2)
type  <- c("A", "A", "A", "A", "A", "B", "B")
df2   <- data.frame(value, type)

mylist <- list(df1, df2)

I think it would be something like the following line, combined with length(unique()) and max(), but I can’t figure it out.

calculation <- lapply(mylist, function(x) {
  # (count of observations of the type that includes the max value) -
  # (count of observations of the type that does not include the max value)
})
+4
3 answers

The difference can be rewritten as follows:

[number in group] - [number not in group]
= [number in group] - ([number of rows] - [number in group])
= [number in group] - [number of rows] + [number in group]
= 2 * [number in group] - [number of rows]

This reduces to a one-liner:

lapply(mylist, function(x) {2*sum(x$type==x$type[which.max(x$value)])-nrow(x)})

Output:

[[1]]
[1] -3

[[2]]
[1] 3
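
To see why the one-liner works, it can help to unpack it into named steps. The following is a minimal sketch for df1 only; the intermediate names (max_row, max_type, n_in_group, direct, compact) are introduced here for illustration and do not appear in the answer.

```r
# Rebuild df1 from the question so the sketch is self-contained
value <- c(1, 2, 3, 4, 5, 6, 1, 2, 7)
type  <- c("A", "A", "A", "A", "A", "A", "B", "B", "B")
df1   <- data.frame(value, type)

# Row holding the (first) maximum value
max_row <- which.max(df1$value)

# Type found on that row
max_type <- df1$type[max_row]

# Number of rows that share this type
n_in_group <- sum(df1$type == max_type)

# The same difference, computed directly and with the simplified formula
direct  <- n_in_group - (nrow(df1) - n_in_group)
compact <- 2 * n_in_group - nrow(df1)
```

Both direct and compact evaluate to -3 for df1. Note that which.max() returns only the first maximum, so if the maximum value occurs in more than one type, the type of the earliest row is used.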


+3

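lapply(mylist, function(x) {
  # Make sure the values are numeric before comparing them
  x[, "value"] <- as.numeric(x[, "value"])
  # Row index of the first maximum, then the type found on that row
  max_row  <- which(x[, "value"] == max(x[, "value"]))[1]
  max_type <- x[max_row, "type"]
  # Observation counts for the two types (hard-coded as "A" and "B")
  A <- sum(x[, "type"] == "A")
  B <- sum(x[, "type"] == "B")
  # Count of the type holding the maximum minus count of the other type
  if (max_type == "B") B - A else A - B
})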

This works only for 2 types (here, A and B).


Gottavianoni

+2

You can also use aggregate() to count the number of observations in each group:

calculations <- lapply(mylist, function(df) {
  sum_df <- aggregate(value~type, df, FUN = length)
  max_type <- df$type[which.max(df$value)]
  sum_df$value[sum_df$type == max_type] - sum_df$value[sum_df$type != max_type]
})
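
The per-group counts can also be obtained with table(). The following is a hedged sketch, not part of any of the answers: the helper name count_diff is hypothetical, and it assumes that "the other group" means all observations outside the type that holds the maximum, so with more than two types the remaining types are pooled.

```r
# Hypothetical helper: count of the type holding the maximum value
# minus the count of all remaining observations
count_diff <- function(df) {
  counts   <- table(df$type)                              # observations per type
  max_type <- as.character(df$type[which.max(df$value)])  # type of the first maximum
  n_max    <- counts[[max_type]]                          # size of that type
  n_max - (sum(counts) - n_max)
}

# Data from the question
df1 <- data.frame(value = c(1, 2, 3, 4, 5, 6, 1, 2, 7),
                  type  = c("A", "A", "A", "A", "A", "A", "B", "B", "B"))
df2 <- data.frame(value = c(1, 2, 3, 4, 6, 1, 2),
                  type  = c("A", "A", "A", "A", "A", "B", "B"))
mylist <- list(df1, df2)

result <- lapply(mylist, count_diff)
```

For the question's data this gives -3 for df1 and 3 for df2, matching the output of the first answer. Unlike the version with hard-coded "A" and "B", it does not depend on the type labels.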
+1

Source: https://habr.com/ru/post/1696311/

