R: Getting the sum of columns in a data.frame, grouped by a specific column

I have a sample data.frame, shown below. I want to create another data.frame that contains summary statistics of this table, grouped by a specific column. How can I do this?

For example, in the data.frame below, I would like to get the sum of each column for each chart.

Example data.frame:

Chart    Sum     Sum_Squares    Count     Average
Chart1   2           4            4         1
Chart1   3           9            3         1.5
Chart2   4           16           5         2
Chart2   5           25           2         2.5

Desired result:

Chart    Sum_sum      Sum_square_sum      Count_sum      Average_sum
Chart1      5              13                 7              2.5
Chart2      9              41                 7              4.5

I tried the code below, but the returned table contains only CHART and V1. sum_stat is a data.frame.

  sum_stat = data.table(spc_point[,c("CHART", "SUM", "SUM_SQUARES", "COUNT", "AVERAGE")])[,c(SUM_SUM=sum(SUM), SUM_SQUARE_SUM=sum(SUM_SQUARES), COUNT_SUM=sum(COUNT), AVERAGE_SUM=sum(AVERAGE)),by=list(CHART)]

Thanks in advance.

+4
3 answers

You can consider dplyr. Assuming that df is your data frame, the following will produce the desired result.

library(dplyr)
# Current dplyr uses %>% in place of the older, deprecated %.% operator
df %>% group_by(Chart) %>% 
    summarise(Sum=sum(Sum), 
              Sum_Squares = sum(Sum_Squares), 
              Count= sum(Count),
              Average= sum(Average))
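
If every non-grouping column should get the same treatment, the per-column list can be avoided. The following is only a sketch, assuming dplyr 1.0 or later (where across() is available) and a df rebuilt from the question's example; the "_sum" suffix is applied to the existing column names, so the second column comes out as Sum_Squares_sum rather than the Sum_square_sum spelling used in the question.

```r
library(dplyr)

# Example data rebuilt from the question (assumed to match the real df)
df <- data.frame(
  Chart       = c("Chart1", "Chart1", "Chart2", "Chart2"),
  Sum         = c(2, 3, 4, 5),
  Sum_Squares = c(4, 9, 16, 25),
  Count       = c(4, 3, 5, 2),
  Average     = c(1, 1.5, 2, 2.5)
)

# Sum every non-grouping column and append "_sum" to its name
res <- df %>%
  group_by(Chart) %>%
  summarise(across(everything(), sum, .names = "{.col}_sum"))

res
```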

Alternatively, with data.table:

library(data.table)
dt = as.data.table(df)
dt[, list(Sum=sum(Sum), 
          Sum_Squares = sum(Sum_Squares), 
          Count= sum(Count),
          Average= sum(Average)),
   by=Chart]
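
As for why the attempt in the question returned only CHART and V1: the likely cause is that j was built with c(...) instead of list(...). With c(), each group yields a single atomic vector, and data.table stacks those values in one unnamed column, V1. The sketch below uses a cut-down, made-up version of the question's table (only the SUM and COUNT columns) to show the difference.

```r
library(data.table)

# Cut-down stand-in for the question's spc_point table (illustrative only)
dt <- data.table(
  CHART = c("Chart1", "Chart1", "Chart2", "Chart2"),
  SUM   = c(2, 3, 4, 5),
  COUNT = c(4, 3, 5, 2)
)

# c() returns one atomic vector per group, so the values are stacked in V1
stacked <- dt[, c(SUM_SUM = sum(SUM), COUNT_SUM = sum(COUNT)), by = CHART]

# list() returns one element per output column, giving one row per group
wide <- dt[, list(SUM_SUM = sum(SUM), COUNT_SUM = sum(COUNT)), by = CHART]

names(stacked)  # "CHART" "V1"
names(wide)     # "CHART" "SUM_SUM" "COUNT_SUM"
```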
+3

Using data.table. First, set up the data:

library(data.table)
data <- data.table(Chart       = c("Chart1", "Chart1", "Chart2", "Chart2"),
                   Sum         = c(2, 3, 4, 5),
                   Sum_Squares = c(4, 9, 16, 25),
                   Count       = c(4, 3, 5, 2),
                   Average     = c(1, 1.5, 2, 2.5),
                   key = "Chart")

Then sum every column within each group:

summed.data<-data[,lapply(.SD,sum),by=Chart]

For more on data.table, see its FAQ :)

+6

In base R:

aggregate(df[,2:5],by=list(df$Chart),FUN=sum)
#   Group.1 Sum Sum_Squares Count Average
# 1  Chart1   5          13     7     2.5
# 2  Chart2   9          41     7     4.5

As @AnandaMahto points out, the formula syntax for aggregate(...) is simpler and cleaner.

aggregate(. ~ Chart, df, sum)
#    Chart Sum Sum_Squares Count Average
# 1 Chart1   5          13     7     2.5
# 2 Chart2   9          41     7     4.5
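
To get the "_sum" column names asked for in the question, the result can be renamed afterwards. This is a minimal base R sketch, assuming df is rebuilt from the question's example; the suffix is pasted onto the existing names, so the second column becomes Sum_Squares_sum rather than the question's Sum_square_sum.

```r
# Example data rebuilt from the question (assumed to match the real df)
df <- data.frame(
  Chart       = c("Chart1", "Chart1", "Chart2", "Chart2"),
  Sum         = c(2, 3, 4, 5),
  Sum_Squares = c(4, 9, 16, 25),
  Count       = c(4, 3, 5, 2),
  Average     = c(1, 1.5, 2, 2.5)
)

# Sum every non-grouping column within each Chart
res <- aggregate(. ~ Chart, data = df, FUN = sum)

# Append "_sum" to every column name except the grouping column
names(res)[-1] <- paste0(names(res)[-1], "_sum")

res
```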
+2

Source: https://habr.com/ru/post/1530551/

