R: subtract values ​​from rows based on another column

I have a dataset as follows:

Group    Type   Income
 A        X       1000
 A        Y       500
 B        Y       2000
 B        X       1500
 C        X       700
 D        Y       600

I need the output as follows:

Group    Diff
  A       500
  B      -500
  C       700
  D      -600

One approach I can think of is to split data with type X and Y, and then add revenue as 0 for groups where X or Y is not, and then merge the data, for example, for each group there is a column named IncomeX and the other, called IncomeY, then subtracting two columns.

Is there an easier way to do this?

+4
source share
3 answers

I would do it like this: (using the package dplyrand reshape2)

library("dplyr")
library("reshape2")

t <- read.table(text = "Group    Type   Income
 A        X       1000
                A        Y       500
                B        Y       2000
                B        X       1500
                C        X       700
                D        Y       600", header = TRUE)

t %>% 
    dcast(Group ~ Type, value.var = "Income", fill = 0) %>% 
    mutate(Diff = X - Y) %>% 
    select(Group, Diff)

# Group Diff
# 1     A  500
# 2     B -500
# 3     C  700
# 4     D -600

dcastChanges the format of the table, and mutatecreates a new column.

+4
source

Try this in the R database:

aggregate(Diff~Group, 
          with(df, data.frame(Group=Group, Diff=ifelse(Type=="X", 1, -1)*Income)), sum)

# Group Diff
#1     A    500
#2     B   -500
#3     C    700
#4     D   -600

<strong> data

df <- structure(list(Group = structure(c(1L, 1L, 2L, 2L, 3L, 4L), .Label = c("A", 
"B", "C", "D"), class = "factor"), Type = structure(c(1L, 2L, 
2L, 1L, 1L, 2L), .Label = c("X", "Y"), class = "factor"), Income = c(1000L, 
500L, 2000L, 1500L, 700L, 600L)), .Names = c("Group", "Type", 
"Income"), class = "data.frame", row.names = c(NA, -6L))
+3

We can use data.table. Convert "data.frame" to "data.table" ( setDT(df1)), for "Type" that is "Y", convert "Income" to negative values, then group by "Group", get sum'Income'.

library(data.table)
setDT(df1)[Type == "Y", Income := -1 * Income][, .(Diff= sum(Income))  , Group] 
#   Group Diff
#1:     A  500
#2:     B -500
#3:     C  700
#4:     D -600

Or using tidyr/dplyr

library(dplyr)
library(tidyr)
spread(df1, Type, Income, fill = 0) %>%
                transmute(Group, Diff = X- Y)
#    Group Diff
#1     A  500
#2     B -500
#3     C  700
#4     D -600
+1
source

Source: https://habr.com/ru/post/1658687/


All Articles