Expanding a data frame into monthly revenue totals for all unique values of several columns in R

I have a df that has data like this:

    sub = c("X001","X002", "X001","X003","X002","X001","X001","X003","X002","X003","X003","X002")
    month = c("201506", "201507", "201506","201507","201507","201508", "201508","201507","201508","201508", "201508", "201508")
    tech = c("mobile", "tablet", "PC","mobile","mobile","tablet", "PC","tablet","PC","PC", "mobile", "tablet")
    brand = c("apple", "samsung", "dell","apple","samsung","apple", "samsung","dell","samsung","dell", "dell", "dell")
    revenue = c(20, 15, 10,25,20,20, 17,9,14,12, 9, 11)
    df = data.frame(sub, month, brand, tech, revenue)

I want to use sub and month as a key and get one row per subscriber per month, showing the total revenue for each unique value of tech and of brand for that subscriber in that month. This example is simplified and has fewer columns than my real data; since I have a huge data set, I decided to try to do this using data.table.

I managed to do this for a single column, either tech or brand, using this:

    df1 <- dcast(df, sub + month ~ tech, fun=sum, value.var = "revenue")

but I want to do this for two or more categorical columns. I tried this:

    df2 <- dcast(df, sub + month ~ tech+brand, fun=sum, value.var = "revenue")

and it just concatenates the unique values of both columns and sums over each combination, which is not what I want. I want a separate column for each unique value of every column.
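
To make the difference concrete, here is a minimal sketch, assuming the question's data is built directly as a data.table (named `dt` here for illustration). With `tech + brand` on the right-hand side, dcast casts on each observed *pair* of values, so it produces columns such as `mobile_apple` rather than separate `mobile` and `apple` columns:

```r
library(data.table)

# Toy data from the question, built directly as a data.table
dt <- data.table(
  sub     = c("X001", "X002", "X001", "X003", "X002", "X001",
              "X001", "X003", "X002", "X003", "X003", "X002"),
  month   = c("201506", "201507", "201506", "201507", "201507", "201508",
              "201508", "201507", "201508", "201508", "201508", "201508"),
  brand   = c("apple", "samsung", "dell", "apple", "samsung", "apple",
              "samsung", "dell", "samsung", "dell", "dell", "dell"),
  tech    = c("mobile", "tablet", "PC", "mobile", "mobile", "tablet",
              "PC", "tablet", "PC", "PC", "mobile", "tablet"),
  revenue = c(20, 15, 10, 25, 20, 20, 17, 9, 14, 12, 9, 11)
)

# tech + brand casts on the combination of the two columns:
# one output column per observed (tech, brand) pair, e.g. "mobile_apple"
df2 <- dcast(dt, sub + month ~ tech + brand,
             fun.aggregate = sum, value.var = "revenue")

# 2 key columns + 8 observed tech/brand pairs
names(df2)
```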

I am new to R and would really appreciate any help.

1 answer

(I will assume that df is a data.table rather than a data.frame as in your example.)
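
If `df` starts out as a plain data.frame, as in the question, one way to make that assumption hold is `setDT()`, which converts it to a data.table in place so that data.table's `melt()` and `dcast()` methods are used afterwards. A minimal sketch with the question's data (`stringsAsFactors = FALSE` is added so the result is the same on R versions before 4.0):

```r
library(data.table)

# The question's data as a plain data.frame
df <- data.frame(
  sub     = c("X001", "X002", "X001", "X003", "X002", "X001",
              "X001", "X003", "X002", "X003", "X003", "X002"),
  month   = c("201506", "201507", "201506", "201507", "201507", "201508",
              "201508", "201507", "201508", "201508", "201508", "201508"),
  brand   = c("apple", "samsung", "dell", "apple", "samsung", "apple",
              "samsung", "dell", "samsung", "dell", "dell", "dell"),
  tech    = c("mobile", "tablet", "PC", "mobile", "mobile", "tablet",
              "PC", "tablet", "PC", "PC", "mobile", "tablet"),
  revenue = c(20, 15, 10, 25, 20, 20, 17, 9, 14, 12, 9, 11),
  stringsAsFactors = FALSE
)

# Convert by reference (no copy); df now has class c("data.table", "data.frame")
setDT(df)
class(df)
```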

One possible solution is to first melt the data, keeping sub, month and revenue as id variables. This stacks brand and tech into a single value column, with one row for each original row and each melted column. We can then easily dcast it back, since we are now working with a single column, as in your first example:
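
The intermediate long table can be inspected on its own. A minimal sketch, assuming the data is a data.table called `dt` (a name chosen here for illustration); the column names are spelled out instead of the positions `c(1:2, 5)`:

```r
library(data.table)

dt <- data.table(
  sub     = c("X001", "X002", "X001", "X003", "X002", "X001",
              "X001", "X003", "X002", "X003", "X003", "X002"),
  month   = c("201506", "201507", "201506", "201507", "201507", "201508",
              "201508", "201507", "201508", "201508", "201508", "201508"),
  brand   = c("apple", "samsung", "dell", "apple", "samsung", "apple",
              "samsung", "dell", "samsung", "dell", "dell", "dell"),
  tech    = c("mobile", "tablet", "PC", "mobile", "mobile", "tablet",
              "PC", "tablet", "PC", "PC", "mobile", "tablet"),
  revenue = c(20, 15, 10, 25, 20, 20, 17, 9, 14, 12, 9, 11)
)

# Stack brand and tech into two columns: "variable" (which source column)
# and "value" (the brand or tech string); revenue is repeated for each
long <- melt(dt, id.vars = c("sub", "month", "revenue"),
             measure.vars = c("brand", "tech"))

# Each of the 12 input rows appears twice: once for brand, once for tech
nrow(long)  # 24
head(long)
```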

    dcast(melt(df, c(1:2, 5)), sub + month ~ value, sum, value.var = "revenue")
    #     sub  month PC apple dell mobile samsung tablet
    # 1: X001 201506 10    20   10     20       0      0
    # 2: X001 201508 17    20    0      0      17     20
    # 3: X002 201507  0     0    0     20      35     15
    # 4: X002 201508 14     0   11      0      14     11
    # 5: X003 201507  0    25    9     25       0      9
    # 6: X003 201508 12     0   21      9       0      0
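
With a wide real data set, the positional ids `c(1:2, 5)` silently change meaning if columns are added or reordered. A minimal sketch of the same reshape with the id and measure columns named explicitly (`dt`, `long` and `wide` are illustrative names):

```r
library(data.table)

dt <- data.table(
  sub     = c("X001", "X002", "X001", "X003", "X002", "X001",
              "X001", "X003", "X002", "X003", "X003", "X002"),
  month   = c("201506", "201507", "201506", "201507", "201507", "201508",
              "201508", "201507", "201508", "201508", "201508", "201508"),
  brand   = c("apple", "samsung", "dell", "apple", "samsung", "apple",
              "samsung", "dell", "samsung", "dell", "dell", "dell"),
  tech    = c("mobile", "tablet", "PC", "mobile", "mobile", "tablet",
              "PC", "tablet", "PC", "PC", "mobile", "tablet"),
  revenue = c(20, 15, 10, 25, 20, 20, 17, 9, 14, 12, 9, 11)
)

# Same reshape as above, but with columns referenced by name, not position
long <- melt(dt, id.vars = c("sub", "month", "revenue"),
             measure.vars = c("brand", "tech"))

# One column per distinct value; missing sub/month/value cells become
# sum() of an empty vector, i.e. 0
wide <- dcast(long, sub + month ~ value,
              fun.aggregate = sum, value.var = "revenue")
wide
```

Note that casting on `value` alone merges identical strings from different source columns: if a brand and a tech value were both, say, "other", their revenues would be summed into one column.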

Following the OP's comment, you can easily add a column-name prefix by also adding variable to the formula. That way the columns will also be ordered correctly, grouped by their source column:

    dcast(melt(df, c(1:2, 5)), sub + month ~ variable + value, sum, value.var = "revenue")
    #     sub  month brand_apple brand_dell brand_samsung tech_PC tech_mobile tech_tablet
    # 1: X001 201506          20         10             0      10          20           0
    # 2: X001 201508          20          0            17      17           0          20
    # 3: X002 201507           0          0            35       0          20          15
    # 4: X002 201508           0         11            14      14           0          11
    # 5: X003 201507          25          9             0       0          25           9
    # 6: X003 201508           0         21             0      12           9           0
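
Since the real data has more categorical columns, the same two steps can be wrapped so the key and category columns are passed in as character vectors. This is a sketch only: `widen_by_categories` is a hypothetical helper name, not part of data.table, and it assumes all the categorical columns are character (melting factors with different levels would coerce them with a warning):

```r
library(data.table)

# Hypothetical helper: melt the listed categorical columns into one,
# then cast back with "column_value" names (e.g. "brand_apple")
widen_by_categories <- function(dt, keys, cat_cols, value_col) {
  long <- melt(dt, id.vars = c(keys, value_col), measure.vars = cat_cols)
  # Build e.g. sub + month ~ variable + value from the key names
  f <- as.formula(paste(paste(keys, collapse = " + "), "~ variable + value"))
  dcast(long, f, fun.aggregate = sum, value.var = value_col)
}

dt <- data.table(
  sub     = c("X001", "X002", "X001", "X003", "X002", "X001",
              "X001", "X003", "X002", "X003", "X003", "X002"),
  month   = c("201506", "201507", "201506", "201507", "201507", "201508",
              "201508", "201507", "201508", "201508", "201508", "201508"),
  brand   = c("apple", "samsung", "dell", "apple", "samsung", "apple",
              "samsung", "dell", "samsung", "dell", "dell", "dell"),
  tech    = c("mobile", "tablet", "PC", "mobile", "mobile", "tablet",
              "PC", "tablet", "PC", "PC", "mobile", "tablet"),
  revenue = c(20, 15, 10, 25, 20, 20, 17, 9, 14, 12, 9, 11)
)

wide <- widen_by_categories(dt, keys = c("sub", "month"),
                            cat_cols = c("brand", "tech"),
                            value_col = "revenue")
wide
```

Adding another categorical column is then just a matter of extending `cat_cols`; the `variable` prefix also keeps equal strings from different columns apart.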

Source: https://habr.com/ru/post/1259450/

