Expanding a data frame to get monthly revenue totals for all unique values of several columns in R

Question

I have a df that has data like this:

 sub = c("X001", "X002", "X001", "X003", "X002", "X001", "X001", "X003", "X002", "X003", "X003", "X002")
 month = c("201506", "201507", "201506", "201507", "201507", "201508", "201508", "201507", "201508", "201508", "201508", "201508")
 tech = c("mobile", "tablet", "PC", "mobile", "mobile", "tablet", "PC", "tablet", "PC", "PC", "mobile", "tablet")
 brand = c("apple", "samsung", "dell", "apple", "samsung", "apple", "samsung", "dell", "samsung", "dell", "dell", "dell")
 revenue = c(20, 15, 10, 25, 20, 20, 17, 9, 14, 12, 9, 11)
 df = data.frame(sub, month, brand, tech, revenue)

I want to use sub and month as the key and get one row per subscriber per month, showing the total revenue for each unique value of tech and brand for that subscriber in that month. This example is simplified and has fewer columns; since my real data set is huge, I decided to try doing this with data.table.

I managed to do this for a single categorical column, either tech or brand, using this:

 df1 <- dcast(df, sub + month ~ tech, fun=sum, value.var = "revenue")

but I want to do this for two or more categorical columns, so I tried this:

 df2 <- dcast(df, sub + month ~ tech+brand, fun=sum, value.var = "revenue")

but it just concatenates the unique values of both columns and sums the revenue for each combination, which is not what I want. I want a separate column for each unique value of every column.

I am new to R and would really appreciate any help.

r data.table

Ali zia Nov 08 '16 at 8:45

1 answer

David Arenburg · Accepted Answer · 2016-11-08T09:17:04+0000

(I will assume that df is a data.table rather than a data.frame as in your example.)
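If df is still a plain data.frame, as built in the question, it can be converted first. A minimal sketch, assuming the data.table package is installed and using a trimmed-down, illustrative version of the question's data:

```r
library(data.table)

# Illustrative subset of the question's data, built as a plain data.frame
df <- data.frame(
  sub     = c("X001", "X002", "X001"),
  month   = c("201506", "201507", "201506"),
  brand   = c("apple", "samsung", "dell"),
  tech    = c("mobile", "tablet", "PC"),
  revenue = c(20, 15, 10)
)

# setDT() converts the object to a data.table by reference (no copy is made)
setDT(df)

is.data.table(df)
```

as.data.table(df) would be the alternative if the original data.frame should be left untouched, since it returns a converted copy.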

One possible solution is to first melt the data, keeping sub, month and revenue as id columns. This way, brand and tech are stacked into a single value column, with one row for each existing key combination. We can then easily dcast it back, since we are working against a single column, as in your first example:

 dcast(melt(df, c(1:2, 5)), sub + month ~ value, sum, value.var = "revenue")
 #     sub  month PC apple dell mobile samsung tablet
 # 1: X001 201506 10    20   10     20       0      0
 # 2: X001 201508 17    20    0      0      17     20
 # 3: X002 201507  0     0    0     20      35     15
 # 4: X002 201508 14     0   11      0      14     11
 # 5: X003 201507  0    25    9     25       0      9
 # 6: X003 201508 12     0   21      9       0      0
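To see why a single dcast is enough, it can help to look at the intermediate melted table. The following is only a sketch: it builds the question's data directly as a data.table (named dt here, an illustrative name) and spells out the id columns by name instead of by position, which should be equivalent to c(1:2, 5) for this column order.

```r
library(data.table)

# The question's sample data, built directly as a data.table
dt <- data.table(
  sub     = c("X001", "X002", "X001", "X003", "X002", "X001",
              "X001", "X003", "X002", "X003", "X003", "X002"),
  month   = c("201506", "201507", "201506", "201507", "201507", "201508",
              "201508", "201507", "201508", "201508", "201508", "201508"),
  brand   = c("apple", "samsung", "dell", "apple", "samsung", "apple",
              "samsung", "dell", "samsung", "dell", "dell", "dell"),
  tech    = c("mobile", "tablet", "PC", "mobile", "mobile", "tablet",
              "PC", "tablet", "PC", "PC", "mobile", "tablet"),
  revenue = c(20, 15, 10, 25, 20, 20, 17, 9, 14, 12, 9, 11)
)

# sub, month and revenue stay as id columns; brand and tech are stacked
# into two new columns: `variable` (source column name) and `value`
long <- melt(dt,
             id.vars      = c("sub", "month", "revenue"),
             measure.vars = c("brand", "tech"))

# Each original row now appears twice: once for brand, once for tech
nrow(long)  # 24
head(long[order(sub, month)])
```

Because every revenue figure is repeated once per melted column, the brand columns and the tech columns of the wide result each add up to the full revenue for that subscriber and month.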

Per the OP's comment, you can easily add a prefix by also adding variable to the formula. That way the columns will also be ordered properly:

 dcast(melt(df, c(1:2, 5)), sub + month ~ variable + value, sum, value.var = "revenue")
 #     sub  month brand_apple brand_dell brand_samsung tech_PC tech_mobile tech_tablet
 # 1: X001 201506          20         10             0      10          20           0
 # 2: X001 201508          20          0            17      17           0          20
 # 3: X002 201507           0          0            35       0          20          15
 # 4: X002 201508           0         11            14      14           0          11
 # 5: X003 201507          25          9             0       0          25           9
 # 6: X003 201508           0         21             0      12           9           0