How to calculate percentages when one variable in the first data frame is an aggregated sum of different values in another data frame

Question

I have a problem calculating percentages with two data frames. One of them holds the cumulative sum of the different operations each person carries out. The other is the initial data frame, with the information for each person:

    z <- data.frame(
      ID     = c("0001","0002","0002","0001","0003","0003","0004","0004","0001","0003"),
      Amount = c(10,20,10,30,50,10,40,10,10,30),
      Place  = c("KFC","Marcys","Ezone","Ezone","Italocafe","Italocafe","KFC","Walmart","KFC","KFC")
    )

After aggregating this by ID, I get:

        ID Final.Amount
    1 0001           50
    2 0002           30
    3 0003           90
    4 0004           50
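The aggregation step itself is not shown in the question. As a minimal sketch, assuming base R's aggregate() was (or could be) used, the table above can be reproduced like this; the name agg is only illustrative:

```r
# Sample data from the question
z <- data.frame(
  ID     = c("0001","0002","0002","0001","0003","0003","0004","0004","0001","0003"),
  Amount = c(10,20,10,30,50,10,40,10,10,30),
  Place  = c("KFC","Marcys","Ezone","Ezone","Italocafe","Italocafe","KFC","Walmart","KFC","KFC")
)

# Sum Amount within each ID (assumption: this is how Final.Amount was built)
agg <- aggregate(Amount ~ ID, data = z, FUN = sum)

# Rename the summed column to match the table in the question
names(agg)[names(agg) == "Amount"] <- "Final.Amount"

print(agg)
```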

I want to calculate, for each ID, the percentage of the final amount that corresponds to each value of the Place variable. I am looking for something like this:

        ID Final.Amount Perct.KFC Perct.Macys Perct.Ezonne Perct.Italocafe Percent.Walmart
    1 0001           50       40%          0%          60%              0%              0%
    2 0002           30        0%         67%          33%              0%              0%
    3 0003           90       33%          0%           0%             67%              0%
    4 0004           50       80%          0%           0%              0%             20%

I tried plyr, but I could not get the correct structure. I do not know whether I need sqldf or another package.

+4

r

Duck Mar 14 '13 at 15:56

2 answers

Here is the answer rewritten using data.table and base reshape. I have to resort to a reshaping function after calculating the percentages.

    require(data.table)
    w <- data.table(z)
    w1 <- w[, list(val = sum(Amount)), by = list(ID, Place)][
             , list(Place = Place, percent = val / sum(val) * 100), by = ID]
    reshape(w1, idvar = "ID", timevar = "Place", direction = "wide")
    #      ID percent.KFC percent.Ezone percent.Marcys percent.Italocafe percent.Walmart
    # 1: 0001    40.00000      60.00000             NA                NA              NA
    # 2: 0002          NA      33.33333       66.66667                NA              NA
    # 3: 0003    33.33333            NA             NA          66.66667              NA
    # 4: 0004    80.00000            NA             NA                NA              20

+2

Arun Mar 14 '13 at 16:06

juba · Accepted Answer · 2013-03-14T16:11:09+0000

A solution using reshape2:

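    library(reshape2)
    # Sum Amount for every ID/Place pair, giving a matrix with one row per ID
    d <- acast(z, ID ~ Place, value.var = "Amount", fun.aggregate = sum)
    # Convert each row to percentages of its row total
    prop.table(d, 1) * 100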

Which gives:

            Ezone Italocafe      KFC   Marcys Walmart
    0001 60.00000   0.00000 40.00000  0.00000       0
    0002 33.33333   0.00000  0.00000 66.66667       0
    0003  0.00000  66.66667 33.33333  0.00000       0
    0004  0.00000   0.00000 80.00000  0.00000      20
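To get closer to the layout asked for in the question (an ID column, a Final.Amount column and one rounded percentage column per Place), the same row-percentage idea can be sketched in base R only. This is not part of either answer: it swaps acast() for xtabs(), and the names tab, pct_df and result, as well as the "Perct." prefix, are illustrative choices.

```r
# Sample data from the question
z <- data.frame(
  ID     = c("0001","0002","0002","0001","0003","0003","0004","0004","0001","0003"),
  Amount = c(10,20,10,30,50,10,40,10,10,30),
  Place  = c("KFC","Marcys","Ezone","Ezone","Italocafe","Italocafe","KFC","Walmart","KFC","KFC")
)

# ID x Place table of summed amounts (combinations that never occur become 0)
tab <- xtabs(Amount ~ ID + Place, data = z)

# Row-wise percentages, rounded to whole numbers, as a wide data frame
pct_df <- as.data.frame.matrix(round(prop.table(tab, 1) * 100))
names(pct_df) <- paste0("Perct.", names(pct_df))

# Put the ID and the per-ID total in front of the percentage columns
result <- data.frame(
  ID           = rownames(tab),
  Final.Amount = as.vector(rowSums(tab)),
  pct_df,
  row.names    = NULL
)

print(result)
```

The percentage columns stay numeric here, so they can still be used in calculations; if a literal "%" sign is wanted, the columns could be pasted into strings as a final display step.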
