I'm not sure if this is possible, but if it is, it will make my work much more efficient.
This is a general problem that may interest the wider SO community: for loops (and basic functions such as apply) work well for uniform operations, such as adding X to every column or row of a data frame. I have a uniform operation that I want to perform, but with different values for each group of rows in the data frame.
Is there a more efficient way to do this than subsetting my data frame for each group, applying a function with the numbers specific to that group, and then recombining the pieces? I don't care whether it's a for loop or apply, but bonus points if it uses plyr functionality.
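A minimal sketch of the general pattern, assuming the per-group values can be written down once as a lookup: store them in a named vector and index it by the grouping column, so every row gets its own parameter in one vectorized step, with no subsetting or recombining. `df` and `offsets` are made-up names and numbers for illustration only.

```r
# Toy data: one uniform operation (add an offset), but the offset differs by group.
# stringsAsFactors = FALSE keeps `group` as character; indexing a named vector
# by a factor would use the integer codes instead of the names.
df <- data.frame(
  group = c("a", "b", "a", "c", "b"),
  value = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# Hypothetical per-group parameters, stored once as a named vector.
offsets <- c(a = 1, b = 100, c = 1000)

# offsets[df$group] returns one offset per row, looked up by name,
# so the whole data frame is handled in a single step.
df$adjusted <- df$value + unname(offsets[df$group])
print(df)
```

For several parameters per group, the same idea works with a lookup data frame joined via `merge()` instead of a named vector.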
Here is the more specific problem I'm working on, using the data below. Ultimately, I want a time-series data frame with a date column and one column per region, each giving that region's value relative to a benchmark.
The problem: each region has a different measure of interest and a different benchmark. Here is the data:
library(dplyr)
library(reshape2)
set.seed(1)  # make the random example reproducible
data <- data.frame(
  region     = sample(c("northeast", "midwest", "west"), 100, replace = TRUE),
  date       = rep(seq(as.Date("2010-02-01"), length.out = 10, by = "1 day"), 10),
  population = sample(50000:100000, 10, replace = TRUE),  # 10 values, recycled to 100 rows
  skiers     = sample(1:100),
  bearsfans  = sample(1:100),
  dudes      = sample(1:100)
)
And here is the summary frame I'm building from it:
data2 <- data %>%
  group_by(date, region) %>%
  summarise(skiers     = sum(skiers),
            bearsfans  = sum(bearsfans),
            dudes      = sum(dudes),
            population = sum(population)) %>%
  mutate(ppl_per_skier    = population / skiers,
         ppl_per_bearsfan = population / bearsfans,
         ppl_per_dude     = population / dudes) %>%
  select(date, region, ppl_per_skier, ppl_per_bearsfan, ppl_per_dude)
Here's the tricky part:
- In the northeast, I only care about "ppl_per_skier", and the benchmark is 3500
- In the midwest, I only care about "ppl_per_bearsfan", and the benchmark is 1200
- In the west, I only care about "ppl_per_dude", and the benchmark is 5000
I can do this one region at a time. For example, for the midwest:
midwest <- data2 %>%
  filter(region == "midwest") %>%
  select(date, region, ppl_per_bearsfan) %>%
  mutate(bmark = 1200, against_bmk = bmark / ppl_per_bearsfan - 1) %>%
  select(date, against_bmk)
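If you'd rather keep the one-region-at-a-time logic, one option is to turn that block into a function of (region, measure, benchmark) and loop over the parameters instead of copy-pasting it. This is a sketch under stated assumptions: `against_bmk_for` is a hypothetical helper, `data2_demo` is an invented stand-in for `data2` (same column names, made-up values), and the body uses base R subsetting rather than the dplyr verbs above so it runs without extra packages.

```r
# Made-up stand-in for data2 (same column names, invented values).
data2_demo <- data.frame(
  date             = rep(as.Date(c("2010-02-01", "2010-02-02")), each = 3),
  region           = rep(c("midwest", "northeast", "west"), times = 2),
  ppl_per_skier    = c(900, 2000, 1500, 800, 2500, 1400),
  ppl_per_bearsfan = c(1000, 3000, 2500, 600, 2800, 2600),
  ppl_per_dude     = c(4000, 4500, 2500, 3900, 4600, 4000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the midwest pipeline with the region, the measure
# column and the benchmark turned into arguments.
against_bmk_for <- function(df, reg, measure, bmark) {
  sub <- df[df$region == reg, c("date", measure)]
  out <- data.frame(date = sub$date, against_bmk = bmark / sub[[measure]] - 1)
  # Name the result column after the region, e.g. "midwest_againstbmk".
  names(out)[2] <- paste0(reg, "_againstbmk")
  out
}

# mapply() walks the three parameter vectors in parallel: one call per region.
pieces <- mapply(against_bmk_for,
                 reg      = c("midwest", "northeast", "west"),
                 measure  = c("ppl_per_bearsfan", "ppl_per_skier", "ppl_per_dude"),
                 bmark    = c(1200, 3500, 5000),
                 MoreArgs = list(df = data2_demo),
                 SIMPLIFY = FALSE)

# Reduce() merges the per-region pieces into one wide frame keyed by date.
wide <- Reduce(function(a, b) merge(a, b, by = "date"), pieces)
print(wide)
```

The per-region parameters now live in three short vectors, so adding a region means extending those vectors rather than writing another block.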
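I would then repeat this for the other regions and join the results by date. What I want to end up with is something like this: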
        date midwest_againstbmk northeast_againstbmk west_againstbmk
1 2010-02-10          0.9617402            0.6008032       0.3403260
2 2010-02-11          0.5808621            0.5119942       0.7787559
3 2010-02-12          0.4828346            0.6560053       0.3747920
4 2010-02-13          0.6499841            0.7567194       0.8387461
5 2010-02-14          0.6367520            0.4564254       0.7269161
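One possible way to get this layout without any per-region code (a sketch, not the only approach) is to encode the region rules as a small lookup table, join it to the summary so each row picks up its own measure and benchmark, and then spread the regions into columns. It is written in base R; `data2_demo` is a hypothetical stand-in for `data2` with invented values, and `bench`, `cols`, `joined`, `long`, and `wide` are illustrative names.

```r
# Made-up stand-in for data2 (same column names, invented values).
data2_demo <- data.frame(
  date             = rep(as.Date(c("2010-02-01", "2010-02-02")), each = 3),
  region           = rep(c("midwest", "northeast", "west"), times = 2),
  ppl_per_skier    = c(900, 2000, 1500, 800, 2500, 1400),
  ppl_per_bearsfan = c(1000, 3000, 2500, 600, 2800, 2600),
  ppl_per_dude     = c(4000, 4500, 2500, 3900, 4600, 4000),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: one row per region with the measure that
# matters there and its benchmark (from the three rules in the question).
bench <- data.frame(
  region  = c("northeast", "midwest", "west"),
  measure = c("ppl_per_skier", "ppl_per_bearsfan", "ppl_per_dude"),
  bmark   = c(3500, 1200, 5000),
  stringsAsFactors = FALSE
)

# Join on region so every row carries its own measure name and benchmark.
joined <- merge(data2_demo, bench, by = "region")

# Matrix indexing: for row i, take the value from the column named in joined$measure[i].
cols <- c("ppl_per_skier", "ppl_per_bearsfan", "ppl_per_dude")
vals <- as.matrix(joined[cols])
joined$value <- vals[cbind(seq_len(nrow(joined)), match(joined$measure, cols))]

# Same formula as the midwest example, applied to all regions at once.
joined$against_bmk <- joined$bmark / joined$value - 1

# Long format: one row per date and region.
long <- joined[order(joined$date, joined$region), c("date", "region", "against_bmk")]

# stats::reshape() spreads region into columns named "against_bmk_<region>".
wide <- reshape(long, direction = "wide", idvar = "date",
                timevar = "region", v.names = "against_bmk", sep = "_")

# Rename to the "<region>_againstbmk" pattern used in the desired output.
names(wide) <- sub("^against_bmk_(.+)$", "\\1_againstbmk", names(wide))
rownames(wide) <- NULL
print(wide)
```

Adding a region then means adding a row to `bench`. With reshape2 loaded, `dcast(long, date ~ region, value.var = "against_bmk")` should produce the same shape, with plain region names as column headers.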
Is there a way to do this for X regions without repeating the block for each one?