Conditional calculations in a data frame

Question

I often need to compute new variables from existing columns of a data frame, conditional on the level of a factor variable.

Edit: After receiving 4 answers within 2 minutes, I realized that I had oversimplified my example. See below.

A simple example:

    df <- data.frame(value = c(1:5), class = letters[1:5])
    df
      value class
    1     1     a
    2     2     b
    3     3     c
    4     4     d
    5     5     e

I can use code like this

    library(dplyr)

    df %>%
      mutate(result = NA) %>%
      mutate(result = ifelse(class == "a", value * 1, result)) %>%
      mutate(result = ifelse(class == "b", value * 2, result)) %>%
      mutate(result = ifelse(class == "c", value * 3, result)) %>%
      mutate(result = ifelse(class == "d", value * 4, result)) %>%
      mutate(result = ifelse(class == "e", value * 5, result))

to perform conditional calculations on my variables, resulting in

      value class result
    1     1     a      1
    2     2     b      4
    3     3     c      9
    4     4     d     16
    5     5     e     25

In reality, however, there are many more classes and the calculations are more complicated, so I would prefer something cleaner, like this:

    df %>%
      mutate(results = switch(levels(class),
                              "a" = value * 1,
                              "b" = value * 2,
                              "c" = value * 3,
                              "d" = value * 4,
                              "e" = value * 5))

which obviously doesn't work

    Error in switch(levels(1:5), a = 1:5 * 1, b = 1:5 * 2, c = 1:5 * 3, d = 1:5 * :
      EXPR must be a length 1 vector
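The error occurs because switch() is not vectorized. Its EXPR must be a single value, but levels(class) is a vector. The following is a minimal sketch that is not part of the original question, and calc_one is a hypothetical helper name. It shows that switch() can still be used if it is called once per row with mapply():

```r
# switch() accepts only a single (length-1) EXPR, so this hypothetical
# helper handles one row at a time and mapply() loops over the rows.
df <- data.frame(value = 1:5, class = letters[1:5], stringsAsFactors = FALSE)

calc_one <- function(cls, value) {
  switch(cls,
         a = value * 1,
         b = value * 2,
         c = value * 3,
         d = value * 4,
         e = value * 5)
}

# class is kept as character: switch() on a factor would use its integer codes
df$result <- mapply(calc_one, df$class, df$value, USE.NAMES = FALSE)
print(df)
```

This keeps the switch syntax, but it loops in R. On thousands of rows, a vectorized lookup or join like the ones in the answers below is likely to be faster.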

Is there a cleaner way to do this with dplyr piping (or anything else)?

Edit: In fact, I have more variables to include in my calculations, and they are not simple sequential vectors but thousands of rows of measured data.

Here is my simple example with a second, arbitrary value variable (again, my real data has more):

    df <- data.frame(value1 = c(1:5),
                     value2 = c(2.3, 3.6, 7.2, 5.6, 0),
                     class = letters[1:5])
    df
      value1 value2 class
    1      1    2.3     a
    2      2    3.6     b
    3      3    7.2     c
    4      4    5.6     d
    5      5    0.0     e

My calculations are different for each condition. I realize I can simplify the chain a bit:

    df %>%
      mutate(result = NA,
             result = ifelse(class == "a", value1 * 1, result),
             result = ifelse(class == "b", value1 / value2 * 4, result),
             result = ifelse(class == "c", value2 * 3.57, result),
             result = ifelse(class == "d", value1 + value2 * 2, result),
             result = ifelse(class == "e", value2 / value1 / 5, result))

A working solution along the lines of the switch example above would be even cleaner.
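Later versions of dplyr added case_when(), which did not exist when this thread was written. It gives roughly the switch-like syntax asked for here and stays vectorized. The following is a minimal sketch on the second example data, assuming a current dplyr installation:

```r
library(dplyr)

df <- data.frame(value1 = 1:5,
                 value2 = c(2.3, 3.6, 7.2, 5.6, 0),
                 class  = letters[1:5],
                 stringsAsFactors = FALSE)

# case_when() checks each condition as a whole vector and, for every row,
# takes the right-hand side of the first condition that is TRUE.
res <- df %>%
  mutate(result = case_when(
    class == "a" ~ value1 * 1,
    class == "b" ~ value1 / value2 * 4,
    class == "c" ~ value2 * 3.57,
    class == "d" ~ value1 + value2 * 2,
    class == "e" ~ value2 / value1 / 5
  ))
print(res)
```

Every right-hand side is evaluated on all rows and then subset. For example, value1 / value2 produces Inf for the row where value2 is 0 before that value is discarded. Rows that match no condition get NA.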

r dplyr

user3460194 Jun 17 '15 at 16:31

4 answers

As I mentioned in the comments, this question is more or less a duplicate of this one (read the answer there to follow what's happening below):

    library(data.table)
    dt = as.data.table(df)               # or setDT to convert in place
    dt[, class := as.character(class)]   # simpler

    # create a data.table with *functions* to match each class
    fns = data.table(cls = letters[1:5],
                     fn = list(quote(value1*1), quote(value1/value2*4),
                               quote(value2*3.57), quote(value1+value2*2),
                               quote(value2/value1/5)),
                     key = 'cls')

    # I have to jump through hoops here, due to a bug or two, see below
    setkey(dt, class)
    newvals = dt[, eval(fns[class]$fn[[1]], .SD), by = class]$V1
    dt[, result := newvals][]
    #    value1 value2 class    result
    # 1:      1    2.3     a  1.000000
    # 2:      2    3.6     b  2.222222
    # 3:      3    7.2     c 25.704000
    # 4:      4    5.6     d 15.200000
    # 5:      5    0.0     e  0.000000

Due to a couple of bugs in data.table, the following simpler versions don't work yet:

    dt[, result := eval(fns[class]$fn[[1]], .SD), by = class]
    # or even better
    dt[fns, result := eval(fn[[1]], .SD), by = .EACHI]

I have filed bug reports for these.
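This answer works because each class maps to an unevaluated expression, and eval(expr, .SD) evaluates that expression with the group's columns in scope. The following is a minimal base-R sketch of the same idea without data.table. It uses split() to form the groups, and calc_exprs is a hypothetical name:

```r
df <- data.frame(value1 = 1:5,
                 value2 = c(2.3, 3.6, 7.2, 5.6, 0),
                 class  = letters[1:5],
                 stringsAsFactors = FALSE)

# One unevaluated expression per class, looked up by the class name.
calc_exprs <- list(a = quote(value1 * 1),
                   b = quote(value1 / value2 * 4),
                   c = quote(value2 * 3.57),
                   d = quote(value1 + value2 * 2),
                   e = quote(value2 / value1 / 5))

# Evaluate each class's expression using that class's rows as the
# environment, so value1 and value2 refer to the group's columns.
groups <- split(df, df$class)
pieces <- lapply(names(groups), function(cls) {
  g <- groups[[cls]]
  g$result <- eval(calc_exprs[[cls]], g)
  g
})
out <- do.call(rbind, pieces)
print(out)
```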

I'm adding Frank's suggestion from the comments below, as I think it's pretty cool, and this way it's more likely to be preserved on SO. A more readable way to create the function table is:

    quotem <- function(...) as.list(sys.call())[-1]
    fnslist <- quotem(a = value1*1, b = value1/value2*4, c = value2*3.57,
                      d = value1+value2*2, e = value2/value1/5)
    fns = data.table(cls = names(fnslist), fn = fnslist, key = "cls")
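Frank's quotem() works because sys.call() returns the call exactly as written. Dropping its first element (the function name) leaves the named, unevaluated arguments. The following sketch compares it with base R's alist(), which is not mentioned in the thread but builds the same kind of named list of unevaluated expressions:

```r
# Frank's helper: capture the call as written, drop the function name,
# and keep the named arguments as unevaluated expressions.
quotem <- function(...) as.list(sys.call())[-1]

fnslist <- quotem(a = value1*1, b = value1/value2*4, c = value2*3.57,
                  d = value1+value2*2, e = value2/value1/5)

# alist() also returns its arguments unevaluated, as a named list.
fnslist_alist <- alist(a = value1*1, b = value1/value2*4, c = value2*3.57,
                       d = value1+value2*2, e = value2/value1/5)

str(fnslist)

# Evaluate one of the expressions against some example values.
b_val <- eval(fnslist$b, list(value1 = 2, value2 = 3.6))
print(b_val)
```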

eddi Jun 17 '15 at 18:16

The same idea as @agstudy's answer, using dplyr (cond is the lookup table defined there):

    library(dplyr)
    df %>%
      left_join(cond) %>%
      mutate(result = value * ratio)

Which gives:

    #   value class ratio result
    # 1     1     a     1      1
    # 2     2     b     2      4
    # 3     3     c     3      9
    # 4     4     d     4     16
    # 5     5     e     5     25

Steven beaupré Jun 17 '15 at 16:41

Here's a dplyr / lazyeval version of the function-per-class approach:

    # required packages
    require(lazyeval)
    require(dplyr)

    # data (from @agstudy)
    df <- data.frame(value1 = c(1:5), value2 = c(2.3, 3.6, 7.2, 5.6, 0),
                     class = rep(letters[1:5], 2))

    # functions (lazy instead of functions)
    fns <- list(
      a = lazy(x*1),
      b = lazy(x/y*4),
      c = lazy(y*3.57),
      d = lazy(x+y*2),
      e = lazy(y/x/5)
    )

    # mutate call
    df %>%
      group_by(class) %>%
      mutate(value = lazy_eval(fns[class][[1]], list(x = value1, y = value2)))
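The lazyeval package has since been superseded by rlang's tidy evaluation tools. The following is a roughly equivalent sketch, assuming current versions of dplyr and rlang. It replaces lazy() with rlang::exprs() and lazy_eval() with rlang::eval_tidy():

```r
library(dplyr)
library(rlang)

df <- data.frame(value1 = c(1:5), value2 = c(2.3, 3.6, 7.2, 5.6, 0),
                 class = rep(letters[1:5], 2), stringsAsFactors = FALSE)

# exprs() captures each argument unevaluated, keeping the names.
fns <- exprs(a = x * 1, b = x / y * 4, c = y * 3.57,
             d = x + y * 2, e = y / x / 5)

# Within each group, class[1] is that group's class name. eval_tidy()
# evaluates the matching expression with x and y bound to the group's columns.
res <- df %>%
  group_by(class) %>%
  mutate(value = eval_tidy(fns[[class[1]]],
                           data = list(x = value1, y = value2))) %>%
  ungroup()
print(res)
```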

shadow Jun 18 '15 at 11:38

agstudy · Accepted Answer · 2015-06-17T16:34:37+0000

There's no need to use ifelse here; you can use merge:

    df <- data.frame(value = c(1:5), class = letters[1:5])
    cond <- data.frame(ratio = c(1:5), class = letters[1:5])
    transform(merge(df, cond), result = value * ratio)
      class value ratio result
    1     a     1     1      1
    2     b     2     2      4
    3     c     3     3      9
    4     d     4     4     16
    5     e     5     5     25
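Here merge() acts as a lookup of each row's ratio by class, but it sorts the result by the key column. The following is a base-R alternative that keeps the original row order. It uses match() to do the same lookup directly:

```r
df <- data.frame(value = 1:5, class = letters[1:5], stringsAsFactors = FALSE)
cond <- data.frame(ratio = 1:5, class = letters[1:5], stringsAsFactors = FALSE)

# match() gives, for each row of df, the position of its class in cond,
# so indexing cond$ratio with it looks up every row's ratio at once.
df$result <- df$value * cond$ratio[match(df$class, cond$class)]
print(df)
```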

Update after the OP's edit

It seems the OP wants to apply a different function to each class. Here is a data.table solution, which I think is simple and straightforward. First, I create a function for each factor level:

    ## here each function takes a data.table as its single argument
    fns <- list(
      function(x) x[, value1] * 1,
      function(x) x[, value1] / x[, value2] * 4,
      function(x) x[, value2] * 3.57,
      function(x) x[, value1] + x[, value2] * 2,
      function(x) x[, value2] / x[, value1] / 5
    )
    ## create a named list here;
    ## the names are just the class factor levels
    fns <- setNames(fns, letters[1:5])

Applying a function by class is then simple. Within each group, fns[[class]] looks up the function by its name (the class level) and calls it on that group's columns:

    ## using data.table here for its grouping feature
    ## .SD holds the rest of the columns, excluding the grouping variable
    ## the code can also be written in dplyr or in base R
    library(data.table)
    setDT(df)[, value := fns[[class]](.SD), by = class][]
        value1 value2 class     value
     1:      1    2.3     a  1.000000
     2:      2    3.6     b  2.222222
     3:      3    7.2     c 25.704000
     4:      4    5.6     d 15.200000
     5:      5    0.0     e  0.000000
     6:      1    2.3     a  1.000000
     7:      2    3.6     b  2.222222
     8:      3    7.2     c 25.704000
     9:      4    5.6     d 15.200000
    10:      5    0.0     e  0.000000

I am using this df:

    df <- data.frame(value1 = c(1:5), value2 = c(2.3, 3.6, 7.2, 5.6, 0),
                     class = rep(letters[1:5], 2))
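The answer notes that the same code can also be written in base R. The following is a minimal base-R sketch. It assumes a variant of fns that uses $ instead of data.table's x[, col] syntax, so each function also works on a plain data.frame:

```r
df <- data.frame(value1 = c(1:5), value2 = c(2.3, 3.6, 7.2, 5.6, 0),
                 class = rep(letters[1:5], 2), stringsAsFactors = FALSE)

# Variant of fns (an assumption, not the answer's exact list) using `$`,
# so each function accepts a plain data.frame as well as a data.table.
fns <- list(
  a = function(x) x$value1 * 1,
  b = function(x) x$value1 / x$value2 * 4,
  c = function(x) x$value2 * 3.57,
  d = function(x) x$value1 + x$value2 * 2,
  e = function(x) x$value2 / x$value1 / 5
)

# Split the rows by class and apply each class's function to its piece.
# unsplit() then puts the results back in the original row order.
pieces <- split(df, df$class)
results <- lapply(names(pieces), function(cls) fns[[cls]](pieces[[cls]]))
df$value <- unsplit(results, df$class)
print(df)
```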
