Conditional replacement of values in data.frame

Question

I am trying to figure out how to conditionally replace values in a dataframe without using a loop. My data frame is structured as follows:

 > df
           a b est
 1  11.77000 2   0
 2  10.90000 3   0
 3  10.32000 2   0
 4  10.96000 0   0
 5   9.90600 0   0
 6  10.70000 0   0
 7  11.43000 1   0
 8  11.41000 2   0
 9  10.48512 4   0
 10 11.19000 0   0

and dput output:

 structure(list(a = c(11.77, 10.9, 10.32, 10.96, 9.906, 10.7, 11.43, 11.41, 10.48512, 11.19), b = c(2, 3, 2, 0, 0, 0, 1, 2, 4, 0), est = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("a", "b", "est"), row.names = c(NA, -10L), class = "data.frame")

What I want to do is check the value of b. If b is 0, I want to set est to a value computed from a. I understand that df$est[df$b == 0] <- 23 sets all est values to 23 when b == 0. I do not understand how to set est to a value based on a when this condition is true. For example:

 df$est[df$b == 0] <- (df$a - 5)/2.533

gives the following warning:

 Warning message:
 In df$est[df$b == 0] <- (df$a - 5)/2.533 :
   number of items to replace is not a multiple of replacement length

Is there a way I can pass the appropriate cell, not a vector?

+46

r dataframe

djq Nov 21 '11 at 15:38

4 answers

Try data.table's := operator:

 library(data.table)
 DT = as.data.table(df)
 DT[b==0, est := (a-5)/2.533]

It is fast and short. See these related questions for more information on :=:

Why has data.table defined :=

When should I use the := operator in data.table

How do you remove columns from a data.frame

R self reference
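As a minimal sketch of the point above, assuming the data.table package is installed (the block does nothing otherwise), the following rebuilds the question's data frame and applies := to it. The update happens in place on DT, so no reassignment is needed, while the original df is left untouched because as.data.table() works on a copy.

```r
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)

  # Data frame from the question
  df <- data.frame(
    a = c(11.77, 10.9, 10.32, 10.96, 9.906, 10.7, 11.43, 11.41, 10.48512, 11.19),
    b = c(2, 3, 2, 0, 0, 0, 1, 2, 4, 0),
    est = 0
  )

  DT <- as.data.table(df)            # copy of df as a data.table
  DT[b == 0, est := (a - 5)/2.533]   # update est by reference, only where b == 0

  print(DT)
}
```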

+23

Matt Dowle Nov 21 '11 at 16:14

Here is one approach. ifelse is vectorized: it checks every row for b == 0 and replaces est with (a - 5)/2.533 in those rows, leaving est unchanged elsewhere.

 df <- transform(df, est = ifelse(b == 0, (a - 5)/2.533, est))
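To make the element-by-element behaviour concrete, here is a small sketch on a three-row data frame. The name toy and its values are made up for illustration and are not part of the question's data.

```r
# Hypothetical three-row example; values chosen so the formula gives round numbers
toy <- data.frame(
  a   = c(7.533, 10.066, 12.599),
  b   = c(0, 1, 0),
  est = c(-1, -1, -1)
)

# For each row, ifelse() takes the second argument where b == 0
# and the third argument (the existing est) everywhere else.
toy <- transform(toy, est = ifelse(b == 0, (a - 5)/2.533, est))

# est is now c(1, -1, 3), up to floating-point rounding
print(toy)
```

Because both branches are full-length vectors, there is no length mismatch and no recycling warning.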

+10

Ramnath Nov 21 '11 at 15:41

The R Inferno, or the basic R documentation, will explain why using df$* is not the best approach here. From the help page for "[":

"Indexing by [ is similar to atomic vectors and selects a list of the specified element(s). Both [[ and $ select a single element of the list. The main difference is that $ does not allow computed indices, whereas [[ does. x$name is equivalent to x[["name", exact = FALSE]]. Also, the partial matching behavior of [[ can be controlled using the exact argument."

Instead, it is recommended to use the [row,col] notation. Example:

 Rgames: foo
           x y z
  [1,] 1e+00 1 0
  [2,] 2e+00 2 0
  [3,] 3e+00 1 0
  [4,] 4e+00 2 0
  [5,] 5e+00 1 0
  [6,] 6e+00 2 0
  [7,] 7e+00 1 0
  [8,] 8e+00 2 0
  [9,] 9e+00 1 0
 [10,] 1e+01 2 0
 Rgames: foo<-as.data.frame(foo)
 Rgames: foo[foo$y==2,3]<-foo[foo$y==2,1]
 Rgames: foo
        x y     z
 1  1e+00 1 0e+00
 2  2e+00 2 2e+00
 3  3e+00 1 0e+00
 4  4e+00 2 4e+00
 5  5e+00 1 0e+00
 6  6e+00 2 6e+00
 7  7e+00 1 0e+00
 8  8e+00 2 8e+00
 9  9e+00 1 0e+00
 10 1e+01 2 1e+01
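Applied to the data frame from the question, the same [row, col] idea might look like the sketch below. The variables rows and target are illustrative names, not from the original posts; target shows that [ , ] and [[ ]] accept a computed column name, whereas $ does not.

```r
# Data frame from the question
df <- data.frame(
  a = c(11.77, 10.9, 10.32, 10.96, 9.906, 10.7, 11.43, 11.41, 10.48512, 11.19),
  b = c(2, 3, 2, 0, 0, 0, 1, 2, 4, 0),
  est = 0
)

rows   <- df$b == 0   # logical row selector
target <- "est"       # hypothetical variable holding a column name

# [row, col] selects the same rows on both sides, so the lengths match.
df[rows, target] <- (df[rows, "a"] - 5)/2.533

# [[ ]] also accepts a computed name; $ looks for a column literally called "target".
via_brackets <- df[[target]]
via_dollar   <- df$target   # NULL: there is no column named "target"

print(df)
```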

+5

Carl Witthoft Nov 21 '11 at 15:53

Andrie · Accepted Answer · 2011-11-21 15:45

Since you are conditionally indexing df$est, you also need to conditionally index the replacement vector df$a:

 index <- df$b == 0
 df$est[index] <- (df$a[index] - 5)/2.533

Of course, the index variable is temporary, and I use it to make the code more understandable. You can write this in one step:

 df$est[df$b == 0] <- (df$a[df$b == 0] - 5)/2.533

For even greater readability, you can use within:

 df <- within(df, est[b==0] <- (a[b==0]-5)/2.533)

Results, regardless of the method you choose:

 df
           a b      est
 1  11.77000 2 0.000000
 2  10.90000 3 0.000000
 3  10.32000 2 0.000000
 4  10.96000 0 2.352941
 5   9.90600 0 1.936834
 6  10.70000 0 2.250296
 7  11.43000 1 0.000000
 8  11.41000 2 0.000000
 9  10.48512 4 0.000000
 10 11.19000 0 2.443743
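A short sketch of why the original attempt warned, and a check that the indexed forms agree. It rebuilds the data frame from the question; the names n_targets, n_values, by_index and by_within are illustrative only.

```r
# Data frame from the question
df <- data.frame(
  a = c(11.77, 10.9, 10.32, 10.96, 9.906, 10.7, 11.43, 11.41, 10.48512, 11.19),
  b = c(2, 3, 2, 0, 0, 0, 1, 2, 4, 0),
  est = 0
)

index <- df$b == 0

# The warning comes from a length mismatch between the two sides:
n_targets <- sum(index)     # number of est cells selected on the left-hand side
n_values  <- length(df$a)   # number of values supplied by the un-indexed right-hand side

# Indexing both sides with the same condition makes the lengths equal.
by_index <- df
by_index$est[index] <- (by_index$a[index] - 5)/2.533

# The within() form computes the same thing.
by_within <- within(df, est[b == 0] <- (a[b == 0] - 5)/2.533)

print(c(targets = n_targets, values = n_values))
print(isTRUE(all.equal(by_index$est, by_within$est)))
```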

As others have pointed out, an alternative solution in your example is to use ifelse.
