Mutate columns after subsetting by a value

Question

I have a large data frame and want to standardize several columns, using the mean and standard deviation of only a subset of the values. Say I have the following example data:

    library(dplyr)

    set.seed(123)
    df = data.frame("sample" = c(rep(1:2, each = 5)),
                    "status" = c(0, 1),
                    "s1" = runif(10, -1, 1),
                    "s2" = runif(10, -5, 5),
                    "s3" = runif(10, -25, 25))

and I want to standardize each of s1-s3, using the mean and standard deviation of the rows where status == 0. If I were doing this for just s1, I could do the following:

    df = df %>%
      group_by(sample) %>%
      mutate(sd_s1 = (s1 - mean(s1[status == 0])) / sd(s1[status == 0]))

But my problem arises when I have to perform this operation on multiple columns. I tried writing a function to use with mutate_at:

    standardize <- function(x) {
      return((x - mean(x[status == 0])) / sd(x[status == 0]))
    }

    df = df %>%
      group_by(sample) %>%
      mutate_at(vars(s1:s3), standardize)

This produces NA values for s1-s3. I tried to use the answer given in "R - dplyr - mutate - use the names of dynamic variables", but I can't figure out how to perform the subsetting.

Any help is appreciated. Thanks!

+5

r dplyr

J. Debost Sep 06 '17 at 14:05

1 answer

akrun · Accepted Answer · 2017-09-06T14:12:45+0000

We could just use

    df %>%
      group_by(sample) %>%
      mutate_at(vars(s1:s3), funs((. - mean(.[status == 0])) / sd(.[status == 0])))
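For readers on newer versions of dplyr: funs() has since been deprecated and mutate_at() superseded by across(), so the same idea could be sketched as below. This assumes dplyr 1.0.0 or later. The helper name standardize_by and its ref argument are made up for illustration; they are not part of dplyr. Passing the status == 0 condition in as an argument means the helper does not have to look up status on its own, which is where the function in the question runs into trouble.

```r
library(dplyr)

# Example data from the question
set.seed(123)
df <- data.frame(
  sample = rep(1:2, each = 5),
  status = c(0, 1),
  s1 = runif(10, -1, 1),
  s2 = runif(10, -5, 5),
  s3 = runif(10, -25, 25)
)

# Hypothetical helper: center and scale x using only the elements where
# ref is TRUE. The reference rows are passed in explicitly.
standardize_by <- function(x, ref) {
  (x - mean(x[ref])) / sd(x[ref])
}

res <- df %>%
  group_by(sample) %>%
  # status is resolved per group because the lambda is evaluated by mutate()
  mutate(across(s1:s3, ~ standardize_by(.x, status == 0))) %>%
  ungroup()
```

Within each sample, the status == 0 rows of s1-s3 should then have mean 0 and standard deviation 1, which is a quick way to check the result.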
