I have a data set with 498 variables of various kinds (numeric, logical, dates, and others), stored as a data frame in R with rows for observations and columns for variables. For a certain subset of these variables, I would like to replace the missing values with the mean of that variable.
I wrote this very simple function for mean imputation:
impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
This works great if I apply it to a single variable, say dataset$variableA:
dataset$variableA <- impute.mean(dataset$variableA)
This gives me exactly what I want for that one variable, but since I need to do this for a fairly large subset of variables, I would rather not go through each one manually.
My first instinct was to use one of the apply functions in R to do this efficiently, but I don't seem to understand how.
My first, crude attempt used the standard apply:
newdataset <- apply(dataset, 2, impute.mean)
This is obviously a bit crude, since it tries to apply the function to all columns, including variables that are not numeric. Still, it seemed like a reasonable starting point, even if it might generate a few warnings. Alas, this method did not work: all my variables remained the same.
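A small reproducible sketch of this attempt, on a made-up toy data frame (the names toy, a, and b are hypothetical, not from my real data). As far as I can tell, apply() converts the data frame to a matrix first, and with mixed column types that matrix is character, so mean() has nothing numeric to average:

```r
# Toy data frame (hypothetical): one numeric and one character column
toy <- data.frame(a = c(1, NA, 3), b = c("x", "y", "z"),
                  stringsAsFactors = FALSE)

impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# apply() coerces the data frame with as.matrix(); mixed types yield a
# character matrix, so mean() warns and returns NA for every column
result <- suppressWarnings(apply(toy, 2, impute.mean))

is.matrix(result)        # the result is a matrix, not a data frame
typeof(result)           # the values are character, not numeric
is.na(result[2, "a"])    # the missing value is still missing
```

This would explain why the missing values look untouched after the call.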
I also experimented a bit with lapply, mapply, and ddply, but without any success.
Ideally, I would like to do something like this:
relevantVariables <- c("variableA1", "variableA2", ..., "variableA293")
newdataset <- magical.apply(dataset, relevantVariables, impute.mean)
Is there an apply function that works this way?
Alternatively, is there another efficient way to do this?
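To make the desired behaviour concrete, here is a minimal sketch of what such a call might do, assuming that running lapply over a column subset and assigning the result back is acceptable. magical.apply is a hypothetical helper, not an existing R function, and the toy data frame and its label column are made up for illustration:

```r
impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# Hypothetical helper: applies f only to the named columns of a data frame
# and leaves every other column untouched
magical.apply <- function(df, vars, f) {
  df[vars] <- lapply(df[vars], f)
  df
}

# Made-up toy data: two numeric columns to impute, one character column to skip
toy <- data.frame(
  variableA1 = c(1, NA, 3),
  variableA2 = c(NA, 10, 20),
  label      = c("x", NA, "z"),
  stringsAsFactors = FALSE
)

relevantVariables <- c("variableA1", "variableA2")
newdataset <- magical.apply(toy, relevantVariables, impute.mean)
```

Because lapply works column by column on the list of selected columns, nothing is coerced to a matrix, so each column keeps its type and the result is still a data frame.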