Replacing NA in each column of a matrix with the median of that column

I am trying to replace the NAs in each column of a matrix with the median of that column. When I use lapply or sapply, I get an error message, but the code works when I use a for loop and change one column at a time. What am I doing wrong?

Example:

set.seed(1928)
mat <- matrix(rnorm(100*110), ncol = 110)
mat[sample(1:length(mat), 700, replace = FALSE)] <- NA
mat1 <- mat2 <- mat

mat1 <- lapply(mat1,
  function(n) {
     mat1[is.na(mat1[,n]),n] <- median(mat1[,n], na.rm = TRUE)
  }
)   

for (n in 1:ncol(mat2)) {
  mat2[is.na(mat2[,n]),n] <- median(mat2[,n], na.rm = TRUE)
}
+4
4 answers

I suggest vectorizing this with the matrixStats package instead of computing the median for each column in a loop (sapply is also a loop, in the sense that it evaluates the function at each iteration).

First, we create an index of the NA positions:

indx <- which(is.na(mat), arr.ind = TRUE)

Then we replace the NAs with the matching column medians, using the column index of each NA:

mat[indx] <- matrixStats::colMedians(mat, na.rm = TRUE)[indx[, 2]]
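If matrixStats is not installed, the same indexing idea can be sketched in base R. This is only an illustration on a small made-up matrix `m`; `col_med` is a name introduced here, and `apply` stands in for `colMedians`, so it will likely be slower on large matrices.

```r
# Small example matrix with a few missing values (made-up data for illustration)
m <- matrix(c(1, NA, 3,
              4, 5, NA,
              NA, 8, 9,
              10, 11, 12), ncol = 3, byrow = TRUE)

# Row/column positions of every NA
indx <- which(is.na(m), arr.ind = TRUE)

# Column medians in base R (stand-in for matrixStats::colMedians)
col_med <- apply(m, 2, median, na.rm = TRUE)

# indx[, 2] holds the column of each NA, so this picks the matching median
m[indx] <- col_med[indx[, 2]]
```

Indexing a matrix with the two-column `indx` matrix addresses each NA cell by its (row, column) pair, which is why a single assignment is enough.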
+7

You can use sweep:

sweep(mat, MARGIN = 2, 
      STATS = apply(mat, 2, median, na.rm=TRUE),
      FUN =  function(x,s) ifelse(is.na(x), s, x)
    )

EDIT: for the STATS argument you could also use matrixStats::colMedians(mat, na.rm=TRUE) instead of the apply call.
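To check that the sweep approach gives the same result as the for loop from the question, a quick comparison along these lines may help. The small matrix `m`, the seed, and the names `m_loop` and `m_sweep` are assumptions made for this sketch only.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
m <- matrix(rnorm(20), ncol = 4)
m[c(2, 7, 13, 20)] <- NA  # one NA in each of the four columns

# Reference result: the for loop from the question
m_loop <- m
for (n in seq_len(ncol(m_loop))) {
  m_loop[is.na(m_loop[, n]), n] <- median(m_loop[, n], na.rm = TRUE)
}

# sweep hands FUN the matrix plus the STATS values expanded to the same shape
m_sweep <- sweep(m, MARGIN = 2,
                 STATS = apply(m, 2, median, na.rm = TRUE),
                 FUN = function(x, s) ifelse(is.na(x), s, x))

all.equal(m_loop, m_sweep)
```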

+2

lapply iterates over a list; given a matrix, it visits every single element rather than every column. Do you want to iterate over the columns?

matx <- sapply(seq_len(ncol(mat1)), function(n) {
  col <- mat1[, n]
  col[is.na(col)] <- median(col, na.rm = TRUE)
  col  # return the filled column so sapply binds the columns into a matrix
})

This essentially does what your for loop does (it may or may not be faster).
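The two reasons the lapply version in the question does not behave like the for loop can be seen on a tiny example. This is a sketch with a made-up matrix `m`; `per_element` and `fill_col` are hypothetical names used only here.

```r
m <- matrix(c(1, NA, 3, 4), ncol = 2)

# lapply treats the matrix as a plain vector: one call per element, not per column
per_element <- lapply(m, function(x) x)
length(per_element)   # 4 calls for a 2 x 2 matrix

# An assignment inside a function changes a local copy only, and the
# function returns the assigned value (the median), not the matrix
fill_col <- function(n) {
  m[is.na(m[, n]), n] <- median(m[, n], na.rm = TRUE)
}
res <- fill_col(1)
res          # the median of column 1, not a filled column
anyNA(m)     # the m outside the function still contains the NA
```

The for loop works because it runs in the calling environment, so each assignment modifies the matrix itself.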

+1

Alternatively, you could simplify this by converting to a data.frame and getting a matrix back with vapply:

vapply(as.data.frame(mat1), function(x)
   replace(x, is.na(x), median(x,na.rm=TRUE)), FUN.VALUE=numeric(nrow(mat1)) 
)
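One detail to be aware of with this approach: the result picks up column names from the intermediate data.frame. The following sketch, on a made-up matrix `m` with the result stored in a variable called `filled` (a name chosen here), shows how to drop them again if the input had none.

```r
# Two columns, filled column by column: (1, NA, 3) and (4, 5, NA)
m <- matrix(c(1, NA, 3,
              4, 5, NA), ncol = 2)

filled <- vapply(as.data.frame(m), function(x)
  replace(x, is.na(x), median(x, na.rm = TRUE)),
  FUN.VALUE = numeric(nrow(m)))

colnames(filled)          # names inherited from the data.frame columns
filled <- unname(filled)  # drop them if the input matrix had no column names
```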
0

Source: https://habr.com/ru/post/1624873/

