May I suggest a first function (ggNAadd) designed for this, together with a second function (ggNA) that plots where the generated NAs ended up.
What is neat is that you can enter either a fraction or a fixed number of NAs.
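    ggNAadd = function(data, amount, plot=FALSE){
      temp <- data
      n.cells <- prod(dim(data))
      # amount < 1 is read as a fraction of all cells, otherwise as a count
      amount2 <- if (amount < 1) round(n.cells * amount) else amount
      if (amount2 >= n.cells) stop("exceeded data size")
      # each draw picks a random cell; the same cell can be hit more than once
      for (i in seq_len(amount2))
        temp[sample.int(nrow(temp), 1), sample.int(ncol(temp), 1)] <- NA
      if (plot) print(ggNA(temp))
      return(temp)
    }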
And the plotting function:
    ggNA = function(data, alpha=0.5){
      require(ggplot2)
      DF <- data
      if (!is.matrix(data)) DF <- as.matrix(DF)
      to.plot <- cbind.data.frame('y'=rep(1:nrow(DF), each=ncol(DF)),
                                  'x'=as.logical(t(is.na(DF)))*rep(1:ncol(DF), nrow(DF)))
      size <- 20 / log( prod(dim(DF)) )
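The plotting function as posted stops after `size` is computed. Below is a minimal sketch of what a complete version could look like. It is not the original code: the names `na_positions` and `ggNA_sketch`, the red points, the reversed y axis and the labels are all assumptions; only the point-size formula and the printed percentage message are taken from the post. It calls ggplot2 through `ggplot2::` so the package only needs to be installed, not attached.

```r
# Hypothetical helper (not in the original post): one row per NA cell,
# with y = row index and x = column index.
na_positions <- function(data) {
  DF <- as.matrix(data)
  pos <- which(is.na(DF), arr.ind = TRUE)
  data.frame(y = pos[, "row"], x = pos[, "col"], row.names = NULL)
}

# Hypothetical completion of the plotting idea: one point per NA cell.
ggNA_sketch <- function(data, alpha = 0.5) {
  DF <- as.matrix(data)
  to_plot <- na_positions(DF)

  # Point size shrinks as the table grows (formula from the post).
  size <- 20 / log(prod(dim(DF)))

  # Share of NA cells, printed as in the output shown in the post.
  pc <- round(100 * mean(is.na(DF)), 2)
  print(paste("percentage of NA data:", pc))

  # Assumed layout: columns on x, rows on y, first row at the top.
  ggplot2::ggplot(to_plot, ggplot2::aes(x = x, y = y)) +
    ggplot2::geom_point(size = size, colour = "red", alpha = alpha) +
    ggplot2::scale_y_reverse() +
    ggplot2::xlim(1, ncol(DF)) +
    ggplot2::labs(title = "location of NAs", x = "columns", y = "rows")
}
```

Calling `print(ggNA_sketch(temp))` on a data frame or matrix with some NAs draws the plot.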
This gives (with ggplot2 producing the plot):
    ggNAadd(df, amount=0.20, plot=TRUE)
    ## [1] "percentage of NA data: 20"
    ##    A  B  c
    ## 1  1 11 21
    ## 2  2 12 22
    ## 3  3 13 23
    ## 4  4 NA 24
    ## ..

Of course, as mentioned earlier, if you ask for too many NAs, the actual percentage will fall short of the request, because the same cell can be picked more than once.
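If you need the exact number of NAs, one way around the repetition problem is to sample distinct cells instead of drawing row and column independently each time. A minimal sketch, assuming the same fraction-or-count convention for `amount`; the name `add_na_exact` is made up for this example and is not part of the functions above:

```r
# Hypothetical variant: sample cell indices without replacement, so the
# number of NAs inserted is exactly the number requested.
add_na_exact <- function(data, amount) {
  n_cells <- prod(dim(data))
  # amount < 1 is read as a fraction of all cells, otherwise as a count
  n_na <- if (amount < 1) round(n_cells * amount) else amount
  if (n_na > n_cells) stop("exceeded data size")

  # Logical mask with the same shape as the data; TRUE marks cells to blank.
  mask <- matrix(FALSE, nrow = nrow(data), ncol = ncol(data))
  mask[sample.int(n_cells, n_na)] <- TRUE

  data[mask] <- NA
  data
}

# Example data frame (an assumption, shaped like the output shown earlier).
df <- data.frame(A = 1:10, B = 11:20, c = 21:30)
out <- add_na_exact(df, 0.20)
sum(is.na(out))
```

Because `sample.int` draws without replacement by default, no cell is chosen twice, so the count of NAs added matches the request (as long as the data had no NAs to begin with).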