Dataset example (rows were randomly extracted from a much larger matrix)
    import numpy as np

    test = [[np.nan, np.nan, 0.217, 0.562],
            [np.nan, np.nan, 0.217, 0.562],
            [0.269, 0.0, 0.217, 0.562],
            [np.nan, np.nan, 0.217, -0.953],
            [np.nan, np.nan, 0.217, -0.788],
            [0.75, 0.0, 0.217, 0.326],
            [0.207, 0.0, 0.217, 0.814],
            [np.nan, np.nan, 0.217, 0.562],
            [np.nan, np.nan, 0.217, -0.022],
            [np.nan, np.nan, 0.217, 0.562],
            [np.nan, np.nan, 0.217, -0.953],
            [np.nan, np.nan, 0.217, -0.953],
            [0.078, 0.0, 0.217, -0.953],
            [np.nan, np.nan, 0.217, -0.953],
            [0.078, 0.0, 0.217, 0.562]]

    maskedarr = np.ma.array(test)
    np.ma.cov(maskedarr, rowvar=False, allow_masked=True)

    [[-- -- -- --]
     [-- -- -- --]
     [-- -- 0.0 0.0]
     [-- -- 0.0 0.554]]
However, if I use R,
    import rpy2.robjects as robjects

    robjects.globalenv['maskedarr'] = robjects.FloatVector(maskedarr.T.flatten())
    robjects.r('''
        dim(maskedarr) <- c(%d,%d)
        maskedarr[] <- replace(maskedarr,!is.finite(maskedarr),NA)
    ''' % maskedarr.shape)
    robjects.r('''
        print(cov(maskedarr,use="pairwise"))
    ''')

              [,1] [,2] [,3]      [,4]
    [1,] 0.0769733    0    0 0.0428294
    [2,] 0.0000000    0    0 0.0000000
    [3,] 0.0000000    0    0 0.0000000
    [4,] 0.0428294    0    0 0.5536484
I get a completely different matrix. If the covariance for each pair of columns is computed with NaNs removed only for that pair, I would expect something like the R result. The numpy.ma.cov documentation says that allow_masked=True allows these pairwise values to be calculated, but that does not seem to be what happens. Did I miss something?
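
To make the expected behaviour concrete, here is a minimal sketch of what I mean by a pairwise covariance: for each pair of columns, keep only the rows where both values are finite, then compute the sample covariance from those rows alone. The helper name `pairwise_cov` is made up for illustration and is not part of NumPy. I am assuming this is roughly what R does with use="pairwise".

```python
import numpy as np

def pairwise_cov(data, ddof=1):
    """Hypothetical helper: covariance from pairwise-complete rows."""
    data = np.asarray(data, dtype=float)
    n_cols = data.shape[1]
    valid = np.isfinite(data)              # True where a value is usable
    out = np.full((n_cols, n_cols), np.nan)
    for i in range(n_cols):
        for j in range(i, n_cols):
            both = valid[:, i] & valid[:, j]   # rows complete for this pair
            n = both.sum()
            if n - ddof <= 0:
                continue                   # too few rows, leave as NaN
            x = data[both, i]
            y = data[both, j]
            # Means are taken over the pairwise-complete rows only.
            c = np.sum((x - x.mean()) * (y - y.mean())) / (n - ddof)
            out[i, j] = out[j, i] = c
    return out

nan = np.nan
test = [[nan, nan, 0.217, 0.562], [nan, nan, 0.217, 0.562],
        [0.269, 0.0, 0.217, 0.562], [nan, nan, 0.217, -0.953],
        [nan, nan, 0.217, -0.788], [0.75, 0.0, 0.217, 0.326],
        [0.207, 0.0, 0.217, 0.814], [nan, nan, 0.217, 0.562],
        [nan, nan, 0.217, -0.022], [nan, nan, 0.217, 0.562],
        [nan, nan, 0.217, -0.953], [nan, nan, 0.217, -0.953],
        [0.078, 0.0, 0.217, -0.953], [nan, nan, 0.217, -0.953],
        [0.078, 0.0, 0.217, 0.562]]

result = pairwise_cov(test)
print(np.round(result, 7))
```

On the sample data this should agree with the R matrix above to the printed precision, which is the result I was hoping to get from numpy.ma.cov.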