numpy.ma.cov - pairwise covariance with missing values?

Example dataset (rows were randomly extracted from a much larger matrix):

    import numpy as np

    test = [[np.nan, np.nan, 0.217, 0.562],
            [np.nan, np.nan, 0.217, 0.562],
            [0.269, 0.0, 0.217, 0.562],
            [np.nan, np.nan, 0.217, -0.953],
            [np.nan, np.nan, 0.217, -0.788],
            [0.75, 0.0, 0.217, 0.326],
            [0.207, 0.0, 0.217, 0.814],
            [np.nan, np.nan, 0.217, 0.562],
            [np.nan, np.nan, 0.217, -0.022],
            [np.nan, np.nan, 0.217, 0.562],
            [np.nan, np.nan, 0.217, -0.953],
            [np.nan, np.nan, 0.217, -0.953],
            [0.078, 0.0, 0.217, -0.953],
            [np.nan, np.nan, 0.217, -0.953],
            [0.078, 0.0, 0.217, 0.562]]
    maskedarr = np.ma.array(test)
    np.ma.cov(maskedarr, rowvar=False, allow_masked=True)

    [[-- -- -- --]
     [-- -- -- --]
     [-- -- 0.0 0.0]
     [-- -- 0.0 0.554]]

However, if I do the same in R (via rpy2),

    import rpy2.robjects as robjects

    robjects.globalenv['maskedarr'] = robjects.FloatVector(maskedarr.T.flatten())
    robjects.r('''
    dim(maskedarr) <- c(%d,%d)
    maskedarr[] <- replace(maskedarr,!is.finite(maskedarr),NA)
    ''' % maskedarr.shape)
    robjects.r('''
    print(cov(maskedarr,use="pairwise"))
    ''')

              [,1] [,2] [,3]      [,4]
    [1,] 0.0769733    0    0 0.0428294
    [2,] 0.0000000    0    0 0.0000000
    [3,] 0.0000000    0    0 0.0000000
    [4,] 0.0428294    0    0 0.5536484

I get a completely different matrix. If the covariance for each pair of columns is computed after dropping NaNs only for that pair, I would expect something like R's answer. The numpy.ma.cov documentation says allow_masked=True lets these pairwise values be computed, but that does not seem to happen. Am I missing something?
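For reference, R's use="pairwise" (a partial match for "pairwise.complete.obs") computes each entry only from the rows where both columns are present, and re-centers each column on those shared rows. A minimal NumPy sketch of that computation, assuming NaN/inf mark missing values as in the R snippet; `pairwise_cov` is a hypothetical helper name, not a NumPy function:

```python
import numpy as np

test = [[np.nan, np.nan, 0.217, 0.562],
        [np.nan, np.nan, 0.217, 0.562],
        [0.269, 0.0, 0.217, 0.562],
        [np.nan, np.nan, 0.217, -0.953],
        [np.nan, np.nan, 0.217, -0.788],
        [0.75, 0.0, 0.217, 0.326],
        [0.207, 0.0, 0.217, 0.814],
        [np.nan, np.nan, 0.217, 0.562],
        [np.nan, np.nan, 0.217, -0.022],
        [np.nan, np.nan, 0.217, 0.562],
        [np.nan, np.nan, 0.217, -0.953],
        [np.nan, np.nan, 0.217, -0.953],
        [0.078, 0.0, 0.217, -0.953],
        [np.nan, np.nan, 0.217, -0.953],
        [0.078, 0.0, 0.217, 0.562]]


def pairwise_cov(a, ddof=1):
    """Hypothetical helper: column covariances where each (i, j) entry uses
    only the rows in which both column i and column j are finite."""
    a = np.asarray(a, dtype=float)
    ok = np.isfinite(a)
    n = a.shape[1]
    out = np.full((n, n), np.nan)  # NaN where too few shared rows (like R's NA)
    for i in range(n):
        for j in range(i, n):
            rows = ok[:, i] & ok[:, j]
            n_rows = rows.sum()
            if n_rows > ddof:
                x = a[rows, i]
                y = a[rows, j]
                # Re-center on the means of the shared rows only
                c = np.sum((x - x.mean()) * (y - y.mean())) / (n_rows - ddof)
                out[i, j] = out[j, i] = c
    return out


print(np.round(pairwise_cov(test), 7))
```

With the question's data this should reproduce the R matrix above to the printed precision.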

1 answer

There are no masked values in your maskedarr. np.ma.array does not mask NaNs on its own, so they are still ordinary values in the data.

    >>> maskedarr.mask
    False

You need to pass the mask argument when creating the array:

    >>> maskedarr = np.ma.array(test, mask=np.isnan(test))
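A shortcut that should give the same mask is np.ma.masked_invalid, which masks NaN and ±inf (matching the is.finite check in the R snippet). A small sketch, using a few rows copied from the question's data, that counts masked entries with np.ma.count_masked before and after:

```python
import numpy as np

# A few rows taken from the question's data
test = [[np.nan, np.nan, 0.217, 0.562],
        [0.269, 0.0, 0.217, 0.562],
        [np.nan, np.nan, 0.217, -0.953]]

plain = np.ma.array(test)              # NaNs stay as ordinary float values
masked = np.ma.masked_invalid(test)    # NaN and +/-inf become masked

print(np.ma.count_masked(plain))   # 0
print(np.ma.count_masked(masked))  # 4
```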

Now maskedarr.mask looks like this.

    >>> maskedarr.mask
    array([[ True,  True, False, False],
           [ True,  True, False, False],
           [False, False, False, False],
           [ True,  True, False, False],
           [ True,  True, False, False],
           [False, False, False, False],
           [False, False, False, False],
           [ True,  True, False, False],
           [ True,  True, False, False],
           [ True,  True, False, False],
           [ True,  True, False, False],
           [ True,  True, False, False],
           [False, False, False, False],
           [ True,  True, False, False],
           [False, False, False, False]], dtype=bool)
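The mask is what makes the normalization pairwise. Judging from the numpy.ma source, np.ma.cov divides each (i, j) entry by the number of rows in which both columns are unmasked, minus ddof. A small sketch of that count, on a hypothetical toy array (not the question's data):

```python
import numpy as np

# Hypothetical toy data: NaN marks a missing value
data = np.array([[1.0, np.nan, 0.5],
                 [2.0, 1.0, np.nan],
                 [3.0, 2.0, 1.5],
                 [np.nan, 4.0, 2.5]])
m = np.ma.masked_invalid(data)

# 1 where a value is observed, 0 where it is masked
observed = (~np.ma.getmaskarray(m)).astype(int)

# Entry (i, j) = number of rows where columns i and j are both observed
pair_counts = observed.T @ observed
print(pair_counts)
```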

Running numpy.ma.cov again now gives the same values as R:

    >>> np.ma.cov(maskedarr, rowvar=False, allow_masked=True)
    masked_array(data =
     [[0.0769732996251 0.0 0.0 0.0428294015418]
     [0.0 0.0 0.0 0.0]
     [0.0 0.0 0.0 0.0]
     [0.0428294015418 0.0 0.0 0.553648402899]],
                 mask =
     [[False False False False]
     [False False False False]
     [False False False False]
     [False False False False]],
           fill_value = 1e+20)
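One caveat, based on reading the numpy.ma implementation: np.ma.cov centers each column on the mean of all of its unmasked values, whereas R's pairwise option re-centers on the shared rows of each pair. In the question's data the two coincide, because columns 1 and 2 are missing on exactly the same rows and columns 3 and 4 have no missing values. When columns are missing on different rows they can differ, as this sketch on hypothetical toy data suggests:

```python
import numpy as np

# Hypothetical two-column example where each column is missing a different row
data = np.array([[1.0, np.nan],
                 [2.0, 1.0],
                 [3.0, 2.0],
                 [np.nan, 4.0]])
m = np.ma.masked_invalid(data)

# numpy.ma: each column is centered on the mean of all its observed values
ma_cov = float(np.ma.cov(m, rowvar=False, allow_masked=True)[0, 1])

# R-style pairwise: keep only rows where both columns are observed, then re-center
both = ~np.isnan(data).any(axis=1)
x, y = data[both, 0], data[both, 1]
r_style = np.sum((x - x.mean()) * (y - y.mean())) / (both.sum() - 1)

print(ma_cov, r_style)
```

Here the numpy.ma entry works out to about -0.33 while the R-style value is 0.5, so if exact agreement with R matters, computing each pair on its own complete rows may be the safer route.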

Source: https://habr.com/ru/post/1383320/

