How to identify meaningful elements from a correlation matrix in Python (without an inner loop)

Question

I built a correlation matrix from a small set of tests and thresholded it, which gave the boolean matrix below. True marks the entries that exceed the specified threshold (for example, results = relation_matrix > 0.75).

[[False False False  True]
 [False False  True False]
 [False  True False  True]
 [ True False  True False]]

Note that I also set the diagonal (top left to bottom right) to False. I only need half of the matrix, because it is mirrored across that diagonal.

Is there a function in NumPy (or another library) that returns the row/column indices of the True values? When I run this against real data (200 thousand rows), I need it to be fast, without an inner loop: 200k * 200k checks will be very slow. I assume there is a matrix / NumPy / scikit-learn function that does this, but I could not find it.

The expected result (using 1-based row/column numbers) would be:

[[1, 4], [2, 3], [3, 2], [3, 4], [4, 1], [4, 3]]

Ideally, given that this is a mirror image, it would be:

[[1, 4], [2, 3], [3, 4]]

python numpy matrix correlation

Jon m Aug 15 '17 at 17:35

1 answer

Divakar · Accepted Answer · 2017-08-15T17:49:30+0000

To set the lower triangular elements to 0, use np.triu, then get the indices of the remaining True values with np.argwhere -

np.argwhere(np.triu(a))

If the diagonal should be excluded as well, use np.triu(a,1).
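
As a minimal sketch, this is how the np.triu / np.argwhere combination behaves on the boolean matrix from the question. The array name `a` follows the answer; the final `+ 1` is an assumption added here only to match the question's 1-based numbering, since np.argwhere returns 0-based indices.

```python
import numpy as np

# Boolean matrix from the question (diagonal already False).
a = np.array([[False, False, False, True],
              [False, False, True,  False],
              [False, True,  False, True],
              [True,  False, True,  False]])

# Keep only the part strictly above the diagonal (k=1 drops the diagonal).
upper = np.triu(a, 1)

# One (row, column) pair per remaining True value, 0-based.
pairs = np.argwhere(upper)

# Shift to 1-based numbering to compare with the question's expected output.
pairs_1based = pairs + 1
print(pairs_1based.tolist())
```

Each mirrored pair is reported once, because only the half above the diagonal survives the np.triu call.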

Alternatively, with broadcasting -

r = np.arange(a.shape[0])
a[r[:,None] >= r] = 0 # Note that this changes input array
indices = np.argwhere(a)
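
If the input array must stay unchanged, one possible variant (a sketch, not part of the original answer) builds the strict upper-triangle mask with the same broadcasting comparison and combines it with the array instead of assigning in place. It assumes `a` is a boolean array as in the question; the names `upper_mask` and `original` are introduced here for illustration.

```python
import numpy as np

# Boolean matrix from the question.
a = np.array([[False, False, False, True],
              [False, False, True,  False],
              [False, True,  False, True],
              [True,  False, True,  False]])
original = a.copy()  # kept only to show that `a` is not modified

r = np.arange(a.shape[0])

# True where row index < column index, i.e. strictly above the diagonal.
upper_mask = r[:, None] < r

# Combine with the data instead of zeroing the lower triangle in place.
indices = np.argwhere(a & upper_mask)
print(indices.tolist())
```

The mask is the complement of the `r[:,None] >= r` condition used for the in-place assignment, so the resulting indices are the same 0-based pairs.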
