How to use a sparse matrix in Python hcluster?

I am trying to use the hcluster library in Python, but I don't know enough Python to use a sparse matrix with hcluster. Please help. Here is what I do:

import os.path
import numpy
import scipy
import scipy.io 
from hcluster import squareform, pdist, linkage, complete 
from hcluster.hierarchy import linkage, from_mlab_linkage 
from numpy import savetxt 
from StringIO import StringIO 

data.dmp contains the matrix:

  A B C D
A 0 1 0 1 
B 1 0 0 1 
C 0 0 0 0 
D 1 1 0 0 

but it stores only the upper-right part of the matrix (I don't know the correct English term :)), that is, all the entries above the main diagonal. So data.dmp contains: 1 0 1, 0 1, 0

f = file('data.dmp','r')  
s = StringIO(f.readline()).getvalue()
f.close()

matrix = numpy.asarray(eval("["+s+"]"))

For a reason unknown to me, hcluster uses inverted values; for example, I use 0 if A != C and 1 if A == D.

sqfrm = squareform(matrix)
Y = pdist(sqfrm, metric="cosine")

Linkage of Y:

Z = linkage(Y, method="complete")

So the Z matrix is what I need (assuming I used hcluster correctly?)

But I have the following problems:

  • , , python , thats . , python , ?

  • , python hcluster, , , hcluster? HAC?

, !

+3
1

, . .

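Compute Y yourself instead of calling hcluster.pdist. If your vectors are sparse, the function below computes the squared l2 distance between two of them: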

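def sqrerr(repr1, repr2):
    """
    Compute the sqrerr between two reprs.
    The reprs are each a dict from feature to feature value.
    """
    # Union of the features present in either repr; set union works on
    # Python 2 and 3 (keys() + keys() fails on Python 3).
    keys = frozenset(repr1) | frozenset(repr2)
    total = 0.
    for k in keys:
        # A feature missing from a repr counts as 0.
        diff = repr1.get(k, 0.) - repr2.get(k, 0.)
        total += diff * diff
    return total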

Use sqrerr to compute Y[i, j] for every pair of items.
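As a minimal sketch of filling Y pair by pair, the snippet below encodes the rows of the question's A-D matrix as sparse dicts and applies sqrerr to each pair. The helper name pairwise_sqrerr and the dict encoding of the rows are assumptions made for illustration; they are not part of hcluster.

```python
def sqrerr(repr1, repr2):
    """Squared l2 distance between two sparse reprs (dict: feature -> value)."""
    keys = frozenset(repr1) | frozenset(repr2)
    return sum((repr1.get(k, 0.) - repr2.get(k, 0.)) ** 2 for k in keys)


def pairwise_sqrerr(reprs):
    """Hypothetical helper: full symmetric matrix Y with Y[i][j] = sqrerr(i, j)."""
    n = len(reprs)
    Y = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = sqrerr(reprs[i], reprs[j])
            # Fill both halves so that Y[i][j] == Y[j][i].
            Y[i][j] = d
            Y[j][i] = d
    return Y


# Rows of the question's matrix, keeping only the non-zero entries.
reprs = [
    {"B": 1.0, "D": 1.0},  # row A
    {"A": 1.0, "D": 1.0},  # row B
    {},                    # row C (all zeros)
    {"A": 1.0, "B": 1.0},  # row D
]
Y = pairwise_sqrerr(reprs)
```

Only the non-zero features are ever touched, so the cost per pair depends on how many features the two items actually have, not on the full dimensionality.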

Y is symmetric, so Y[i, j] == Y[j, i]. Use hcluster.squareform to convert Y to condensed form, then pass the result to hcluster.linkage.
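To make the condensed form concrete, here is a rough pure-Python sketch of the ordering it uses: the entries above the main diagonal, read row by row. The function to_condensed is a hypothetical stand-in written for illustration, not hcluster's own implementation; in real code you would normally call hcluster.squareform on an array instead.

```python
def to_condensed(Y):
    """Hypothetical helper: flatten a square symmetric distance matrix
    into condensed form (entries above the diagonal, row by row)."""
    n = len(Y)
    for i in range(n):
        if len(Y[i]) != n:
            raise ValueError("Y must be square")
        if Y[i][i] != 0:
            raise ValueError("diagonal of a distance matrix must be zero")
        for j in range(i + 1, n):
            if Y[i][j] != Y[j][i]:
                raise ValueError("Y must be symmetric")
    return [Y[i][j] for i in range(n) for j in range(i + 1, n)]


# Small symmetric example with distinct distances.
Y = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 3.0],
    [2.0, 3.0, 0.0],
]
condensed = to_condensed(Y)  # order: (0,1), (0,2), (1,2)
```

This is the same layout as the upper-right triangle stored in the question's data.dmp, and it is the shape of vector that the question already passes to linkage(Y, method="complete").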

+2

Source: https://habr.com/ru/post/1778618/

