Sklearn pairwise result is unexpectedly asymmetric

I am computing the pairwise Euclidean distances between the elements of a vector, using the pairwise_distances function from the sklearn package. However, the resulting matrix is only approximately symmetric: in one example, elements that should be equal agree only to about 15 digits after the decimal point.

I noticed this because I was getting errors in downstream analysis that assumes its input matrices are symmetric. I know I can work around it by rounding the values, but what is causing this?
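As an aside on the workaround mentioned above, here is a minimal sketch of checking symmetry with a tolerance and forcing exact symmetry before handing the matrix to downstream code. The helper names `is_symmetric` and `symmetrize` and the tolerance values are assumptions made for illustration; they are not part of sklearn.

```python
import numpy as np

def is_symmetric(m, rtol=1e-12, atol=1e-12):
    """Hypothetical helper: True if m is square and equals its transpose within tolerance."""
    m = np.asarray(m)
    return m.ndim == 2 and m.shape[0] == m.shape[1] and np.allclose(m, m.T, rtol=rtol, atol=atol)

def symmetrize(m):
    """Hypothetical helper: average m with its transpose so the result is exactly symmetric."""
    m = np.asarray(m, dtype=float)
    return (m + m.T) / 2.0

# Toy matrix whose off-diagonal entries differ only by floating-point rounding.
d = np.array([[0.0, 0.1 + 0.2],
              [0.3, 0.0]])

exact_before = bool((d == d.T).all())  # strict comparison fails
close_before = is_symmetric(d)         # tolerance-based comparison passes
s = symmetrize(d)
exact_after = bool((s == s.T).all())   # strict comparison now passes

print(exact_before, close_before, exact_after)
```

Averaging with the transpose tends to be less intrusive than rounding, since it changes each entry by at most half of the existing mismatch.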

Here is the vector I am computing the pairwise distances for (it is a pandas column):

lag_measure_data[['bios_level']].values

array([[ 0.76881030949999995538490793478558771312236785888671875 ],
   [ 0.                                                      ],
   [ 0.67783090619999997183953155399649403989315032958984375 ],
   [ 0.3228176074999999922710003374959342181682586669921875  ],
   [ 0.75822395549999999087020796650904230773448944091796875 ],
   [ 0.469808621599999975959605080788605846464633941650390625],
   [ 0.989529862699999984698706612107343971729278564453125   ],
   [ 0.                                                      ],
   [ 0.5575436799999999859522858969285152852535247802734375  ],
   [ 0.9756440299999999954394525047973729670047760009765625  ],
   [ 0.66511863289999995085821637985645793378353118896484375 ],
   [ 0.978062709200000046649847718072123825550079345703125   ],
   [ 0.473957179800000016900440868994337506592273712158203125],
   [ 0.82409385540000001935112550199846737086772918701171875 ],
   [ 0.56548685279999999497846374651999212801456451416015625 ],
   [ 0.399505730399999980928527065771049819886684417724609375],
   [ 0.474232963900000026313819034839980304241180419921875   ],
   [ 0.34276307189999999369689476225175894796848297119140625 ],
   [ 0.9985316859999999739017084721126593649387359619140625  ],
   [ 0.9063241512999999915933813099400140345096588134765625  ],
   [ 0.                                                      ]])

Here is the command I use to get the distance matrix:

d_matrix_lag = pairwise_distances(lag_measure_data[['bios_level']].values)

For example, here are two entries of the resulting matrix that should be equal:

0.445992701999999907602756366031826473772525787353515625

and

0.4459927019999998520916051347739994525909423828125
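To put the size of that mismatch in perspective, the following sketch compares the two values quoted above. It suggests they are adjacent double-precision numbers, meaning they differ by a single unit in the last place. The variable names `upper` and `lower` are chosen here for illustration only.

```python
import numpy as np

# The two matrix entries quoted above, written with a decimal point.
upper = 0.445992701999999907602756366031826473772525787353515625
lower = 0.4459927019999998520916051347739994525909423828125

gap = upper - lower
print(gap)                                # size of the mismatch
print(gap == np.spacing(lower))           # is the gap exactly one ULP at this magnitude?
print(np.nextafter(lower, 1.0) == upper)  # is upper the very next double after lower?
print(np.isclose(upper, lower))           # tolerance-based comparison
```

A difference of this size is typical of rounding in the last bit rather than of a logic error in the distance computation.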


The behavior can be reproduced with the following code:

import numpy as np

a = np.array([[ 0.76881030949999995538490793478558771312236785888671875 ],
   [ 0.                                                      ],
   [ 0.67783090619999997183953155399649403989315032958984375 ],
   [ 0.3228176074999999922710003374959342181682586669921875  ],
   [ 0.75822395549999999087020796650904230773448944091796875 ],
   [ 0.469808621599999975959605080788605846464633941650390625],
   [ 0.989529862699999984698706612107343971729278564453125   ],
   [ 0.                                                      ],
   [ 0.5575436799999999859522858969285152852535247802734375  ],
   [ 0.9756440299999999954394525047973729670047760009765625  ],
   [ 0.66511863289999995085821637985645793378353118896484375 ],
   [ 0.978062709200000046649847718072123825550079345703125   ],
   [ 0.473957179800000016900440868994337506592273712158203125],
   [ 0.82409385540000001935112550199846737086772918701171875 ],
   [ 0.56548685279999999497846374651999212801456451416015625 ],
   [ 0.399505730399999980928527065771049819886684417724609375],
   [ 0.474232963900000026313819034839980304241180419921875   ],
   [ 0.34276307189999999369689476225175894796848297119140625 ],
   [ 0.9985316859999999739017084721126593649387359619140625  ],
   [ 0.9063241512999999915933813099400140345096588134765625  ],
   [ 0.                                                      ]])

from sklearn.metrics.pairwise import pairwise_distances
dist_sklearn = pairwise_distances(a)
print((dist_sklearn.transpose() == dist_sklearn).all())

This prints False. With scipy.spatial.distance, on the other hand, the result is symmetric:

from scipy.spatial.distance import pdist, squareform

dist = pdist(a)
sq = squareform(dist)
print((sq.transpose() == sq).all())

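This prints True: squareform fills both halves of the matrix from the same condensed distance vector, so the result is symmetric by construction.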
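As for the likely cause: the sklearn documentation for euclidean_distances states that, for efficiency, the distance is computed from the expansion dot(x, x) - 2 * dot(x, y) + dot(y, y) rather than from the difference x - y. Floating-point addition is not associative, so if the three terms are accumulated in a fixed order, entry (i, j) and entry (j, i) can round differently. The sketch below imitates that idea in plain NumPy on random data. The order of operations is an assumption made for illustration and is not copied from sklearn's source.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, illustration only
x = rng.random(50)              # stand-in 1-D data, not the data from the question
sq = x * x

# Expanded squared distance, accumulated left to right:
# entry (i, j) is (-2*x_i*x_j + x_i^2) + x_j^2,
# entry (j, i) is (-2*x_j*x_i + x_j^2) + x_i^2.
expanded = (-2.0 * np.outer(x, x) + sq[:, None]) + sq[None, :]

# Direct form: |x_i - x_j| and |x_j - x_i| are always bit-identical.
direct = np.abs(x[:, None] - x[None, :])

print("expanded exactly symmetric:", np.array_equal(expanded, expanded.T))
print("largest mismatch in expanded:", np.abs(expanded - expanded.T).max())
print("direct exactly symmetric:", np.array_equal(direct, direct.T))
```

Any mismatch reported for the expanded form should be on the order of machine epsilon, which matches the last-digit differences described in the question.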


Source: https://habr.com/ru/post/1673255/

