Calculate Euclidean distance between rows of two pandas dataframes

I have two pandas dataframes, d1 and d2, that look like this:

d1:

  output   value1   value2   value3
       1      100      103       87
       1      201     97.5     88.9
       1      144       54       85

d2:

  output   value1   value2   value3
       0      100      103       87
       0      201     97.5     88.9
       0      144       54       85
       0      100      103       87
       0      201     97.5     88.9
       0      144       54       85

The column output is 1 for every row of d1 and 0 for every row of d2; it is a grouping variable, not part of the distance. I need the Euclidean distance between each row of d1 and each row of d2 (not between rows within d1 or within d2). If d1 has m rows and d2 has n rows, the distance matrix should have m rows and n columns.
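For reference, entry (i, j) of that matrix is sqrt(sum over the value columns k of (d1[i, k] - d2[j, k])^2). The sketch below implements that definition directly with NumPy broadcasting; it is an illustration, not the method used in the answer. It rebuilds the example frames from the tables, and the name value3 for the last column is an assumption, as is the helper name cross_distances.

```python
import numpy as np
import pandas as pd

# Example frames rebuilt from the tables; "value3" is an assumed column name.
d1 = pd.DataFrame({
    "output": [1, 1, 1],
    "value1": [100, 201, 144],
    "value2": [103, 97.5, 54],
    "value3": [87, 88.9, 85],
})
d2 = pd.DataFrame({
    "output": [0] * 6,
    "value1": [100, 201, 144] * 2,
    "value2": [103, 97.5, 54] * 2,
    "value3": [87, 88.9, 85] * 2,
})


def cross_distances(a: pd.DataFrame, b: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: m x n Euclidean distances between rows of a and b."""
    # Drop the grouping column so only the value columns enter the distance.
    x = a.drop(columns="output").to_numpy(dtype=float)  # shape (m, k)
    y = b.drop(columns="output").to_numpy(dtype=float)  # shape (n, k)
    # Broadcast to (m, n, k), square the differences, sum over k, take the root.
    diff = x[:, None, :] - y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


dist = cross_distances(d1, d2)
print(dist.shape)  # (3, 6)
```

Broadcasting materializes an m x n x k intermediate array, so for large frames a dedicated routine such as scipy.spatial.distance.cdist is the more memory-friendly choice.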

1 answer

Using scipy.spatial.distance.cdist:

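import pandas as pd
from scipy.spatial.distance import cdist

# Skip the first column ('output', the grouping variable) and compare only the value columns
ary = cdist(d1.iloc[:, 1:], d2.iloc[:, 1:], metric='euclidean')

pd.DataFrame(ary)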
Out[1274]: 
            0           1          2           3           4          5
0    0.000000  101.167485  65.886266    0.000000  101.167485  65.886266
1  101.167485    0.000000  71.808495  101.167485    0.000000  71.808495
2   65.886266   71.808495   0.000000   65.886266   71.808495   0.000000
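The result is a plain ndarray, so the row labels of d1 and d2 are lost. A minimal sketch, assuming the last column is named value3 (hypothetical) and d2 is d1 repeated twice as in the question, that wraps the cdist output with the original indices and spot-checks one cell against a direct norm:

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

d1 = pd.DataFrame({"output": [1, 1, 1],
                   "value1": [100, 201, 144],
                   "value2": [103, 97.5, 54],
                   "value3": [87, 88.9, 85]})
# d2 in the question is d1 repeated twice with output = 0.
d2 = pd.concat([d1, d1], ignore_index=True).assign(output=0)

values = d1.columns[1:]  # every column except the 'output' grouping variable
ary = cdist(d1[values], d2[values], metric="euclidean")

# Rows are labelled by d1's index, columns by d2's index.
dist = pd.DataFrame(ary, index=d1.index, columns=d2.index)

# Spot-check one cell against a direct Euclidean norm.
a = d1.loc[0, values].to_numpy(dtype=float)
b = d2.loc[1, values].to_numpy(dtype=float)
assert np.isclose(dist.loc[0, 1], np.linalg.norm(a - b))
print(dist.round(2))
```

Selecting the value columns by name rather than by position keeps the computation correct even if the grouping column is not the first one.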

Source: https://habr.com/ru/post/1690642/

