Conditional Nearest Neighbor in Python

Question


I am trying to do some nearest-neighbor analysis in Python using Pandas / NumPy / SciPy etc., and having tried several different approaches, I'm stumped.

I have 2 data frames as follows:

df1

Lon1    Lat1    Type
10      10      A
50      50      A
20      20      B

df2

Lon2    Lat2    Type    Data-1  Data-2  
11      11      A       Eggs    Bacon       
51      51      A       Nuts    Bread   
61      61      A       Beef    Lamb    
21      21      B       Chips   Chicken
31      31      B       Sauce   Pasta
71      71      B       Rice    Oats
81      81      B       Beans   Peas

I'm trying to identify, for each row of df1, the 2 nearest neighbors in df2 of the same Type (based on the Euclidean distance between the Lon / Lat values), and then attach the corresponding Data-1 and Data-2 values to df1 so that it looks like this:

Lon1    Lat1    Type    Data-1a     Data-2a     Data-1b     Data-2b
10      10      A       Eggs        Bacon       Nuts        Bread
50      50      A       Nuts        Bread       Beef        Lamb
20      20      B       Chips       Chicken     Sauce       Pasta
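For small frames the expected output can be reproduced by brute force, which also makes the "conditional" part explicit: compute every df1-to-df2 Euclidean distance, set the distance between rows of different Type to infinity, and keep the two smallest per df1 row. This is only a minimal sketch, not the asker's method; the sample frames are rebuilt from the tables above, and the names dist, same_type, nearest and out are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the tables in the question
df1 = pd.DataFrame({'Lon1': [10, 50, 20], 'Lat1': [10, 50, 20], 'Type': ['A', 'A', 'B']})
df2 = pd.DataFrame({
    'Lon2': [11, 51, 61, 21, 31, 71, 81],
    'Lat2': [11, 51, 61, 21, 31, 71, 81],
    'Type': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Data-1': ['Eggs', 'Nuts', 'Beef', 'Chips', 'Sauce', 'Rice', 'Beans'],
    'Data-2': ['Bacon', 'Bread', 'Lamb', 'Chicken', 'Pasta', 'Oats', 'Peas'],
})

p1 = df1[['Lon1', 'Lat1']].to_numpy(dtype=float)  # shape (n1, 2)
p2 = df2[['Lon2', 'Lat2']].to_numpy(dtype=float)  # shape (n2, 2)

# Full (n1, n2) Euclidean distance matrix via broadcasting
dist = np.sqrt(((p1[:, None, :] - p2[None, :, :]) ** 2).sum(axis=-1))

# The condition: a df2 row of a different Type can never be among the nearest
same_type = df1['Type'].to_numpy()[:, None] == df2['Type'].to_numpy()[None, :]
dist[~same_type] = np.inf

# Positions (into df2) of the two closest same-Type rows for each df1 row
nearest = np.argsort(dist, axis=1)[:, :2]

out = df1.copy()
for k, suffix in enumerate(['a', 'b']):
    picked = df2.iloc[nearest[:, k]]
    out['Data-1' + suffix] = picked['Data-1'].to_numpy()
    out['Data-2' + suffix] = picked['Data-2'].to_numpy()
print(out)
```

The distance matrix needs n1 × n2 memory, so this only suits small inputs. It also assumes every Type has at least 2 rows in df2; otherwise an infinite-distance (wrong-Type) row would be picked.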

I've tried reshaping the data into both a long and a wide format, and tend to use the cKDTree from SciPy. However, is there any way to do this so that it only looks at rows with the matching Type?
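With SciPy's cKDTree, one way to restrict the search to the matching Type is to build a separate tree per Type from that Type's df2 rows, and query it only with the df1 rows of the same Type. A sketch under the assumption that every Type in df1 has at least 2 rows in df2; the helper name nearest_same_type is hypothetical.

```python
import pandas as pd
from scipy.spatial import cKDTree

df1 = pd.DataFrame({'Lon1': [10, 50, 20], 'Lat1': [10, 50, 20], 'Type': ['A', 'A', 'B']})
df2 = pd.DataFrame({
    'Lon2': [11, 51, 61, 21, 31, 71, 81],
    'Lat2': [11, 51, 61, 21, 31, 71, 81],
    'Type': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Data-1': ['Eggs', 'Nuts', 'Beef', 'Chips', 'Sauce', 'Rice', 'Beans'],
    'Data-2': ['Bacon', 'Bread', 'Lamb', 'Chicken', 'Pasta', 'Oats', 'Peas'],
})

def nearest_same_type(df1, df2):
    """Hypothetical helper: attach Data-1/Data-2 of the 2 nearest same-Type df2 rows."""
    parts = []
    for t, query in df1.groupby('Type', sort=False):
        candidates = df2[df2['Type'] == t]
        # One tree per Type, built only from that Type's df2 points
        tree = cKDTree(candidates[['Lon2', 'Lat2']].to_numpy(dtype=float))
        # idx has shape (len(query), 2); values are positions within `candidates`
        _, idx = tree.query(query[['Lon1', 'Lat1']].to_numpy(dtype=float), k=2)
        part = query.copy()
        for j, suffix in enumerate(['a', 'b']):
            picked = candidates.iloc[idx[:, j]]
            part['Data-1' + suffix] = picked['Data-1'].to_numpy()
            part['Data-2' + suffix] = picked['Data-2'].to_numpy()
        parts.append(part)
    # Put the rows back in df1's original order
    return pd.concat(parts).sort_index()

result = nearest_same_type(df1, df2)
print(result)
```

Compared with calling a function row by row, this builds each tree once and answers all df1 rows of a Type in a single vectorised query call.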

Thanks in advance.

** Edit **

I made some progress as follows:

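from scipy.spatial import cKDTree

Typelist = df2['Type'].unique().tolist()
# One df2 subset per Type, keyed by the Type value
df_dict = {x: df2[df2['Type'] == x] for x in Typelist}

def treefunc(row):
    if row['Type'] == 'A':
        row_type = row['Type']
        # Coordinates of this df1 point as a float array
        location = row[['Lon1', 'Lat1']].values.astype(float)
        # Tree over the df2 points of the same Type only
        tree = cKDTree(df_dict[row_type][['Lon2', 'Lat2']].values)
        dists, indexes = tree.query(location, k=2)
        return dists, indexes

dftest = df1.apply(treefunc, axis=1)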

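This returns the 2 nearest neighbors for the Type A rows, but I still have some questions:

How can I use the values in Typelist (e.g. with .isin) instead of hard-coding ['Type'] == 'A'?
Is there a more efficient way to do this in Pandas than building a kdtree for every row?
How do I then pull in the corresponding Data-1 and Data-2 values?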
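One possible way to address the questions above while keeping the row-wise apply from the edit: build the per-Type subsets and trees once from Typelist (so neither a hard-coded 'A' nor .isin is needed), and have the row function return a pd.Series of Data values that can be joined back onto df1. This is a sketch, not the asker's final code; the trees dict is an illustrative addition, the output column names follow the desired table above, and it assumes every df1 Type occurs at least twice in df2.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

df1 = pd.DataFrame({'Lon1': [10, 50, 20], 'Lat1': [10, 50, 20], 'Type': ['A', 'A', 'B']})
df2 = pd.DataFrame({
    'Lon2': [11, 51, 61, 21, 31, 71, 81],
    'Lat2': [11, 51, 61, 21, 31, 71, 81],
    'Type': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Data-1': ['Eggs', 'Nuts', 'Beef', 'Chips', 'Sauce', 'Rice', 'Beans'],
    'Data-2': ['Bacon', 'Bread', 'Lamb', 'Chicken', 'Pasta', 'Oats', 'Peas'],
})

# Build each Type's subset and tree once, outside the row function
Typelist = df2['Type'].unique().tolist()
df_dict = {t: df2[df2['Type'] == t] for t in Typelist}
trees = {t: cKDTree(sub[['Lon2', 'Lat2']].to_numpy(dtype=float)) for t, sub in df_dict.items()}

def treefunc(row):
    sub = df_dict[row['Type']]
    location = np.array([row['Lon1'], row['Lat1']], dtype=float)
    # For a single point and k=2, dists and indexes both have shape (2,)
    dists, indexes = trees[row['Type']].query(location, k=2)
    first, second = sub.iloc[indexes[0]], sub.iloc[indexes[1]]
    return pd.Series({
        'Data-1a': first['Data-1'], 'Data-2a': first['Data-2'],
        'Data-1b': second['Data-1'], 'Data-2b': second['Data-2'],
    })

# apply returns a DataFrame because treefunc returns a Series per row
result = df1.join(df1.apply(treefunc, axis=1))
print(result)
```

A df1 Type that is missing from df2 would raise a KeyError in treefunc, so real data may need a guard for that case.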


python numpy scipy

asked Oct 28 '15 at 14:47 by Tom


ryanmc · Answer 1 · 2015-10-28T17:33:12+0000

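Here is one approach using scikit-learn's NearestNeighbors, fitted separately for each Type. It finds, for each df2 row, the nearest other df2 row of the same Type (the distance is kept in an extra column).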

import pandas as pd
from io import StringIO

s1 = StringIO(u'''Lon2,Lat2,Type,Data-1,Data-2
11,11,A,Eggs,Bacon
51,51,A,Nuts,Bread
61,61,A,Beef,Lamb
21,21,B,Chips,Chicken
31,31,B,Sauce,Pasta
71,71,B,Rice,Oats
81,81,B,Beans,Peas''')

df2 = pd.read_csv(s1)

#Start here

from sklearn.neighbors import NearestNeighbors

rows = []
for i in pd.unique(df2.Type):
    # Rows of one Type only, renumbered 0..n-1 so positions and labels agree
    dftype = df2[df2['Type'] == i].reset_index(drop=True)
    X = dftype[['Lon2','Lat2']].values
    nbrs = NearestNeighbors(n_neighbors=2, algorithm='kd_tree').fit(X)
    # Column 0 is normally the point itself (distance 0); column 1 is its nearest other point
    distances, indices = nbrs.kneighbors(X)
    for j in range(len(indices)):
        row = dftype.iloc[indices[j][0]].to_dict()
        nb = dftype.iloc[indices[j][1]]
        row['Data-1b'] = nb['Data-1']
        row['Data-2b'] = nb['Data-2']
        row['Distance'] = distances[j][1]
        rows.append(row)

# DataFrame.append was removed in pandas 2.0, so build the result once from a list
dfNN = pd.DataFrame(rows)[['Lat2', 'Lon2', 'Type', 'Data-1', 'Data-2', 'Data-1b', 'Data-2b', 'Distance']]
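The loop above searches df2 against itself. To produce the table the question asks for, the same NearestNeighbors class can instead be fitted on each Type's df2 rows and then queried with the df1 points, since kneighbors accepts query points other than the fitted data. A minimal sketch under that assumption; the Distance-a / Distance-b column names are illustrative, modeled on the Distance column above.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

df1 = pd.DataFrame({'Lon1': [10, 50, 20], 'Lat1': [10, 50, 20], 'Type': ['A', 'A', 'B']})
df2 = pd.DataFrame({
    'Lon2': [11, 51, 61, 21, 31, 71, 81],
    'Lat2': [11, 51, 61, 21, 31, 71, 81],
    'Type': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Data-1': ['Eggs', 'Nuts', 'Beef', 'Chips', 'Sauce', 'Rice', 'Beans'],
    'Data-2': ['Bacon', 'Bread', 'Lamb', 'Chicken', 'Pasta', 'Oats', 'Peas'],
})

parts = []
for t, query in df1.groupby('Type', sort=False):
    dftype = df2[df2['Type'] == t]
    # Fit on this Type's df2 points only
    nbrs = NearestNeighbors(n_neighbors=2, algorithm='kd_tree')
    nbrs.fit(dftype[['Lon2', 'Lat2']].to_numpy(dtype=float))
    # Query with df1's points; each row of `indices` holds positions within dftype
    distances, indices = nbrs.kneighbors(query[['Lon1', 'Lat1']].to_numpy(dtype=float))
    part = query.copy()
    for j, suffix in enumerate(['a', 'b']):
        picked = dftype.iloc[indices[:, j]]
        part['Data-1' + suffix] = picked['Data-1'].to_numpy()
        part['Data-2' + suffix] = picked['Data-2'].to_numpy()
        part['Distance-' + suffix] = distances[:, j]
    parts.append(part)

# Restore df1's original row order
result = pd.concat(parts).sort_index()
print(result)
```

Because the query points are not part of the fitted data, column 0 of indices is already the nearest neighbor, so no self-match needs to be skipped. kneighbors raises an error if a Type has fewer than n_neighbors rows in df2.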
