I have a CSV file containing the distances between the centroids in a GIS model, in the following format:
    InputID,TargetID,Distance
    1,2,3050.01327866
    1,7,3334.99565217
    1,5,3390.99115304
    1,3,3613.77046864
    1,4,4182.29900892
    ...
    ...
    3330,3322,955927.582933
It is sorted by origin (InputID) and then by distance, so the nearest destination (TargetID) comes first.
For a specific modeling tool, I need this data in a CSV file formatted as follows (the numbers are the centroid IDs):
    distance1->1, distance1->2, distance1->3, ..... distance1->3330
    distance2->1, distance2->2, .....
    .....
    distance3330->1, distance3330->2 .... distance3330->3330
So the file contains no InputIDs or TargetIDs, only the distances, with origins in the rows and destinations in the columns (example for the first 5 origins/destinations):
    0,3050.01327866,3613.77046864,4182.29900892,3390.99115304
    3050.01327866,0,1326.94611797,1175.10254872,1814.45584129
    3613.77046864,1326.94611797,0,1832.209595,3132.78725738
    4182.29900892,1175.10254872,1832.209595,0,1935.55056767
    3390.99115304,1814.45584129,3132.78725738,1935.55056767,0
I built the following code, and it works. But it is so slow that it would take several days to produce the 3330x3330 file. Since I'm new to Python, I think I'm overlooking something...
    import pandas as pd
    import numpy as np

    file=pd.read_csv('c:\\users\\Niels\\Dropbox\\Python\\centroid_distances.csv')
    df=file.sort_index(by=['InputID', 'TargetID'], ascending=[True, True])
    number_of_zones=3330
    text_file = open("c:\\users\\Niels\\Dropbox\\Python\\Output.csv", "w")

    for origin in range(1,number_of_zones):
        output_string=''
        print(origin)
        for destination in range(1,number_of_zones):
            if origin==destination:
                distance=0
            else:
                distance_row=df[(df['InputID']==origin) & (df['TargetID'] == destination)]
Could you give me some tips to speed up this code?
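One possible direction, as a rough sketch rather than a definitive fix: the lookup `df[(df['InputID']==origin) & (df['TargetID'] == destination)]` scans the whole table once per cell, which is likely where the time goes. Reading the rows once and placing each distance directly by its IDs avoids that. The sketch below uses only the standard `csv` module and a three-zone excerpt of the sample data; the helper name `long_to_matrix` is hypothetical, and it assumes zone IDs run contiguously from 1 to `number_of_zones`.

```python
import csv
import io


def long_to_matrix(rows, number_of_zones):
    """Build a square distance matrix from (InputID, TargetID, Distance) rows.

    Assumption: zone IDs run from 1 to number_of_zones without gaps.
    Distances are kept as strings so their formatting is written back unchanged.
    """
    # Start with zeros everywhere; the diagonal (origin == destination) stays "0".
    matrix = [["0"] * number_of_zones for _ in range(number_of_zones)]
    for row in rows:
        origin = int(row["InputID"])
        destination = int(row["TargetID"])
        # IDs are 1-based, list indices are 0-based.
        matrix[origin - 1][destination - 1] = row["Distance"]
    return matrix


# Three-zone excerpt of the sample data, in the same long format as the input file.
sample = """InputID,TargetID,Distance
1,2,3050.01327866
1,3,3613.77046864
2,3,1326.94611797
2,1,3050.01327866
3,2,1326.94611797
3,1,3613.77046864
"""

matrix = long_to_matrix(csv.DictReader(io.StringIO(sample)), number_of_zones=3)

# Write the matrix as plain CSV rows, without any ID columns or header.
output = io.StringIO()
csv.writer(output, lineterminator="\n").writerows(matrix)
result = output.getvalue()
print(result)
```

For the real data, the two `io.StringIO` objects would be replaced by the opened input and output files. Each input row is touched exactly once, so the work grows with the number of rows rather than with rows times cells. Separately, `range(1,number_of_zones)` in the original loop stops at zone 3329, so the last zone would be missing from the output.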