I have a set of tab-delimited files that I have to read, load as pandas DataFrames, run a bunch of operations on, and then merge back into a single Excel file. The code is fairly long, so I'll only go through the problematic part of it.
The tab files I process all contain the same number of lines: 2177.
When I read these files, I set the index to the first two columns, which are of type (string, int):
df = df.set_index(['id', 'coord'])
data = OrderedDict()
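For context, the loading loop looks roughly like this (a minimal sketch; the sample names and file paths are placeholders, and the real code does more than this):

from collections import OrderedDict
import pandas as pd

data = OrderedDict()
# placeholder sample names; each tab-delimited file starts with an 'id'
# (string) column and a 'coord' (int) column, followed by the data columns
for sample in ['sample_01', 'sample_02']:
    df = pd.read_csv(sample + '.txt', sep='\t')
    df = df.set_index(['id', 'coord'])
    data[sample] = df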
One of the procedures I run needs access to individual rows of data[sample_id], which holds mixed-type DataFrames indexed by the 'id' and 'coord' columns, for example
sample_row = data[sample].ix[index]
where index is a tuple key from my ('id', 'coord') MultiIndex.
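To make the lookup concrete, here is a tiny self-contained stand-in for one of the per-sample DataFrames (the values are made up; .ix is from the older pandas I'm on, and .loc does the same exact-key lookup):

import pandas as pd

df = pd.DataFrame(
    {'id': ['gene_0001', 'gene_0001', 'gene_0002'],
     'coord': [1024, 2048, 512],
     'value': [0.1, 0.2, 0.3]}
).set_index(['id', 'coord'])

index = ('gene_0001', 1024)   # a full ('id', 'coord') key
sample_row = df.loc[index]    # returns the single matching row as a Series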
If I process only a subset of the file, everything works fine, but when I read all the files with their full 2177 lines, I get an error:
KeyError: 'Key length (2) was greater than MultiIndex lexsort depth (0)'
I searched SO everywhere and it seems the problem is that the index is not sorted, but I don't understand why an unsorted subset does not cause the same problem.
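For what it's worth, the fix those answers suggest is to lexsort the index before doing the lookups, something along these lines (just a sketch of what they propose, not my actual code):

for sample in data:
    data[sample] = data[sample].sort_index()

As I understand it, sort_index() sorts the MultiIndex lexicographically, which is what the "lexsort depth" in the error message refers to.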
Any idea on how I can figure this out?
thanks