Lexsort MultiIndex Key Error and Depth

Question


I have a set of tab-delimited files that I need to read, load as pandas DataFrames, run a number of operations on, and then merge back into a single Excel file. The code is long, so I'll only go through the problematic part of it.

All the tab files I process have the same number of lines: 2177.

When I read these files, I set the first two columns, of types (string, int), as the index:

 from collections import OrderedDict
 df = df.set_index(['id', 'coord'])
 data = OrderedDict()  # data will contain all the information I am writing to Excel
 data[filename_id] = df
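For anyone trying to reproduce this, here is a minimal, self-contained sketch of the setup above. It assumes two small hypothetical tab-delimited inputs (inlined with io.StringIO rather than read from disk) with an 'id' string column, a 'coord' integer column and a made-up 'value' column; the sample names are illustrative only. set_index keeps the rows in file order, so the resulting MultiIndex is not necessarily sorted.

```python
import io
from collections import OrderedDict

import pandas as pd

# Hypothetical stand-ins for the real tab-delimited files:
# 'id' (str), 'coord' (int) and an illustrative 'value' column
FILES = {
    "sample1": "id\tcoord\tvalue\nb\t2\t0.5\na\t1\t1.5\nb\t1\t2.5\n",
    "sample2": "id\tcoord\tvalue\nb\t2\t3.0\na\t1\t4.0\nb\t1\t5.0\n",
}

data = OrderedDict()  # filename_id -> DataFrame indexed by ('id', 'coord')
for filename_id, text in FILES.items():
    df = pd.read_csv(io.StringIO(text), sep="\t")
    df = df.set_index(["id", "coord"])  # rows stay in file order
    data[filename_id] = df

# The rows were not written in sorted order, so neither is the MultiIndex
print(data["sample1"].index.is_monotonic_increasing)  # False
```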

One of the procedures I run needs access to each row of data[sample_id], which holds a mixed-type DataFrame indexed by the columns 'id' and 'coord', for example:

 sample_row = data[sample].ix[index]

where index is a tuple of ('id', 'coord') values.

If I process a subset of the file, everything works fine, but if I read all the files with their 2177 lines, I get this error:

 KeyError: 'Key length (2) was greater than MultiIndex lexsort depth (0)'

I have searched SO extensively, and it seems to be a problem with sorting the index, but I don't understand why using an unsorted subset does not cause a problem.
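The message comes from pandas locating labels on a MultiIndex by binary search, which only works on the leading levels that are lexicographically sorted (the lexsort depth). That property depends on the order of the rows actually present, so one possible explanation is that the smaller subset happened to be in sorted order while the full file is not. The sketch below uses hypothetical rows and a label slice, which still raises this error in current pandas (as pandas.errors.UnsortedIndexError, a KeyError subclass); a full-key lookup like the one in the question may no longer raise on a unique index in recent versions.

```python
import pandas as pd

# Hypothetical rows: sorted by 'id' at first, then an out-of-order 'a' at the end
df = pd.DataFrame(
    {"id": ["a", "a", "b", "a"], "coord": [1, 2, 1, 3], "value": [10, 20, 30, 40]}
).set_index(["id", "coord"])

subset = df.iloc[:3]  # (a, 1), (a, 2), (b, 1) happen to be in sorted order
print(subset.index.is_monotonic_increasing)  # True
print(len(subset.loc["a":"b"]))  # 3, slicing works on the sorted subset

print(df.index.is_monotonic_increasing)  # False
try:
    df.loc["a":"b"]  # same kind of lookup on the full, unsorted frame
except KeyError as exc:  # UnsortedIndexError subclasses KeyError
    print(type(exc).__name__, exc)
```

Sorting once with df.sort_index() makes the same slice succeed on the full frame.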

Any idea how I can fix this?

Thanks.


python pandas

Rad Jul 23 '14 at 23:43


1 answer

Jeff · Accepted Answer · 2014-07-24T11:44:36+0000

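The docs are decent. If you work with MultiIndexes, it pays to read them (several times!); see here. The error means the key you are looking up has more levels (2) than the index is lexsorted to (0). The lexsort_depth attribute reports how many leading levels are sorted, and sorting by an inner level drops it to 0: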

 In [9]: df = DataFrame(np.arange(9).reshape(-1,1),columns=['value'],index=pd.MultiIndex.from_product([[1,2,3],['a','b','c']],names=['one','two']))

 In [10]: df
 Out[10]:
          value
 one two
 1   a        0
     b        1
     c        2
 2   a        3
     b        4
     c        5
 3   a        6
     b        7
     c        8

 In [11]: df.index.lexsort_depth
 Out[11]: 2

 In [12]: df.sortlevel(level=1)
 Out[12]:
          value
 one two
 1   a        0
 2   a        3
 3   a        6
 1   b        1
 2   b        4
 3   b        7
 1   c        2
 2   c        5
 3   c        8

 In [13]: df.sortlevel(level=1).index.lexsort_depth
 Out[13]: 0
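In current pandas, sortlevel is gone and MultiIndex.lexsort_depth was deprecated during the 1.x series in favour of is_monotonic_increasing. Here is a sketch of the same experiment, assuming a recent pandas version; note that is_monotonic_increasing only answers whether the whole index is sorted, rather than reporting a depth.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([[1, 2, 3], ["a", "b", "c"]], names=["one", "two"])
df = pd.DataFrame(np.arange(9).reshape(-1, 1), columns=["value"], index=idx)

print(df.index.is_monotonic_increasing)  # True: built in sorted order

# Sort by 'two' first, then by the remaining level 'one'
by_two = df.sort_index(level=1)
print(by_two["value"].tolist())  # [0, 3, 6, 1, 4, 7, 2, 5, 8]
print(by_two.index.is_monotonic_increasing)  # False: 'one' now reads 1, 2, 3, 1, ...

# A plain sort_index() restores full lexicographic order
print(by_two.sort_index().index.is_monotonic_increasing)  # True
```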

Update

sortlevel has since been deprecated and removed, so use sort_index instead, i.e.:

 df.sort_index(level=1)
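Applied to the question's code, a minimal sketch of the fix is to sort each frame once, right after set_index and before any row lookups. The frame below is a hypothetical stand-in for one of the tab-delimited files, and .loc is used because .ix has been removed from pandas.

```python
from collections import OrderedDict

import pandas as pd

# Hypothetical frame standing in for one tab-delimited file
raw = pd.DataFrame({"id": ["b", "a", "b"], "coord": [2, 1, 1], "value": [0.5, 1.5, 2.5]})

data = OrderedDict()
# Sort once, right after building the MultiIndex, so every level is lexsorted
data["sample1"] = raw.set_index(["id", "coord"]).sort_index()

sample_row = data["sample1"].loc[("b", 2)]  # full-key row lookup, as in the question
print(sample_row["value"])  # 0.5

# Partial-key lookups also work: all 'b' rows, ordered by coord
print(data["sample1"].loc["b"]["value"].tolist())  # [2.5, 0.5]
```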
