KeyError: MultiIndex Key Length Greater Than Lexsort Depth

I have a set of tab-delimited files that I read into pandas DataFrames, run a number of operations on, and then merge into a single Excel file. The code is long, so I will only go through the problematic part.

The tab-delimited files I process all contain the same number of lines: 2177.

When I read these files, I set the first two columns, of types (string, int), as the index:

    df = df.set_index(['id', 'coord'])
    data = OrderedDict()  # data will contain all the information I am writing to Excel
    data[filename_id] = df
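For anyone who wants to reproduce the setup, here is a minimal sketch of the reading step. It assumes the files can be parsed with `pd.read_csv(sep="\t")`; the in-memory text, the `value` column, and the `sample_a` key are made-up placeholders, and only the `id` and `coord` column names come from the question.

```python
from collections import OrderedDict
from io import StringIO

import pandas as pd

# Hypothetical stand-in for one tab-delimited input file. Only the column
# names 'id' and 'coord' come from the question; the rest is invented.
raw = "id\tcoord\tvalue\nb\t300\t1.5\na\t200\t0.7\na\t100\t2.1\n"

df = pd.read_csv(StringIO(raw), sep="\t")
df = df.set_index(["id", "coord"])  # two-level MultiIndex (string, int)

data = OrderedDict()
data["sample_a"] = df  # 'sample_a' is a placeholder for filename_id

# set_index keeps the file's row order, so the index is only sorted
# if the file itself happened to be sorted by (id, coord).
print(df.index.is_monotonic_increasing)
```

Note that `set_index` does not sort anything, so whether the resulting index is sorted depends entirely on the row order in the file.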

One of the procedures I run requires access to each row of data[sample_id], which holds mixed-type DataFrames indexed by the columns 'id' and 'coord', for example:

 sample_row = data[sample].ix[index] 

where index is an ('id', 'coord') tuple.

If I process a subset of the file, everything works fine, but when I read the full files with all 2177 lines, I get this error:

 KeyError: 'Key length (2) was greater than MultiIndex lexsort depth (0)' 

I searched all over SO, and it seems to be a problem with the index not being sorted, but I don't understand why using an unsorted subset does not cause the problem.

Any idea how I can figure this out?

thanks

1 answer

The docs are not bad. If you work with a MultiIndex, it pays to read through them (several times!). Here is an example:

    In [9]: df = DataFrame(np.arange(9).reshape(-1,1),columns=['value'],index=pd.MultiIndex.from_product([[1,2,3],['a','b','c']],names=['one','two']))

    In [10]: df
    Out[10]:
             value
    one two
    1   a        0
        b        1
        c        2
    2   a        3
        b        4
        c        5
    3   a        6
        b        7
        c        8

    In [11]: df.index.lexsort_depth
    Out[11]: 2

    In [12]: df.sortlevel(level=1)
    Out[12]:
             value
    one two
    1   a        0
    2   a        3
    3   a        6
    1   b        1
    2   b        4
    3   b        7
    1   c        2
    2   c        5
    3   c        8

    In [13]: df.sortlevel(level=1).index.lexsort_depth
    Out[13]: 0
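The same check can be sketched against a recent pandas, where `sortlevel` no longer exists. This version swaps in `sort_index` and uses `is_monotonic_increasing` as the sortedness test instead of `lexsort_depth`; that substitution is my assumption about a reasonable modern equivalent, not something from the session above.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [[1, 2, 3], ["a", "b", "c"]], names=["one", "two"]
)
df = pd.DataFrame(np.arange(9).reshape(-1, 1), columns=["value"], index=index)

# from_product on already-sorted inputs gives a fully sorted index.
sorted_before = df.index.is_monotonic_increasing

# Sorting by the second level first reorders the rows so that the index
# is no longer sorted in level order ('one', then 'two').
by_two = df.sort_index(level=1)
sorted_after = by_two.index.is_monotonic_increasing

# Sorting on all levels restores the original, fully sorted order.
restored = by_two.sort_index()

print(sorted_before, sorted_after, restored.index.is_monotonic_increasing)
```

If the check comes back false for your real frames, that is a hint the index needs sorting before tuple-based lookups or slices.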

Update

sortlevel has since been deprecated (and was removed in pandas 1.0), so use sort_index instead, i.e.:

 df.sort_index(level=1) 
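Applied to the question's setup, a rough sketch could look like the following. It assumes a frame indexed by `id` and `coord` as in the question; the rows and the `value` column are invented, and `.loc` stands in for `.ix`, which was removed in pandas 1.0.

```python
import pandas as pd

# Hypothetical rows, deliberately in unsorted order.
df = pd.DataFrame(
    {
        "id": ["b", "a", "b", "a"],
        "coord": [20, 10, 10, 20],
        "value": [4.0, 1.0, 3.0, 2.0],
    }
).set_index(["id", "coord"])

# Sort on all index levels once, right after building the frame.
df = df.sort_index()

# Full-key lookup with a tuple, the .loc counterpart of data[sample].ix[index].
sample_row = df.loc[("a", 20)]

# Tuple slices rely on a sorted MultiIndex.
block = df.loc[("a", 10):("b", 10)]

print(sample_row["value"])
print(block)
```

Calling `sort_index()` with no `level` argument sorts on every level, which is usually what you want before doing `(id, coord)` lookups.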

Source: https://habr.com/ru/post/972738/

