Why is the performance of data selection “much better” on lexicographically sorted data frames?

Question


I am working through the new edition of Wes McKinney's Python for Data Analysis, and on page 228 in chapter 8 he notes that pandas' data selection performance is “much better” on objects with hierarchical indexing (for example, DataFrames) if the index is lexicographically sorted starting from the outermost level.

In other words, selecting data on this data frame:

key1 key2 col1
1    a    11
     b    12
2    a    13
     b    14

... is “much better” than selecting data on this data frame:

key1 key2 col1
1    a    11
2    a    13
1    b    12
2    b    14

Wes gives no explanation for this statement.
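
To make the comparison concrete, here is a minimal sketch of how the two frames above might be built. The variable names (`df_sorted`, `df_unsorted`, `df_resorted`) are my own and do not come from the book, and checking `is_monotonic_increasing` is just one way I know of to ask pandas whether a hierarchical index is lexicographically sorted. It does not measure the performance difference itself.

```python
import pandas as pd

# First frame: the outermost level (key1) is sorted: 1, 1, 2, 2.
sorted_index = pd.MultiIndex.from_tuples(
    [(1, "a"), (1, "b"), (2, "a"), (2, "b")], names=["key1", "key2"]
)
df_sorted = pd.DataFrame({"col1": [11, 12, 13, 14]}, index=sorted_index)

# Second frame: same rows, but key1 alternates: 1, 2, 1, 2.
unsorted_index = pd.MultiIndex.from_tuples(
    [(1, "a"), (2, "a"), (1, "b"), (2, "b")], names=["key1", "key2"]
)
df_unsorted = pd.DataFrame({"col1": [11, 13, 12, 14]}, index=unsorted_index)

# A lexicographically sorted MultiIndex is monotonically increasing.
print(df_sorted.index.is_monotonic_increasing)    # True
print(df_unsorted.index.is_monotonic_increasing)  # False

# sort_index() reorders the second frame into the layout of the first.
df_resorted = df_unsorted.sort_index()
print(df_resorted)
```

Selections such as `df_sorted.loc[1]` and `df_unsorted.loc[1]` both return the rows for `key1 == 1`, so the question is only about how efficiently pandas gets there in each case.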

Could someone please explain:

Why is data selection on the first data frame “much better” than on the second data frame? In other words, why is selecting data from a data frame with a hierarchical index “much better” when the index is lexicographically sorted, starting from the outermost level?
What does “much better” mean in this context? Faster? More memory efficient? Something else?


python pandas dataframe

Keith Dowd Feb 27 '18 at 16:07


No one has answered this question yet.
