I'm working through Wes McKinney's new book, Python for Data Analysis, and on page 228 in chapter 8 he notes that data selection performance in pandas is "much better" on objects with hierarchical indexing (for example, DataFrames) if the index is lexicographically sorted starting from the outermost level.
In other words, selecting data on this DataFrame:

    key1 key2 col1
    1    a    11
         b    12
    2    a    13
         b    14

...is "much better" than selecting data on this DataFrame:

    key1 key2 col1
    1    a    11
    2    a    13
    1    b    12
    2    b    14
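For reference, here is a minimal sketch of how the two frames above could be built and compared. The names `build`, `df_sorted` and `df_unsorted` are my own and do not come from the book. I am assuming that "lexicographically sorted" corresponds to what `is_monotonic_increasing` reports for the index.

```python
import pandas as pd

# Rows of the two example frames, in the order they are displayed above.
rows_sorted = [(1, "a", 11), (1, "b", 12), (2, "a", 13), (2, "b", 14)]
rows_unsorted = [(1, "a", 11), (2, "a", 13), (1, "b", 12), (2, "b", 14)]


def build(rows):
    """Build a frame with a two-level (key1, key2) hierarchical index."""
    df = pd.DataFrame(rows, columns=["key1", "key2", "col1"])
    return df.set_index(["key1", "key2"])


df_sorted = build(rows_sorted)
df_unsorted = build(rows_unsorted)

# True only when the index tuples are already in ascending order.
print(df_sorted.index.is_monotonic_increasing)    # True
print(df_unsorted.index.is_monotonic_increasing)  # False

# sort_index() returns a new frame sorted from the outermost level inward.
resorted = df_unsorted.sort_index()
print(list(resorted.index))
```

After `sort_index()`, the second frame has the same row order as the first.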
Wes gives no explanation for this statement.
Could someone please explain:
Why is data selection on the first DataFrame "much better" than on the second? In other words, why is selecting data from a hierarchically indexed DataFrame "much better" when its index is lexicographically sorted starting from the outermost level?
What does "much better" mean in this context? Faster? More memory efficient? Something else?
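If "much better" does mean faster, I assume it could be measured with something like the rough sketch below. The sizes, the names `big_sorted`, `big_unsorted` and `select`, and the choice of a partial-key lookup on the outermost level are all my own assumptions, not anything the book specifies. pandas may also emit a performance-related warning for the unsorted frame.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical sizes, chosen only so that any difference is measurable.
n_outer, n_inner = 1000, 100
index = pd.MultiIndex.from_product(
    [range(n_outer), range(n_inner)], names=["key1", "key2"]
)
big_sorted = pd.DataFrame({"col1": np.arange(len(index))}, index=index)

# Shuffle the rows so the index is no longer lexicographically sorted.
big_unsorted = big_sorted.sample(frac=1, random_state=0)


def select(df):
    # Partial-key selection: all rows whose outermost level (key1) is 500.
    return df.loc[500]


# Time 100 repetitions of the same lookup on each frame.
t_sorted = timeit.timeit(lambda: select(big_sorted), number=100)
t_unsorted = timeit.timeit(lambda: select(big_unsorted), number=100)
print(f"sorted: {t_sorted:.4f}s  unsorted: {t_unsorted:.4f}s")
```

This only compares wall-clock time for one kind of lookup. It says nothing about memory use, so it would not settle the second question on its own.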