Why is the performance of data selection "much better" on lexicographically sorted data frames?

I am working through the new edition of Wes McKinney's Python for Data Analysis, and on page 228 in chapter 8 he notes that pandas' data selection performance is "much better" on objects with hierarchical indexing (for example, DataFrames) if the index is lexicographically sorted starting from the outermost level.

In other words, selecting data on this data frame:

               col1
    key1 key2
    1    a       11
         b       12
    2    a       13
         b       14

... is "much better" than selecting data on this data frame:

               col1
    key1 key2
    1    a       11
    2    a       13
    1    b       12
    2    b       14

Wes gives no explanation for this statement.
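
To make the comparison concrete, here is a minimal sketch that rebuilds both frames and runs the same selection on each. The names `df_sorted` and `df_unsorted` are my own, not from the book, and I am assuming `is_monotonic_increasing` on the index is a fair way to check the "lexicographically sorted" condition he describes.

```python
import pandas as pd

# Frame 1: index sorted lexicographically from the outermost level
# (key1 first, then key2 within each key1).
df_sorted = pd.DataFrame(
    {"col1": [11, 12, 13, 14]},
    index=pd.MultiIndex.from_tuples(
        [(1, "a"), (1, "b"), (2, "a"), (2, "b")], names=["key1", "key2"]
    ),
)

# Frame 2: the same rows, but ordered by key2 first, so the index
# is not sorted from the outermost level.
df_unsorted = pd.DataFrame(
    {"col1": [11, 13, 12, 14]},
    index=pd.MultiIndex.from_tuples(
        [(1, "a"), (2, "a"), (1, "b"), (2, "b")], names=["key1", "key2"]
    ),
)

print(df_sorted.index.is_monotonic_increasing)    # True
print(df_unsorted.index.is_monotonic_increasing)  # False

# The same selection on the outermost level returns the same rows
# from both frames; only the index layout differs.
print(df_sorted.loc[1])
print(df_unsorted.loc[1])

# sort_index() turns the second layout into the first.
print(df_unsorted.sort_index())
```

Both selections return the same rows, so whatever "much better" refers to, it is not the result of the selection itself.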

Could someone please explain:

  • Why is data selection on the first data frame "much better" than on the second data frame? In other words, why is selecting data from a data frame with a hierarchical index "much better" when the index is lexicographically sorted, starting from the outermost level?

  • What does "much better" mean in this context? Faster? More memory efficient? Something else?


Source: https://habr.com/ru/post/1275604/

