The unexpected difference between loc and ix

I noticed a strange difference between loc and ix on a subset of the DataFrame in Pandas.

    import pandas as pd

    # Create a dataframe
    df = pd.DataFrame({'id':[10,9,5,6,8], 'x1':[10.0,12.3,13.4,11.9,7.6], 'x2':['a','a','b','c','c']})
    df.set_index('id', inplace=True)
    df

          x1 x2
    id
    10  10.0  a
    9   12.3  a
    5   13.4  b
    6   11.9  c
    8    7.6  c

    df.loc[[10, 9, 7]] # 7 does not exist in the index so a NaN row is returned
    df.loc[[7]] # KeyError: 'None of [[7]] are in the [index]'
    df.ix[[7]] # 7 does not exist in the index so a NaN row is returned

Why does df.loc[[7]] raise an error while df.ix[[7]] returns a row with NaN? Is this a bug? If not, why are loc and ix designed this way?

(Note: I am using Pandas 0.17.1 on Python 3.5.1)

2 answers

As @shanmuga says, this (at least for loc) is intended and documented behavior, not a bug.

The documentation on loc / selection by label gives the rule for this ( http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-label ):

At least 1 of the labels for which you ask, must be in the index or a KeyError will be raised!

This means that using loc with a single label (for example, df.loc[[7]] ) raises an error if that label is not in the index, but with a list of labels (for example, df.loc[[7,8,9]] ) no error is raised as long as at least one of the labels is in the index.
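A minimal sketch of this rule, reusing the DataFrame from the question. It only relies on the case that behaves the same across versions (a list in which no label exists). As an assumption that goes beyond the answer above: newer pandas releases (1.0 and later, as far as I know) raise a KeyError as soon as any requested label is missing, and reindex is the usual way to get NaN rows for missing labels.

```python
import pandas as pd

df = pd.DataFrame({'id': [10, 9, 5, 6, 8],
                   'x1': [10.0, 12.3, 13.4, 11.9, 7.6],
                   'x2': ['a', 'a', 'b', 'c', 'c']}).set_index('id')

# None of the requested labels exists in the index: loc raises KeyError.
try:
    df.loc[[7]]
    loc_raised = False
except KeyError:
    loc_raised = True

# reindex returns one row per requested label and fills missing ones with NaN.
padded = df.reindex([10, 9, 7])

print(loc_raised)
print(padded)
```

Using reindex makes the intent explicit: "give me exactly these labels, with NaN where there is no data", regardless of how loc treats partially missing lists in a given version.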


For ix I am less sure, and I don't think this is clearly documented. In any case, ix is much more permissive and has many edge cases (falling back to integer position, etc.), which makes it something of a rabbit hole. In general, though, ix will always return a result indexed with the provided labels (so, unlike loc, it does not check whether the labels are in the index), unless it falls back to integer positional indexing.
In most cases, it is recommended to use loc / iloc instead.


I think this behavior is intended, not a bug.
Although I could not find official documentation, I found a comment from jreback dated March 21, 2014 on GitHub pointing this out.

ix can very subtly give incorrect results (use an index with even numbers)

You can use whichever function you want; ix still exists, but it does not provide the guarantees that loc provides, namely that it will not interpret a number as a location
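The "index with even numbers" hint can be illustrated without ix, by comparing the two explicit accessors. The Series s below is made up for illustration; the point is that the same number means different rows depending on whether it is read as a label or as a position, which is exactly the guarantee loc and iloc each pin down.

```python
import pandas as pd

# Integer index made of even numbers, so labels and positions disagree.
s = pd.Series(['a', 'b', 'c', 'd'], index=[0, 2, 4, 6])

by_label = s.loc[2]      # label 2 is the second element
by_position = s.iloc[2]  # position 2 is the third element

print(by_label, by_position)
```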


As for why it is designed this way, as mentioned in the docs:

.ix supports mixed integer and label based access. It is primarily label based, but will fall back to integer positional access unless the corresponding axis is of integer type.
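The fallback rule in that quote can be modelled roughly as follows. This is only a sketch for scalar keys on a Series: ix_like is a hypothetical helper, not a pandas function, and the real ix had more edge cases (lists, slices, mixed-type indexes) that this does not try to reproduce.

```python
import pandas as pd


def ix_like(series, key):
    """Hypothetical model of the documented ix rule for a scalar key."""
    index_is_integer = pd.api.types.is_integer_dtype(series.index.dtype)
    # An integer key on a non-integer axis is treated as a position.
    if isinstance(key, int) and not index_is_integer:
        return series.iloc[key]
    # Otherwise the key is always treated as a label.
    return series.loc[key]


s_str = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
s_int = pd.Series([1.0, 2.0, 3.0], index=[2, 0, 1])

print(ix_like(s_str, 1))    # positional fallback on a string index
print(ix_like(s_str, 'c'))  # plain label lookup
print(ix_like(s_int, 1))    # integer index: 1 is a label, never a position
```

With the integer index, the key 1 selects the row labelled 1 (the last one here) rather than the row at position 1, which is why the same call can mean different things depending on the index type.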

In my opinion, raising a KeyError would be ambiguous, since it would not be clear whether it came from the label lookup or from the integer position lookup. Instead, ix returns NaN when given a list.


Source: https://habr.com/ru/post/1238056/

