The unexpected difference between loc and ix

Question

I noticed a strange difference between loc and ix on a subset of the DataFrame in Pandas.

    import pandas as pd

    # Create a dataframe
    df = pd.DataFrame({'id':[10,9,5,6,8], 'x1':[10.0,12.3,13.4,11.9,7.6], 'x2':['a','a','b','c','c']})
    df.set_index('id', inplace=True)
    df

          x1 x2
    id
    10  10.0  a
    9   12.3  a
    5   13.4  b
    6   11.9  c
    8    7.6  c

    df.loc[[10, 9, 7]] # 7 does not exist in the index so a NaN row is returned
    df.loc[[7]] # KeyError: 'None of [[7]] are in the [index]'
    df.ix[[7]] # 7 does not exist in the index so a NaN row is returned

Why does df.loc[[7]] raise an error while df.ix[[7]] returns a row of NaN? Is this a bug? If not, why are loc and ix designed this way?

(Note: I am using Pandas 0.17.1 on Python 3.5.1)

+5

python pandas

Ben Dec 14 '15 at 4:10

2 answers

I think this behavior is intended, not a bug.
Although I could not find it in the official documentation, I found a GitHub comment by jreback, dated March 21, 2014, that points this out:

ix can very subtly give wrong results (use an index of, say, even numbers)
You can use whatever function you want; ix is still there, but it does not provide the guarantees that loc provides, namely that it will not interpret a number as a location

As for why it is designed this way, the docs say:

.ix supports mixed integer and label based access. It is primarily label based, but will fall back to integer positional access unless the corresponding axis is of integer type.

In my opinion, raising a KeyError from ix would be ambiguous, since it would be unclear whether the failure came from a label lookup or from an integer position. Instead, ix returns NaN when given a list.

+1

shanmuga Dec 14 '15 at 5:17

joris · Accepted Answer · 2015-12-15T08:32:22+0000

As @shanmuga says, this (at least for loc) is intended and documented behavior, not a bug.

The documentation on loc / selection by label gives the rule for this ( http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-label ):

At least 1 of the labels for which you ask must be in the index, or a KeyError will be raised!

This means that using loc with a single label (for example, df.loc[[7]]) raises an error if that label is not in the index, but using it with a list of labels (for example, df.loc[[7,8,9]]) does not raise an error as long as at least one of those labels is in the index.
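As a rough illustration of the single-label case, the sketch below rebuilds the DataFrame from the question and checks that df.loc[[7]] raises KeyError. The list behavior described above is what the question reports for pandas 0.17.1; as far as I know, newer pandas releases raise KeyError for any missing label in a list, so the sketch does not rely on it. The reindex call is my own addition, not something from the answer: it is one way to get a NaN row for a missing label without depending on the version. The names loc_raised and padded are made up for this example.

```python
import pandas as pd

# Same data as in the question, with 'id' as the index.
df = pd.DataFrame({'id': [10, 9, 5, 6, 8],
                   'x1': [10.0, 12.3, 13.4, 11.9, 7.6],
                   'x2': ['a', 'a', 'b', 'c', 'c']}).set_index('id')

# None of the requested labels is in the index, so loc raises KeyError.
try:
    df.loc[[7]]
    loc_raised = False
except KeyError:
    loc_raised = True

# Assumption (not from the answer): reindex does not raise for missing
# labels; it returns a row filled with NaN for label 7.
padded = df.reindex([10, 9, 7])
print(loc_raised)
print(padded)
```

The result of reindex is indexed by exactly the labels passed in, which is close to what the question observed for df.ix[[7]].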

For ix I'm less sure, and I don't think this is clearly documented. In any case, ix is much more permissive and has many edge cases (falling back to integer position, etc.), which makes it rather a rabbit hole. In general, though, ix will always return a result indexed with the provided labels (so it does not check whether the labels are in the index, as loc does), unless it falls back to integer-position indexing.
In most cases, it is recommended to use loc / iloc.
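To make the loc / iloc recommendation concrete, here is a small sketch using the DataFrame from the question. It only shows that loc treats 5 as a label while iloc treats it as a position, which is the distinction ix blurred. The variable names are chosen for this example.

```python
import pandas as pd

# Same data as in the question, with 'id' as the index.
df = pd.DataFrame({'id': [10, 9, 5, 6, 8],
                   'x1': [10.0, 12.3, 13.4, 11.9, 7.6],
                   'x2': ['a', 'a', 'b', 'c', 'c']}).set_index('id')

by_label = df.loc[5]      # row whose index label is 5
by_position = df.iloc[2]  # third row, whatever its label is
first_row = df.iloc[0]    # first row, which carries the label 10

# Position 5 does not exist (there are only 5 rows), so iloc raises
# IndexError even though the label 5 is present in the index.
try:
    df.iloc[5]
    iloc_raised = False
except IndexError:
    iloc_raised = True

print(by_label['x1'], by_position.name, first_row.name, iloc_raised)
```

With an integer index like this one, the same number can be a valid label and an invalid position (or the reverse), so choosing loc or iloc explicitly removes the guesswork.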
