What is the most idiomatic way to index an object using a logical array in pandas?

I am asking specifically about pandas version 0.11, as I am busy replacing my use of .ix with .loc or .iloc. I like the fact that the distinction between .loc and .iloc tells me whether I intend to index by label or by integer position. I see that any of them will also accept a logical array, but I would like to use them cleanly, so that they clearly convey my intentions.

In 0.11.0 all three methods work, and the way suggested in the docs is simply to use df[mask]. However, this selection is done not by position but purely by label, so in my opinion loc best describes what is actually going on.
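A minimal sketch of that point, assuming a recent pandas release rather than 0.11 (details may differ there): `df[mask]` and `df.loc[mask]` return the same rows because both treat the boolean Series as a set of labels. The frame and mask are the same as in the example session of this answer.

```python
import pandas as pd

# Same frame and mask as in the answer's IPython session
df = pd.DataFrame({"a": range(5)}, index=list("ABCDE"))
mask = df["a"] % 2 == 0  # boolean Series indexed by 'A'..'E'

via_getitem = df[mask]  # the form suggested in the docs
via_loc = df.loc[mask]  # the same selection, spelled explicitly as label-based

# Both keep the rows whose labels are True in the mask
assert via_getitem.equals(via_loc)
print(via_loc.index.tolist())  # ['A', 'C', 'E']
```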

Update: I asked about this on GitHub, and the conclusion was that in pandas 0.11.1 df.iloc[mask] will raise a NotImplementedError (if the mask has an integer index) or a ValueError (if it has a non-integer index).
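The behaviour described in the update can be checked with a short script. This is a sketch assuming a recent pandas release, where `iloc` still rejects a boolean *Series* (ValueError for a non-integer index, NotImplementedError for an integer one); passing the underlying NumPy array via `.to_numpy()` makes the selection explicitly positional. The helper `iloc_error` is hypothetical, written only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": range(5)}, index=list("ABCDE"))
mask = df["a"] % 2 == 0                  # labels 'A'..'E'
int_mask = mask.reset_index(drop=True)   # same values, integer labels 0..4


def iloc_error(frame, key):
    """Hypothetical helper: return the exception type iloc raises, or None."""
    try:
        frame.iloc[key]
    except (ValueError, NotImplementedError) as exc:
        return type(exc)
    return None


print(iloc_error(df, mask))      # <class 'ValueError'>
print(iloc_error(df, int_mask))  # <class 'NotImplementedError'>

# A plain boolean array carries no labels, so iloc treats it positionally
by_position = df.iloc[mask.to_numpy()]
print(by_position.index.tolist())  # ['A', 'C', 'E']
```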

    In [1]: df = pd.DataFrame(range(5), list('ABCDE'), columns=['a'])

    In [2]: mask = (df.a%2 == 0)

    In [3]: mask
    Out[3]:
    A     True
    B    False
    C     True
    D    False
    E     True
    Name: a, dtype: bool

    In [4]: df[mask]
    Out[4]:
       a
    A  0
    C  2
    E  4

    In [5]: df.loc[mask]
    Out[5]:
       a
    A  0
    C  2
    E  4

    In [6]: df.iloc[mask]  # As a result of this question, this will raise a ValueError (in 0.11.1)
    Out[6]:
       a
    A  0
    C  2
    E  4

It may be worth noting that if you gave the mask an integer index, it would raise an error:

    mask.index = range(5)
    df.iloc[mask]  # or any of the others
    IndexingError: Unalignable boolean Series key provided

This demonstrates that boolean indexing with iloc is not actually implemented positionally: it uses labels, which is why 0.11.1 will raise a NotImplementedError when we try this.
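To see that a boolean Series is matched by label rather than by position, one can reorder the mask or give it unrelated labels. A sketch assuming a recent pandas release: `.loc` realigns a reordered mask on the frame's index, but raises an `IndexingError` when the labels cannot be aligned, as in the example above. The exception is caught generically and checked by class name, so the sketch does not depend on where a given pandas version exports `IndexingError`.

```python
import pandas as pd

df = pd.DataFrame({"a": range(5)}, index=list("ABCDE"))
mask = df["a"] % 2 == 0

# Reversing the mask changes its order, not its labels
reversed_mask = mask[::-1]
print(df.loc[reversed_mask].index.tolist())  # ['A', 'C', 'E']: aligned by label

# Integer labels 0..4 share nothing with 'A'..'E', so alignment fails
int_mask = mask.reset_index(drop=True)
try:
    df.loc[int_mask]
    error_name = None
except Exception as exc:
    error_name = type(exc).__name__
print(error_name)  # IndexingError
```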

I am currently using [], i.e. __getitem__(), for example:

    df = pd.DataFrame(dict(a=range(5)))
    df[df.a % 2 == 0]
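In later pandas versions (callable selection arrived in 0.18.1, so it is not available in the 0.11 discussed here), `[]`, `.loc` and `.iloc` also accept a callable that receives the frame being indexed. That keeps the mask inline without naming the frame twice, which helps at the end of a method chain. A minimal sketch; the extra column `b` is hypothetical, added only to show the chaining.

```python
import pandas as pd

df = pd.DataFrame(dict(a=range(5)))

# The answer's form: build the mask from df itself
evens = df[df.a % 2 == 0]

# Callable form: the lambda receives the intermediate frame produced by assign
evens_chained = df.assign(b=df.a * 10).loc[lambda d: d.a % 2 == 0]

print(evens.index.tolist())         # [0, 2, 4]
print(evens_chained["b"].tolist())  # [0, 20, 40]
```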

Source: https://habr.com/ru/post/1481328/