Python pandas.core.indexing.IndexingError: Unalignable boolean Series key provided

I read in a data table with 29 columns and added one index column (30 in total).

    Data = pd.read_excel(os.path.join(BaseDir, 'test.xlsx'))
    Data.reset_index(inplace=True)

Then I wanted to subset the data to include only the columns whose name contains "ref" or "Ref". I got this code from another Stack Overflow post:

    col_keep = Data.ix[:, pd.Series(Data.columns.values).str.contains('ref', case=False)]

However, I keep getting this error:

    print(len(Data.columns.values))
    30
    print(pd.Series(Data.columns.values).str.contains('ref', case=False))
    0     False
    1     False
    2     False
    3     False
    4     False
    5     False
    6     False
    7     False
    8     False
    9     False
    10    False
    11    False
    12    False
    13    False
    14    False
    15    False
    16    False
    17    False
    18    False
    19    False
    20    False
    21    False
    22    False
    23    False
    24     True
    25     True
    26     True
    27     True
    28    False
    29    False
    dtype: bool
    Traceback (most recent call last):
      File "C:/Users/lala.py", line 26, in <module>
        col_keep = FedexData.ix[:, pd.Series(FedexData.columns.values).str.contains('ref', case=False)]
      File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 84, in __getitem__
        return self._getitem_tuple(key)
      File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 816, in _getitem_tuple
        retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
      File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 1014, in _getitem_axis
        return self._getitem_iterable(key, axis=axis)
      File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 1041, in _getitem_iterable
        key = check_bool_indexer(labels, key)
      File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 1817, in check_bool_indexer
        raise IndexingError('Unalignable boolean Series key provided')
    pandas.core.indexing.IndexingError: Unalignable boolean Series key provided

So the booleans are correct, but why doesn't this work? Why does the error keep popping up?

Any help / hint appreciated! Thank you very much.

1 answer

I can reproduce a similar error message as follows:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randint(4, size=(10,4)), columns=list('ABCD'))
    df.ix[:, pd.Series([True,False,True,False])]

raises (using Pandas version 0.21.0.dev+25.g50e95e0)

 pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match 

The problem arises because Pandas tries to align the index of the Series with the column index of the DataFrame before masking with the boolean Series values. Since df has column labels 'A', 'B', 'C', 'D', and the Series has index labels 0, 1, 2, 3, Pandas complains that the labels are unalignable.
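To make the alignment step concrete, here is a minimal sketch using `.loc` on a small made-up DataFrame (the names `aligned`, `unaligned`, `kept` and `raised` are illustrative only). It assumes a Pandas version in which `IndexingError` is still importable from `pandas.core.indexing`, the path shown in the traceback. A boolean Series whose index carries the column labels is accepted; one with the default 0..3 index is rejected.

```python
import numpy as np
import pandas as pd
from pandas.core.indexing import IndexingError  # path taken from the traceback

# Small illustrative frame with column labels 'A'..'D'
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("ABCD"))

# Mask indexed by the column labels: it aligns with df.columns
aligned = pd.Series([True, False, True, False], index=df.columns)
kept = df.loc[:, aligned]
print(list(kept.columns))

# Mask with the default integer index 0..3: none of its labels match 'A'..'D'
unaligned = pd.Series([True, False, True, False])
raised = False
try:
    df.loc[:, unaligned]
except IndexingError as exc:
    raised = True
    print(type(exc).__name__, exc)
```
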

You probably don't want index alignment here. So pass a NumPy boolean array instead of the Pandas Series:

    mask = pd.Series(Data.columns.values).str.contains('ref', case=False).values
    col_keep = Data.loc[:, mask]

The Series.values attribute returns a NumPy array. Also, since DataFrame.ix will be removed in a future version of Pandas, use Data.loc instead of Data.ix here, since we want boolean indexing.
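As a self-contained sketch of this fix, the snippet below applies it to a made-up four-column DataFrame standing in for test.xlsx (the column names and values are assumptions, not the asker's data). It also shows one possible shortcut that is not part of the answer above: calling `.str.contains` directly on `Data.columns`, which for plain string column labels yields a boolean array with no index to align.

```python
import pandas as pd

# Hypothetical stand-in for the Excel data; column names are invented
Data = pd.DataFrame(
    {"id": [1, 2], "Ref_A": [3, 4], "price": [5.0, 6.0], "xref": [7, 8]}
)

# The approach from the answer: build the Series, then drop its index via .values
mask = pd.Series(Data.columns.values).str.contains("ref", case=False).values
col_keep = Data.loc[:, mask]

# Possible shortcut (an assumption, not from the answer): mask built from the Index itself
mask_alt = Data.columns.str.contains("ref", case=False)
col_keep_alt = Data.loc[:, mask_alt]

print(list(col_keep.columns))
print(list(col_keep_alt.columns))
```

Either mask carries no labels, so `.loc` applies it purely by position.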


Source: https://habr.com/ru/post/988817/

