I have a pandas DataFrame:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({
        'x': ['a', 'b', 'c'],
        'y': [1, 2, 2],
        'z': ['f', 's', 's'],
    }).set_index('x')
from which I would like to select rows based on the index (x) values in the selection list:

    selection = ['a', 'c', 'b', 'b', 'c', 'a']
The correct output can be obtained with df.loc as follows:

    out = df.loc[selection]
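For reference, here is a self-contained version of the setup above. It shows what df.loc returns for a list of labels: one output row per entry in selection, in the same order, with duplicates kept. This is the behaviour any faster replacement has to reproduce.

```python
import pandas as pd

df = pd.DataFrame({
    'x': ['a', 'b', 'c'],
    'y': [1, 2, 2],
    'z': ['f', 's', 's'],
}).set_index('x')

selection = ['a', 'c', 'b', 'b', 'c', 'a']

# Label-based lookup: one row per entry in `selection`,
# in the order given, with repeated labels repeated in the result.
out = df.loc[selection]
print(out)
```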
The problem is that df.loc runs quite slowly on large DataFrames (2 to 7 million rows). Is there any way to speed up this operation? I looked at eval(), but it does not seem to apply to hard-coded lists of index values like this. I also thought about using pd.DataFrame.isin, but it skips the repeated values (it returns only one row per unique element in selection).
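A minimal sketch of the isin limitation and one position-based alternative, assuming the index is unique and every label in selection exists. The mask from Index.isin keeps each row of df at most once and in df's order, which is why the duplicates disappear. The alternative translates labels to integer positions with Index.get_indexer and then selects with DataFrame.take. Note that df.loc already does a similar label-to-position lookup internally, so a speed-up is not guaranteed; this split mainly helps when the positions can be computed once and reused, and it should be timed on the real 2 to 7 million row data before relying on it.

```python
import pandas as pd

df = pd.DataFrame({
    'x': ['a', 'b', 'c'],
    'y': [1, 2, 2],
    'z': ['f', 's', 's'],
}).set_index('x')
selection = ['a', 'c', 'b', 'b', 'c', 'a']

# isin() builds a boolean mask over df's own rows, so each row is kept at
# most once and in df's order, not in the order or multiplicity of selection.
masked = df[df.index.isin(selection)]

# Position-based alternative: translate labels to integer positions, then take.
# get_indexer() only works on a unique index, so check that first.
if not df.index.is_unique:
    raise ValueError("index must be unique for get_indexer()")
positions = df.index.get_indexer(selection)  # NumPy int array, -1 marks a missing label
if (positions == -1).any():
    raise KeyError("some labels in selection are not in the index")
fast = df.take(positions)  # rows in selection order, duplicates kept

# Same rows in the same order as the label-based lookup.
assert fast.equals(df.loc[selection])
```

If the same selection is applied repeatedly (or to several frames sharing the same index), positions can be computed once and passed to take each time, skipping the repeated label lookup.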
philE