How to select rows by partial string match on the index?
Updated for: 2019
We now have "vectorized" string methods for these operations (in fact, they have been around for some time). All of the solutions below apply as-is to DataFrames.
Setup
import pandas as pd

s = pd.Series({'foo': 'x', 'foobar': 'y', 'baz': 'z'})
s

foo       x
foobar    y
baz       z
dtype: object

df = s.to_frame('abc')
df

       abc
foo      x
foobar   y
baz      z

The same solutions apply to both s and df!
Prefix Search: str.startswith
This works when the index holds strings (str values; more precisely, object dtype). pd.Index objects now come with the str methods themselves, so the most idiomatic way to build the mask is to call str.startswith directly on the index:

# For the Series:
s.index.str.startswith('foo')
To select rows with this result, use boolean indexing:
s[s.index.str.startswith('foo')]

foo       x
foobar    y
dtype: object

df[df.index.str.startswith('foo')]

       abc
foo      x
foobar   y
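As a minimal end-to-end sketch of the prefix search above, the mask can be built once and reused for both objects. The names mask, selected_s and selected_df are illustrative only, and using .loc for the DataFrame is a stylistic choice rather than a requirement:

```python
import pandas as pd

s = pd.Series({'foo': 'x', 'foobar': 'y', 'baz': 'z'})
df = s.to_frame('abc')

# Boolean mask over the index labels; s and df share the same index,
# so one mask can select from both.
mask = s.index.str.startswith('foo')

selected_s = s[mask]
selected_df = df.loc[mask]  # .loc makes the row-wise selection explicit

print(selected_s)
print(selected_df)
```

Building the mask separately also makes it easy to combine conditions, for example with & or ~, before indexing.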
Search anywhere: str.contains
Use str.contains to search for a substring or regular expression anywhere in the string:
s.index.str.contains('foo')
If you are only matching plain substrings, you can safely disable regex search for better performance:
s.index.str.contains('foo', regex=False)
For regular expressions, you can use:
s.index.str.contains('ba')
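The difference between regex and literal matching only shows up when the search string contains regex metacharacters. The following sketch uses a hypothetical index (the labels 'a.b', 'axb' and 'ab' are made up for illustration and are not part of the setup above):

```python
import pandas as pd

# Hypothetical labels chosen to expose the difference.
idx = pd.Index(['a.b', 'axb', 'ab'])

# Default (regex=True): '.' matches any single character.
regex_mask = idx.str.contains('a.b')

# regex=False: 'a.b' is matched literally, dot included.
literal_mask = idx.str.contains('a.b', regex=False)

print(list(regex_mask))
print(list(literal_mask))
```

So regex=False is not just a speed-up: it is also the safer choice whenever the search string comes from user input and should be taken literally.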
Micro-optimization with a list comprehension
In terms of performance, list comprehensions are faster. The first solution can be rewritten as:
[x.startswith('foo') for x in s.index]
With regex, you can precompile the pattern and call re.search. For more information, see my extensive write-up, "For loops with pandas - When should I care?".
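A rough sketch of the precompiled-regex variant might look like this. It assumes every index label is a string (unlike the str methods, a plain comprehension does not handle missing values), and the names pattern, mask and result are illustrative:

```python
import re

import pandas as pd

s = pd.Series({'foo': 'x', 'foobar': 'y', 'baz': 'z'})

# Compile once so the pattern is not re-parsed for every label.
pattern = re.compile(r'ba')

# re.search returns a match object or None; bool() turns that into a mask entry.
mask = [bool(pattern.search(label)) for label in s.index]

result = s[mask]
print(result)
```

Whether this actually beats str.contains depends on the data size and pandas version, so it is worth timing on your own data before adopting it.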