Select rows by partial string match in the index

Question

Having a series like this:

    ds = Series({'wikipedia':10,'wikimedia':22,'wikitravel':33,'google':40})

    google        40
    wikimedia     22
    wikipedia     10
    wikitravel    33
    dtype: int64

I would like to select the rows where 'wiki' is part of the index label (a partial string match on the label).

At the moment I tried

    ds[ds.index.map(lambda x: 'wiki' in x)]

    wikimedia     22
    wikipedia     10
    wikitravel    33
    Name: site, dtype: int64

and it does the job, but somehow the index cries out for a "contains" method just like the one the columns have...

Is there a better way to do this?

python pandas

ronszon May 17, '13 at 20:30

3 answers

Andy hayden · Answer 1 · 2013-05-17T20:37:01+0000

A somewhat cheeky way might be to use loc:

    In [11]: ds.loc['wiki': 'wikj']
    Out[11]:
    wikimedia     22
    wikipedia     10
    wikitravel    33
    dtype: int64

This is essentially equivalent to ds[ds.index.map(lambda s: s.startswith('wiki'))].

To do a "contains" check, as @DSM suggests, it is probably nicer to write it as:

 ds[['wiki' in s for s in ds.index]]
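One caveat worth sketching: the loc slice only works when the index is sorted, because neither 'wiki' nor 'wikj' is an actual label. Recent pandas versions keep a dict's insertion order instead of sorting it, so the sketch below sorts the index first (that step is my assumption about a current pandas install, not part of the answer above) and then checks that the slice and the prefix test select the same rows.

```python
import pandas as pd

ds = pd.Series({'wikipedia': 10, 'wikimedia': 22, 'wikitravel': 33, 'google': 40})

# Label slicing with bounds that are not in the index needs a sorted index.
ds_sorted = ds.sort_index()

# 'wikj' sorts just after every string that starts with 'wiki'.
by_slice = ds_sorted.loc['wiki':'wikj']

# The same selection written as an explicit prefix test.
by_prefix = ds_sorted[[s.startswith('wiki') for s in ds_sorted.index]]

# "Contains" rather than "starts with": no sorting required.
by_contains = ds[['wiki' in s for s in ds.index]]

print(by_slice)
```

The slice trick is limited to prefixes; for a match anywhere in the label, the list comprehension is the form that generalizes.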

Chris · Answer 2 · 2017-09-20T08:48:00+0000

Another solution is to use filter:

    >>> ds.filter(like='wiki', axis=0)
    wikimedia     22
    wikipedia     10
    wikitravel    33
    dtype: int64
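For completeness, here is a small sketch of filter on both a Series and a DataFrame. The regex pattern and the column name 'hits' are made up for illustration; like= and regex= are the documented keyword arguments, and only one of them may be passed per call.

```python
import pandas as pd

ds = pd.Series({'wikipedia': 10, 'wikimedia': 22, 'wikitravel': 33, 'google': 40})

# like= keeps labels that contain the substring anywhere.
wiki_rows = ds.filter(like='wiki', axis=0)

# regex= searches each label with a regular expression;
# this illustrative pattern keeps labels ending in 'pedia' or 'media'.
edia_rows = ds.filter(regex=r'(p|m)edia$', axis=0)

# The same call works on a DataFrame; axis=0 filters the row labels.
df = ds.to_frame('hits')  # 'hits' is a hypothetical column name
wiki_df = df.filter(like='wiki', axis=0)

print(wiki_rows)
```

Note that filter only ever looks at labels, never at the values, which is exactly what the question asks for.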

cs95 · Answer 3 · 2019-01-22T18:51:24+0000

How to select rows by partial string match in the index?

Updated for: 2019

Now we have “vectorized” string methods for these operations (in fact, they have been around for some time). All solutions are applicable as is with DataFrames.

Setup

    s = pd.Series({'foo': 'x', 'foobar': 'y', 'baz': 'z'})
    s

    foo       x
    foobar    y
    baz       z
    dtype: object

    df = s.to_frame('abc')
    df

           abc
    foo      x
    foobar   y
    baz      z

The same solution applies to both s and df!

Prefix Search: `str.startswith`

The index here holds strings (more precisely, it has object dtype). pd.Index objects now come with the str methods themselves, so you can write this more idiomatically through the index's str accessor, which exposes the same method as Series.str.startswith:

    # For the series,
    s.index.str.startswith('foo')
    # Similarly, for the DataFrame,
    df.index.str.startswith('foo')

    # array([ True,  True, False])

To select with this result, you can use logical indexing,

    s[s.index.str.startswith('foo')]

    foo       x
    foobar    y
    dtype: object

    df[df.index.str.startswith('foo')]

           abc
    foo      x
    foobar   y
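As a small extension of the selection above, the mask can be stored once and reused, for example to take the complement. This is only a sketch on the same toy data; the variable names are mine.

```python
import pandas as pd

s = pd.Series({'foo': 'x', 'foobar': 'y', 'baz': 'z'})
df = s.to_frame('abc')

# One boolean per row label.
mask = df.index.str.startswith('foo')

matching = df.loc[mask]        # rows whose label starts with 'foo'
not_matching = df.loc[~mask]   # every other row

print(matching)
print(not_matching)
```

Using .loc with the mask makes it explicit that rows are being selected, which helps once the expression gets longer.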

Search anywhere: `str.contains`

Use Series.str.contains to perform a substring or regex-based search anywhere in the string:

    s.index.str.contains('foo')
    # Similarly,
    df.index.str.contains('foo')

    # array([ True,  True, False])

If you are only matching substrings, you can safely disable regex-based search for better performance:

    s.index.str.contains('foo', regex=False)

For regular expressions, you can use:

    s.index.str.contains('ba')
    # Similarly,
    df.index.str.contains('ba')

    # array([False,  True,  True])
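The regex switch matters for correctness as well as speed once the search text contains regex metacharacters. A minimal sketch, using made-up labels chosen so that '.' makes a difference:

```python
import pandas as pd

# Hypothetical labels, picked so that the '.' in the pattern matters.
s = pd.Series({'a.b': 1, 'axb': 2, 'ab': 3})

# Default regex=True: '.' matches any single character.
as_regex = s.index.str.contains('a.b')

# regex=False: 'a.b' is treated as literal text.
as_literal = s.index.str.contains('a.b', regex=False)

# case=False makes the match case-insensitive.
any_case = s.index.str.contains('A.B', case=False, regex=False)

print(list(as_regex))
print(list(as_literal))
print(list(any_case))
```

So if the search text comes from user input, regex=False is usually the safer default.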

Micro-optimization with a list comprehension

In terms of performance, list comprehensions are faster. The first option can be rewritten as,

    [x.startswith('foo') for x in s.index]
    # [True, True, False]

    s[[x.startswith('foo') for x in s.index]]

    foo       x
    foobar    y
    dtype: object

With regex, you can pre-compile the pattern and call re.search. For more information, see my extensive write-up, "For loops with pandas - When should I care?".
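A rough sketch of the pre-compiled variant, on the same toy series. Whether it actually beats str.contains depends on the data size and the pandas version, so it is worth timing on your own data before committing to it.

```python
import re

import pandas as pd

s = pd.Series({'foo': 'x', 'foobar': 'y', 'baz': 'z'})

# Compile once, then reuse the compiled pattern for every label.
pattern = re.compile(r'ba')

# pattern.search looks anywhere in the label; bool() turns the
# match object (or None) into True/False.
mask = [bool(pattern.search(label)) for label in s.index]

selected = s[mask]
print(selected)
```

This assumes every label is a string; a missing or non-string label would make pattern.search raise a TypeError.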
