Select rows by partial string match in the index

Having a series like this:

    ds = Series({'wikipedia':10,'wikimedia':22,'wikitravel':33,'google':40})

    google        40
    wikimedia     22
    wikipedia     10
    wikitravel    33
    dtype: int64

I would like to select strings where 'wiki' is part of the index label (partial string label).

At the moment I tried

    ds[ds.index.map(lambda x: 'wiki' in x)]

    wikimedia     22
    wikipedia     10
    wikitravel    33
    Name: site, dtype: int64

and it does the job, but somehow the index cries out for a "contains" method, just like the one the columns have...

Is there a better way to do this?

3 answers

A somewhat cheeky way might be to use loc:

    In [11]: ds.loc['wiki': 'wikj']
    Out[11]:
    wikimedia     22
    wikipedia     10
    wikitravel    33
    dtype: int64

This is essentially equivalent to ds[ds.index.map(lambda s: s.startswith('wiki'))].

To do a contains match, as @DSM suggests, it is probably nicer to write:
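One caveat worth spelling out: slicing with bounds that are not actual labels only works on a sorted index. The following is a minimal sketch, assuming a recent pandas in which a Series built from a dict keeps insertion order; the explicit sort_index call and the names ds_sorted, by_slice and by_prefix are additions for illustration, not part of the answer above.

```python
import pandas as pd

ds = pd.Series({'wikipedia': 10, 'wikimedia': 22, 'wikitravel': 33, 'google': 40})

# Slicing with bounds that are not actual labels needs a monotonic index,
# so sort first (assumption: the dict order above is not already sorted).
ds_sorted = ds.sort_index()

# Every label starting with 'wiki' sorts between 'wiki' and 'wikj'.
by_slice = ds_sorted.loc['wiki':'wikj']

# The same selection written as an explicit prefix test.
by_prefix = ds_sorted[[label.startswith('wiki') for label in ds_sorted.index]]

print(by_slice.equals(by_prefix))
```

Note that loc slices include both endpoints, so a label that is exactly 'wikj' would also be selected by the slice but not by the prefix test.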

 ds[['wiki' in s for s in ds.index]] 

Another solution is to use filter:

    >>> ds.filter(like='wiki', axis=0)
    wikimedia     22
    wikipedia     10
    wikitravel    33
    dtype: int64
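For completeness, filter also accepts a regex argument, and on a DataFrame the axis matters. A small sketch under stated assumptions: the column name 'hits' and the pattern '^wikim' are hypothetical, chosen only to illustrate the call.

```python
import pandas as pd

ds = pd.Series({'wikipedia': 10, 'wikimedia': 22, 'wikitravel': 33, 'google': 40})
df = ds.to_frame('hits')  # 'hits' is a hypothetical column name

# like= keeps labels that contain the substring anywhere.
wiki_rows = ds.filter(like='wiki', axis=0)

# regex= is matched with re.search, so anchor it to get a prefix match.
wikim_rows = ds.filter(regex='^wikim', axis=0)

# On a DataFrame, axis=0 is needed to filter rows; the default is columns.
wiki_frame = df.filter(like='wiki', axis=0)

print(wiki_rows)
print(wikim_rows)
print(wiki_frame)
```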

How do I select rows by partial string match in the index?

Updated for: 2019

Now we have "vectorized" string methods for these operations (in fact, they have been around for some time). All of the solutions below work as-is with DataFrames.

Setup

    s = pd.Series({'foo': 'x', 'foobar': 'y', 'baz': 'z'})
    s

    foo       x
    foobar    y
    baz       z
    dtype: object

    df = s.to_frame('abc')
    df

           abc
    foo      x
    foobar   y
    baz      z

The same solution applies to both s and df!


Prefix Search: str.startswith

For an index of str dtype (more precisely, object dtype), pd.Index objects now come with the str methods themselves, so you can write this more idiomatically with str.startswith (documented as Series.str.startswith):

    # For the series,
    s.index.str.startswith('foo')
    # Similarly, for the DataFrame,
    df.index.str.startswith('foo')
    # array([ True,  True, False])

To select with this result, you can use boolean indexing:

    s[s.index.str.startswith('foo')]

    foo       x
    foobar    y
    dtype: object

    df[df.index.str.startswith('foo')]

           abc
    foo      x
    foobar   y

Search anywhere: str.contains

Use Series.str.contains to search for a substring or regular expression anywhere in the string:

    s.index.str.contains('foo')
    # Similarly,
    df.index.str.contains('foo')
    # array([ True,  True, False])

If you are only matching substrings, you can safely disable regex-based search for better performance:

    s.index.str.contains('foo', regex=False)
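Apart from speed, regex=False also changes what matches when the search string contains regex metacharacters. A minimal sketch, using a hypothetical series s2 whose labels are chosen only to show the difference:

```python
import pandas as pd

# Hypothetical labels; the dot in 'a.b' is a regex metacharacter.
s2 = pd.Series([1, 2, 3], index=['a.b', 'axb', 'abc'])

# Default regex=True: '.' matches any character, so 'axb' matches too.
as_regex = s2.index.str.contains('a.b')

# regex=False: 'a.b' is treated as a literal substring.
as_literal = s2.index.str.contains('a.b', regex=False)

print(list(as_regex))
print(list(as_literal))
```

With the default, both 'a.b' and 'axb' are selected; with regex=False only the label that literally contains 'a.b' is.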

For regular expressions you can use:

    s.index.str.contains('ba')
    # Similarly,
    df.index.str.contains('ba')
    # array([False,  True,  True])

Micro-optimization with list comprehensions

In terms of performance, list comprehensions are faster. The first option can be rewritten as:

    [x.startswith('foo') for x in s.index]
    # [True, True, False]

    s[[x.startswith('foo') for x in s.index]]

    foo       x
    foobar    y
    dtype: object

With regex, you can pre-compile the pattern and call re.search. For more information, see my extensive write-up, "For loops with pandas - When should I care?".
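The pre-compiled regex variant could look roughly like this. It is a sketch, not a benchmarked recommendation: the pattern 'ba[rz]' and the names pattern, mask and selected are hypothetical.

```python
import re
import pandas as pd

s = pd.Series({'foo': 'x', 'foobar': 'y', 'baz': 'z'})

# Compile once, outside the loop. The pattern 'ba[rz]' is a hypothetical example.
pattern = re.compile(r'ba[rz]')

# re.search semantics: the pattern may match anywhere in the label.
mask = [pattern.search(label) is not None for label in s.index]

# A plain list of booleans works as a mask, just like the array from str.contains.
selected = s[mask]
print(selected)
```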


Source: https://habr.com/ru/post/945296/

