Pandas str.count

Question

Consider the following data. I want to count the number of "$" characters that appear in each string. I am using the str.count function in pandas ( http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.count.html ).

    >>> import pandas as pd
    >>> df = pd.DataFrame(['$$a', '$$b', '$c'], columns=['A'])
    >>> df['A'].str.count('$')
    0    1
    1    1
    2    1
    Name: A, dtype: int64

I expected the result to be [2,2,1] . What am I doing wrong?

In plain Python, the built-in str.count method returns the result I expect.

    >>> a = "$$$$abcd"
    >>> a.count('$')
    4
    >>> a = '$abcd$dsf$'
    >>> a.count('$')
    3

+5

python pandas

user4979733 Nov 29 '16 at 20:58

4 answers

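As the other answers noted, the problem here is that pandas treats the pattern as a regular expression, and in a regex $ is the end-of-string anchor rather than a literal character. If you are not going to use regular expressions, you may find that using str.count (i.e. the method from the built-in str type) is faster than its pandas counterpart.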

    In [39]: df['A'].apply(lambda x: x.count('$'))
    Out[39]:
    0    2
    1    2
    2    1
    Name: A, dtype: int64

    In [40]: %timeit df['A'].str.count(r'\$')
    1000 loops, best of 3: 243 µs per loop

    In [41]: %timeit df['A'].apply(lambda x: x.count('$'))
    1000 loops, best of 3: 202 µs per loop

+3

fuglede Nov 29 '16 at 21:06

Try the pattern [$] so that $ is not treated as the end-of-string anchor (see this cheatsheet). Inside square brackets [] it is treated as a literal character:

    In [3]: df = pd.DataFrame(['$$a', '$$b', '$c'], columns=['A'])
            df['A'].str.count('[$]')
    Out[3]:
    0    2
    1    2
    2    1
    Name: A, dtype: int64
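To see why the unescaped pattern gives 1 for every row, it can help to try the same patterns with the standard re module. This is only a minimal sketch; it assumes pandas counts regex matches per string in roughly the way re.findall does, and the variable names are just for illustration.

```python
import re

samples = ['$$a', '$$b', '$c']

# Unescaped '$' is an anchor: it matches the empty position at the
# end of each string, so there is exactly one match per string here.
anchor_matches = [len(re.findall('$', s)) for s in samples]

# Inside a character class, '$' is an ordinary character.
literal_matches = [len(re.findall('[$]', s)) for s in samples]

print(anchor_matches)   # [1, 1, 1]
print(literal_matches)  # [2, 2, 1]
```

The first list reproduces the surprising output from the question, the second matches the expected counts.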

+2

Edchum Nov 29 '16 at 21:01

Taking a cue from @fuglede:

 pd.Series([x.count('$') for x in df.A.values.tolist()], df.index)

As @jezrael pointed out, the above fails when the column contains null values, so ...

    def tc(x):
        # Non-string values such as NaN have no .count method; treat them as 0
        try:
            return x.count('$')
        except AttributeError:
            return 0

    pd.Series([tc(x) for x in df.A.values.tolist()], df.index)
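If you would rather stay with the pandas string accessor, another way to handle nulls is to let .str.count propagate the missing values and fill them afterwards. A small sketch, assuming a count of 0 is what you want for missing entries (the sample Series below is made up for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series(['$$a', np.nan, '$c'])

# .str.count skips missing values instead of raising, leaving NaN
# in the result for those rows.
raw = s.str.count(r'\$')

# Replace the missing counts with 0 and restore an integer dtype.
counts = raw.fillna(0).astype(int)

print(counts.tolist())  # [2, 0, 1]
```

This goes through the regex engine, so it will not necessarily be faster than the list comprehension above.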

timings

    import numpy as np
    import pandas as pd

    np.random.seed([3,1415])
    df = pd.Series(np.random.randint(0, 100, 100000)) \
        .apply(lambda x: '\$' * x).to_frame('A')
    df.A.replace('', np.nan, inplace=True)

    def tc(x):
        try:
            return x.count('$')
        except AttributeError:
            return 0

+1

piRSquared Nov 29 '16 at 21:16

Maxu · Accepted Answer · 2016-11-29T21:00:09+0000

$ has a special meaning in regular expressions: it matches the end of the string rather than a literal dollar sign. Escape it with a backslash to count the character itself:

    In [21]: df.A.str.count(r'\$')
    Out[21]:
    0    2
    1    2
    2    1
    Name: A, dtype: int64
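If the text to count is not known in advance, escaping by hand gets error-prone. One option is to let re.escape from the standard library build the pattern. The helper name count_literal below is hypothetical and not part of pandas; treat this as a sketch rather than the canonical approach.

```python
import re
import pandas as pd

df = pd.DataFrame(['$$a', '$$b', '$c'], columns=['A'])

def count_literal(series, text):
    """Count non-overlapping literal occurrences of `text` in each string.

    Hypothetical helper: re.escape neutralises regex metacharacters so
    that Series.str.count treats `text` as plain text.
    """
    return series.str.count(re.escape(text))

print(count_literal(df['A'], '$').tolist())  # [2, 2, 1]
```

The same helper works for other metacharacters such as '.' or '*' without changing the call.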
