Pandas str.count

Consider the following data block. I want to count the number of "$" characters in each row. I am using the str.count function in pandas (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.count.html).

    >>> import pandas as pd
    >>> df = pd.DataFrame(['$$a', '$$b', '$c'], columns=['A'])
    >>> df['A'].str.count('$')
    0    1
    1    1
    2    1
    Name: A, dtype: int64

I expected the result to be [2, 2, 1]. What am I doing wrong?

In plain Python, the str.count method returns the correct result:

    >>> a = "$$$$abcd"
    >>> a.count('$')
    4
    >>> a = '$abcd$dsf$'
    >>> a.count('$')
    3
4 answers

$ has a special meaning in regular expressions (it matches the end of the line), and pandas' str.count treats its argument as a regular expression, so escape it:

    In [21]: df.A.str.count(r'\$')
    Out[21]:
    0    2
    1    2
    2    1
    Name: A, dtype: int64
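To see why the unescaped pattern returns 1 for every row, it helps to look at what the regular expression actually matches. A minimal sketch with Python's standard re module (pandas counts matches of the pattern as a regular expression in the same way): '$' only matches the empty string at the end of the text, once per row, while r'\$' matches each literal dollar sign.

```python
import re

text = "$$a"

# Unescaped: '$' is the end-of-string anchor, so the only match
# is the empty string at the very end of the text.
print(re.findall("$", text))         # ['']
print(len(re.findall("$", text)))    # 1

# Escaped: r'\$' matches each literal dollar sign.
print(re.findall(r"\$", text))       # ['$', '$']
print(len(re.findall(r"\$", text)))  # 2
```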

As the other answers noted, the problem here is that $ matches the end of the line. If you do not need regular expressions, you may find that the plain str.count (i.e. the method of Python's built-in str type) is faster than its pandas counterpart:

    In [39]: df['A'].apply(lambda x: x.count('$'))
    Out[39]:
    0    2
    1    2
    2    1
    Name: A, dtype: int64

    In [40]: %timeit df['A'].str.count(r'\$')
    1000 loops, best of 3: 243 µs per loop

    In [41]: %timeit df['A'].apply(lambda x: x.count('$'))
    1000 loops, best of 3: 202 µs per loop
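A small self-contained sketch for reproducing this comparison outside IPython, using the standard timeit module instead of the %timeit magic. Timings depend on the machine and the pandas version, so no numbers are claimed here; the sketch first checks that both approaches produce the same counts.

```python
import timeit

import pandas as pd

df = pd.DataFrame(['$$a', '$$b', '$c'], columns=['A'])

# pandas: the pattern is a regular expression, so '$' must be escaped
via_pandas = df['A'].str.count(r'\$')

# plain Python: str.count matches the substring literally
via_apply = df['A'].apply(lambda x: x.count('$'))

# Both approaches should agree before their speed is compared.
assert via_pandas.tolist() == via_apply.tolist() == [2, 2, 1]

n = 200
t_pandas = timeit.timeit(lambda: df['A'].str.count(r'\$'), number=n)
t_apply = timeit.timeit(lambda: df['A'].apply(lambda x: x.count('$')), number=n)
print(f"pandas str.count: {t_pandas / n * 1e6:.1f} µs per loop")
print(f"apply + str.count: {t_apply / n * 1e6:.1f} µs per loop")
```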

Try the pattern [$] so that $ is not treated as the end-of-line anchor (see this cheatsheet). Inside square brackets [], $ loses its special meaning and is matched as a literal character:

    In [3]: df = pd.DataFrame(['$$a', '$$b', '$c'], columns=['A'])
       ...: df['A'].str.count('[$]')
    Out[3]:
    0    2
    1    2
    2    1
    Name: A, dtype: int64
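The same idea works for any literal substring: instead of escaping by hand or wrapping the character in [ ], the standard library's re.escape escapes every regex metacharacter for you. A sketch, where count_literal is a hypothetical helper name introduced here for illustration:

```python
import re

import pandas as pd


def count_literal(s: pd.Series, sub: str) -> pd.Series:
    """Hypothetical helper: count non-overlapping literal occurrences of `sub`."""
    # re.escape turns metacharacters such as '$', '.', '(' into literal matches
    return s.str.count(re.escape(sub))


df = pd.DataFrame(['$$a', '$$b', '$c'], columns=['A'])
print(count_literal(df['A'], '$').tolist())                       # [2, 2, 1]
print(count_literal(pd.Series(['a.b.c', 'abc']), '.').tolist())   # [2, 0]
```

Without re.escape, the second call would count '.' as "any character" and report 5 and 3 instead.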

Taking a cue from @fuglede:

 pd.Series([x.count('$') for x in df.A.values.tolist()], df.index) 

As @jezrael pointed out, the above fails when the column contains nulls (NaN), so:

    def tc(x):
        try:
            return x.count('$')
        except AttributeError:  # NaN is a float and has no .count method
            return 0

    pd.Series([tc(x) for x in df.A.values.tolist()], df.index)
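If you stay with the pandas method instead, missing values need no try/except: str.count returns NaN for NaN entries (so the result becomes a float Series), and those can be filled with 0 afterwards. A sketch, assuming you want missing entries counted as 0 to match tc:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(['$$a', np.nan, '$c'], columns=['A'])

# NaN rows come back as NaN, which forces a float result
raw_counts = df['A'].str.count(r'\$')

# Treat missing values as zero occurrences and restore an integer dtype
counts = raw_counts.fillna(0).astype(int)
print(counts.tolist())  # [2, 0, 1]
```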

Timings

    import numpy as np
    import pandas as pd

    np.random.seed([3, 1415])
    df = pd.Series(np.random.randint(0, 100, 100000)) \
           .apply(lambda x: '$' * x).to_frame('A')
    df['A'] = df['A'].replace('', np.nan)

    def tc(x):
        try:
            return x.count('$')
        except AttributeError:
            return 0
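To rerun the comparison yourself, here is a self-contained sketch using the standard timeit module. The data setup follows the snippet above, but with a smaller size so it runs quickly; the methods dict and its labels are illustrative names, and the sketch checks that every method agrees before timing it.

```python
import timeit

import numpy as np
import pandas as pd

# Strings of 0..99 '$' characters, with empty strings replaced by NaN.
np.random.seed([3, 1415])
df = pd.Series(np.random.randint(0, 100, 10_000)).apply(lambda x: '$' * x).to_frame('A')
df['A'] = df['A'].replace('', np.nan)


def tc(x):
    try:
        return x.count('$')
    except AttributeError:  # NaN is a float and has no .count method
        return 0


methods = {
    'str.count + fillna': lambda: df['A'].str.count(r'\$').fillna(0).astype(int),
    'list comprehension': lambda: pd.Series([tc(x) for x in df['A'].tolist()], df.index),
}

# All methods must agree before their timings are worth comparing.
results = {name: f() for name, f in methods.items()}
baseline = results['list comprehension']
for name, res in results.items():
    assert res.tolist() == baseline.tolist(), name

for name, f in methods.items():
    per_loop_ms = timeit.timeit(f, number=10) / 10 * 1e3
    print(f"{name}: {per_loop_ms:.2f} ms per loop")
```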


Source: https://habr.com/ru/post/1260544/

