You can try value_counts:

    df = df['col'].value_counts().reset_index()
    df.columns = ['col', 'count']
    print df

       col  count
    0    1      5
    1    2      3
EDIT:
    print (df['col'] == 1).sum()
    5
Or:
    def somecalulation(x):
        return (df['col'] == x).sum()

    print somecalulation(1)
    5
    print somecalulation(2)
    3
Or:
    ser = df['col'].value_counts()

    def somecalulation(s, x):
        return s[x]

    print somecalulation(ser, 1)
    5
    print somecalulation(ser, 2)
    3
EDIT2:
If you need something really fast, use numpy.in1d:

    import pandas as pd
    import numpy as np

    a = pd.Series([1, 1, 1, 1, 2, 2])
Timings, len(a) = 6:

    In [131]: %timeit np.in1d(a,1).sum()
    The slowest run took 9.17 times longer than the fastest. This could mean that an intermediate result is being cached
    10000 loops, best of 3: 29.9 µs per loop

    In [132]: %timeit np.sum(a == 1)
    10000 loops, best of 3: 196 µs per loop

    In [133]: %timeit (a == 1).sum()
    1000 loops, best of 3: 180 µs per loop
len(a) = 6000:

    In [135]: %timeit np.in1d(a,1).sum()
    The slowest run took 7.29 times longer than the fastest. This could mean that an intermediate result is being cached
    10000 loops, best of 3: 48.5 µs per loop

    In [136]: %timeit np.sum(a == 1)
    The slowest run took 5.23 times longer than the fastest. This could mean that an intermediate result is being cached
    1000 loops, best of 3: 273 µs per loop

    In [137]: %timeit (a == 1).sum()
    1000 loops, best of 3: 271 µs per loop
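If you want to re-run the comparison yourself on a larger Series, here is a minimal Python 3 sketch (the series size of 100000 values and the repeat count of 100 are arbitrary choices, not taken from the timings above):

    import timeit

    import numpy as np
    import pandas as pd

    # Arbitrary test data, larger than the toy example above
    a = pd.Series(np.random.randint(0, 10, size=100000))

    # Compare the three counting approaches shown in this answer
    for stmt in ["np.in1d(a, 1).sum()",
                 "np.sum(a == 1)",
                 "(a == 1).sum()"]:
        t = timeit.timeit(stmt, globals=globals(), number=100)
        print("%-25s %.4f s for 100 runs" % (stmt, t))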