Option 1
Use pd.Series.str.match, which tests whether each string matches the pattern from its start:
df.a.str.match(r'^r\d{5}$')
0    False
1     True
2     True
3    False
4     True
5    False
Name: a, dtype: bool
Use it as a filter:
df[df.a.str.match(r'^r\d{5}$')]
        a  b
1  r00001  1
2  r00010  2
4  r01234  4
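For anyone who wants to run Option 1 end to end, here is a minimal sketch. The DataFrame is an assumption: only rows 1, 2 and 4 come from the output above, and the non-matching values in rows 0, 3 and 5 are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: rows 1, 2 and 4 mirror the output above;
# the values in rows 0, 3 and 5 are invented non-matching examples.
df = pd.DataFrame({
    "a": ["x00001", "r00001", "r00010", "r1234", "r01234", "rabcde"],
    "b": [0, 1, 2, 3, 4, 5],
})

# Raw string so the backslash in \d reaches the regex engine unchanged.
# str.match already anchors at the start, so ^ is redundant but harmless;
# the trailing $ is what rejects strings with extra characters.
mask = df["a"].str.match(r"^r\d{5}$")

# A boolean Series can be used directly to select rows.
filtered = df[mask]
print(filtered)
```

The mask is kept in its own variable so it can be inspected or combined with other conditions before filtering.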
Option 2
A list comprehension using plain string methods instead of a regex
def f(s):
    return s.startswith('r') and len(s) == 6 and s[1:].isdigit()
[f(s) for s in df.a.values.tolist()]
[False, True, True, False, True, False]
Use it as a filter
df[[f(s) for s in df.a.values.tolist()]]
        a  b
1  r00001  1
2  r00010  2
4  r01234  4
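One caveat with the string-method approach: it assumes every value is a string, so a missing value such as None or NaN would raise AttributeError on startswith. A minimal sketch of a guarded version follows. The name is_code, the sample values and the cross-check against re.fullmatch are all illustrative assumptions, not part of the original answer.

```python
import re

# Same pattern as in Option 1; fullmatch makes explicit anchors unnecessary.
PATTERN = re.compile(r"r\d{5}")

def is_code(s):
    """Return True for 'r' followed by exactly five digits, False otherwise.

    Non-string values (None, numbers, NaN) are treated as non-matching
    instead of raising AttributeError.
    """
    return isinstance(s, str) and s.startswith("r") and len(s) == 6 and s[1:].isdigit()

# Hypothetical values, including a missing entry and a non-string.
values = ["x00001", "r00001", "r00010", "r1234", "r01234", None, 12345]

flags = [is_code(s) for s in values]
print(flags)

# Cross-check: the regex gives the same answer on these values.
regex_flags = [isinstance(s, str) and PATTERN.fullmatch(s) is not None for s in values]
print(flags == regex_flags)
```

The resulting list of booleans can be passed to df[...] exactly like the list built with f above.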
The timing
df = pd.concat([df] * 10000, ignore_index=True)
%timeit df[[s.startswith('r') and (len(s) == 6) and s[1:].isdigit() for s in df.a.values.tolist()]]
%timeit df[df.a.str.match(r'^r\d{5}$')]
%timeit df[df.a.str.contains(r'^r\d{5}$')]
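10 loops, best of 3: 22.8 ms per loop
10 loops, best of 3: 33.8 ms per loop
10 loops, best of 3: 34.8 ms per loop
The results are listed in the same order as the statements. On this data the list comprehension is the fastest, at about two thirds of the time taken by either regex-based method.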
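%timeit only works inside IPython or Jupyter. Outside of those, a rough equivalent can be put together with the standard timeit module. This is a sketch under assumptions: the base DataFrame is hypothetical, the function names are mine, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical base data, repeated 10000 times as in the timing above.
base = pd.DataFrame({
    "a": ["x00001", "r00001", "r00010", "r1234", "r01234", "rabcde"],
    "b": [0, 1, 2, 3, 4, 5],
})
df = pd.concat([base] * 10000, ignore_index=True)

def by_comprehension():
    # Plain string methods in a list comprehension.
    return df[[s.startswith("r") and len(s) == 6 and s[1:].isdigit()
               for s in df["a"].tolist()]]

def by_match():
    # Regex anchored at the start by str.match.
    return df[df["a"].str.match(r"^r\d{5}$")]

def by_contains():
    # str.contains searches anywhere, so both anchors are needed.
    return df[df["a"].str.contains(r"^r\d{5}$")]

for fn in (by_comprehension, by_match, by_contains):
    # Best of 3 repeats, 5 calls each, reported per call.
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {best * 1e3:.1f} ms per loop")
```

Taking the minimum over repeats mirrors the "best of 3" convention in the %timeit output.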