In this situation I use a float indicator array, encoded as 0 = False, 1 = True and NaN = missing. A pandas DataFrame with bool dtype cannot hold missing values, and a DataFrame with object dtype containing a mix of Python bool and float objects is inefficient. That leads to DataFrames with np.float64 dtype. For your comparison, numpy.sign(x - threshold) gives -1 where x < threshold, 0 where x == threshold and +1 where x > threshold, and leaves NaN as NaN. That may be good enough for your purposes, but if you really need 0/1, the conversion can be done in place. The timings below are for an array x of length 200K:
    In [45]: %timeit y = (x > 0); y[pd.isnull(x)] = np.nan
    100 loops, best of 3: 8.71 ms per loop

    In [46]: %timeit y = np.sign(x)
    100 loops, best of 3: 1.82 ms per loop

    In [47]: %timeit y = np.sign(x); y += 1; y /= 2
    100 loops, best of 3: 3.78 ms per loop
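For reference, here is a minimal sketch of the sign-based conversion wrapped in a function. The name `indicator_gt` and the `threshold` parameter are hypothetical, chosen only for illustration. One thing to be aware of: after `y += 1; y /= 2`, values exactly equal to the threshold end up as 0.5. The extra `np.floor` step is my own addition, assuming you want a strict `x > threshold` so that ties become 0; drop it if 0.5 is acceptable.

```python
import numpy as np

def indicator_gt(x, threshold=0.0):
    """Float indicator for x > threshold: 1.0 = True, 0.0 = False, NaN = missing.

    Hypothetical helper, not part of NumPy or pandas.
    """
    # The subtraction allocates a new array, so the caller's data is not modified.
    y = np.sign(np.asarray(x, dtype=np.float64) - threshold)  # -1, 0, +1, NaN
    y += 1               # in place: 0, 1, 2, NaN
    y /= 2               # in place: 0, 0.5, 1, NaN
    np.floor(y, out=y)   # assumption: ties (0.5) should count as False (0)
    return y

x = np.array([-2.0, 0.0, 3.5, np.nan])
y = indicator_gt(x)
# y holds 0.0, 0.0, 1.0, nan (dtype float64)
```

All steps after the subtraction reuse the same buffer, which is what keeps this cheaper than building a boolean mask and then patching in the NaN values.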