Assignments to a pandas DataFrame with mixed float32 and float64 dtypes are rather slow for some combinations, at least in the way I do them.
The code below sets up a DataFrame, runs a NumPy/SciPy computation on part of the data, creates a new DataFrame by copying the old one, assigns the result of the computation into the new DataFrame, and times that assignment for each combination of dtypes:
    import pandas as pd
    import numpy as np
    from scipy.signal import lfilter
    from time import time

    N = 1000
    M = 1000

    def f(dtype1, dtype2):
        coi = [str(m) for m in range(M)]
        df = pd.DataFrame([[m for m in range(M)] + ['Hello', 'World']
                           for n in range(N)],
                          columns=coi + ['A', 'B'], dtype=dtype1)
        Y = lfilter([1], [0.5, 0.5], df.ix[:, coi])
        Y = Y.astype(dtype2)
        new = pd.DataFrame(df, copy=True)
        print(new.iloc[0, 0].dtype)
        print(Y.dtype)
        start_time = time()
        new.ix[:, coi] = Y
        print(new.iloc[0, 0].dtype)
        print(time() - start_time)

    for dtype1 in [np.float32, np.float64]:
        for dtype2 in [np.float32, np.float64]:
            print('-' * 10)
            f(dtype1, dtype2)
The timing results are:
    ----------
    float32
    float32
    float64
    10.1998147964
    ----------
    float32
    float64
    float64
    10.2371120453
    ----------
    float64
    float32
    float64
    0.864870071411
    ----------
    float64
    float64
    float64
    0.866265058517
Here the critical line is new.ix[:, coi] = Y: it is ten times slower for some dtype combinations than for others.
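For reference, the assignment can be timed in isolation with a small helper like the following (my own sketch, assuming the imports and the df, coi, and Y built inside f above; the helper name is mine):

    def time_assignment(df, coi, Y, repeats=3):
        # run the suspect line on a fresh copy each time and report the
        # fastest run, so the cost of the copy itself is excluded
        timings = []
        for _ in range(repeats):
            new = pd.DataFrame(df, copy=True)
            start = time()
            new.ix[:, coi] = Y  # the critical line
            timings.append(time() - start)
        return min(timings)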
I can understand that recasting requires some overhead when a DataFrame holds float32 columns and float64 values are assigned into it. But why is the overhead so dramatic?
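To make the suspected recast visible, the column dtypes can be counted just before and after the assignment (a small diagnostic of mine, reusing df, coi, and Y from f above; only public pandas API is used):

    new = pd.DataFrame(df, copy=True)
    print(new.dtypes.value_counts())  # e.g. 1000 float32 + 2 object columns when dtype1 is float32
    new.ix[:, coi] = Y
    print(new.dtypes.value_counts())  # reveals whether the numeric columns were upcast to float64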
In addition, the float32/float32 combination is also slow, and its result is float64 anyway, which also bothers me.
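As a control, the same experiment could be run without the two object columns, to check whether 'A' and 'B' are what force the slow path and the float64 result (this is my hypothesis, not something I have confirmed):

    def g(dtype1, dtype2):
        # same as f above, but the frame holds only the numeric columns
        coi = [str(m) for m in range(M)]
        df = pd.DataFrame([[m for m in range(M)] for n in range(N)],
                          columns=coi, dtype=dtype1)
        Y = lfilter([1], [0.5, 0.5], df.ix[:, coi]).astype(dtype2)
        new = pd.DataFrame(df, copy=True)
        start_time = time()
        new.ix[:, coi] = Y
        print(new.iloc[0, 0].dtype, time() - start_time)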