Some simple data to get started:
import pandas as pd
import numpy as np
df = pd.DataFrame({"x": np.random.normal(size=100), "y": np.random.normal(size=100)})
Up to this point, I always thought assign was the equivalent of mutate in the dplyr library. However, if I try to use a variable that I created in an assign step within that same assign step, I get an error. Consider the following, which is acceptable in R:
df %>% mutate(z = x * y, w = z + 10)
If I try the equivalent in pandas, I get an error:
df.assign(z = df.x * df.y, w = z + 10)
The only way I can do this is to use two separate assign steps:
df.assign(z = df.x * df.y).assign(w = lambda d: d.z + 10)
Is there something I missed? Or is there another function that is more suitable?
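For reference, here is a minimal, self-contained sketch of the situation described above. It reproduces the error, runs the two-step workaround, and then tries one possible single-call variant that passes a callable for every new column. That last variant rests on an assumption: pandas 0.23 or later running on Python 3.6 or later, where assign is documented to evaluate keyword arguments in order. The names error_message, two_step and one_step are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.random.normal(size=100),
                   "y": np.random.normal(size=100)})

# The direct translation fails before pandas is even called: Python
# evaluates the argument expression `z + 10` first, and there is no
# Python variable named z at that point.
try:
    df.assign(z=df.x * df.y, w=z + 10)
except NameError as exc:
    error_message = str(exc)

# Workaround from the question: two chained assign calls. The lambda
# receives the intermediate frame, which already contains column z.
two_step = df.assign(z=df.x * df.y).assign(w=lambda d: d.z + 10)

# Possible single-call variant (assumes pandas >= 0.23, Python >= 3.6):
# each callable is applied to the frame as built so far, so the second
# lambda can see the z column created by the first.
one_step = df.assign(z=lambda d: d.x * d.y, w=lambda d: d.z + 10)
```

If the version assumption holds, one_step and two_step should contain the same columns and values. On older setups, chaining the assign calls remains the safer choice.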