How to use a newly created variable in the same assign call with pandas

Some simple data to get started:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({"x": np.random.normal(size=100),
                       "y": np.random.normal(size=100)})

Up to this point, I always thought assign was the equivalent of mutate in the dplyr library. However, if I try to use a variable created in an assign step within that same assign step, I get an error. Consider the following, which is valid in R:

    df %>% mutate(z = x * y, w = z + 10)

If I try the equivalent in pandas, I get an error:

    df.assign(z = df.x * df.y, w = z + 10)              # Error
    df.assign(z = df.x * df.y, w = lambda d: d.z + 10)  # Error

The only way I can get this to work is to chain two assign calls:

    df.assign(z = df.x * df.y).assign(w = lambda d: d.z + 10)

Is there something I missed? Or is there another function that is more suitable?

2 answers

You can use the DataFrame.eval(..., inplace=False) method as follows:

    In [79]: qry = """
        ...: z = x * y
        ...: w = z + 10     # NOTE: next variable must be on a new line
        ...: """

    In [80]: df.eval(qry, inplace=False)
    Out[80]:
                x          y          z          w
    0   -0.636271  -0.493260   0.313847  10.313847
    1    0.298998   0.266673   0.079735  10.079735
    2   -0.836940  -0.593346   0.496595  10.496595
    3    0.497099  -0.199589  -0.099215   9.900785
    4    2.187165  -0.332140  -0.726445   9.273555
    5    0.472785   0.169204   0.079997  10.079997
    6   -0.847666  -1.519570   1.288088  11.288088
    7    1.262524   1.008820   1.273660  11.273660
    8   -0.632817  -0.463941   0.293590  10.293590
    9   -0.955913  -1.149799   1.099107  11.099107
    10  -1.260231   0.000266  -0.000336   9.999664
    11   1.054885  -1.390762  -1.467094   8.532906
    12  -1.048271   0.816762  -0.856187   9.143813
    13  -0.814064  -0.070574   0.057452  10.057452
    14  -1.279904  -1.079151   1.381211  11.381211
    15   0.223787  -0.887732  -0.198663   9.801337
    16  -0.493267  -0.064099   0.031618  10.031618
    17  -0.549534   0.622976  -0.342346   9.657654
    18  -0.261209   0.267250  -0.069808   9.930192
    19  -2.948658   1.586422  -4.677815   5.322185
    20  -1.959709   1.103462  -2.162465   7.837535
    21   0.595782  -0.699891  -0.416983   9.583017
    22  -0.059947  -0.264011   0.015827  10.015827
    23   0.012929  -1.635020  -0.021139   9.978861
    24   1.387415  -1.763467  -2.446660   7.553340
    ..        ...        ...        ...        ...
    75   1.649346  -0.515930  -0.850948   9.149052
    76  -1.111928  -0.674379   0.749861  10.749861
    77   1.413567  -1.377679  -1.947441   8.052559
    78   0.119227   0.382638   0.045621  10.045621
    79   0.064824  -2.043595  -0.132474   9.867526
    80  -1.135878  -0.116922   0.132809  10.132809
    81  -0.423820   1.386475  -0.587616   9.412384
    82   0.642123  -0.914807  -0.587419   9.412581
    83  -0.495118   0.773073  -0.382763   9.617237
    84   0.347832  -0.913034  -0.317582   9.682418
    85   1.314090   1.633140   2.146093  12.146093
    86  -0.277789   0.883307  -0.245373   9.754627
    87   0.514091  -1.349400  -0.693714   9.306286
    88  -0.140958  -0.264500   0.037283  10.037283
    89  -0.975843  -0.608312   0.593617  10.593617
    90   0.242816   0.749860   0.182078  10.182078
    91   1.185033  -0.487483  -0.577683   9.422317
    92  -0.258952  -0.532178   0.137809  10.137809
    93   2.015797   1.788613   3.605481  13.605481
    94  -0.415403   0.224944  -0.093442   9.906558
    95  -0.082239  -1.479761   0.121693  10.121693
    96  -0.707825   2.074192  -1.468165   8.531835
    97   0.517926   0.043832   0.022702  10.022702
    98  -0.667368  -0.916520   0.611656  10.611656
    99   0.366614   0.620221   0.227382  10.227382

    [100 rows x 4 columns]
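For a version that runs outside an IPython session, here is a minimal sketch of the same multi-line eval approach. The three-row frame and its values are made up for illustration; only DataFrame.eval and the inplace=False argument come from the answer above.

```python
import pandas as pd

# Small deterministic frame (illustrative values, not the random data from the question)
df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]})

# One assignment per line; a later line may use a target defined on an earlier line
expr = """
z = x * y
w = z + 10
"""

# inplace=False returns a new frame and leaves df itself unchanged
result = df.eval(expr, inplace=False)
print(result)
```

Every line in a multi-line eval string has to be an assignment, which is why this fits the mutate-style use case well.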

They are not equivalent. From the docs for assign (emphasis mine):

Assigning multiple columns within the same assign is possible, but you cannot reference other columns created within the same assign call.

This would be difficult to do in Python <3.6, since the order of the keyword arguments is not guaranteed. MaxU's answer using multi-line DataFrame.eval is a good alternative approach to the problem.
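The ordering point can be seen without pandas at all. The sketch below uses a hypothetical helper, kwarg_names, that only reports the order in which keyword arguments arrive. Since Python 3.6 (PEP 468) that order is guaranteed to match the call site, which is what a sequential assign would have to rely on.

```python
def kwarg_names(**kwargs):
    """Return keyword-argument names in the order the function received them."""
    return list(kwargs)


# On Python 3.6+ the names come back in call order, so "z" is known to precede "w".
# On earlier versions the order of **kwargs was not guaranteed.
names = kwarg_names(z=1, w=2)
print(names)
```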


There is also a GitHub issue which notes that this behavior becomes possible in Python 3.6, since the order of keyword arguments is preserved. It seems this behavior may be adopted in pandas 2.0.
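For reference, pandas 0.23 and later accept dependent keyword arguments in assign when running on Python 3.6+, so a callable can refer to a column created earlier in the same call. A minimal sketch, assuming such a version is installed and using made-up values:

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]})

# The lambda receives the frame as it stands after z has been added,
# so d.z resolves (assumes pandas >= 0.23 on Python >= 3.6)
out = df.assign(z=df.x * df.y, w=lambda d: d.z + 10)
print(out)
```

A bare name such as z in w = z + 10 still fails on any version, because z is not a Python variable in the calling scope; the reference has to go through a callable.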


Source: https://habr.com/ru/post/1015320/

