Starting in 0.13 (to be released soon), you can do something like this. It uses a generator to evaluate formulas dynamically. In-line assignment via eval will be an additional feature in 0.13, see here
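For illustration, in-line assignment via eval might look like the sketch below. It assumes a recent pandas, where `DataFrame.eval` returns a new frame unless `inplace=True` is passed; the behavior in 0.13 itself may differ. The name `with_c` is only a placeholder for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 2), columns=['a', 'b'])

# Assignment inside the expression: the target name 'c' becomes a column.
# Assumption: recent pandas, where this returns a new DataFrame and
# leaves df itself unchanged (pass inplace=True to modify df instead).
with_c = df.eval('c = a + b')
```

Whether to assign in place or work on the returned copy is a design choice; the copy is easier to reason about when several formulas are chained.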
In [19]: df = DataFrame(randn(5, 2), columns=['a', 'b'])

In [20]: df
Out[20]:
          a         b
0 -1.949107 -0.763762
1 -0.382173 -0.970349
2  0.202116  0.094344
3 -1.225579 -0.447545
4  1.739508 -0.400829

In [21]: formulas = [('c', 'a+b'), ('d', 'a*c')]
Create a generator that evaluates each formula using eval, assigns the result to a new column, then yields the updated frame.
In [22]: def lazy(x, formulas):
   ....:     for col, f in formulas:
   ....:         x[col] = x.eval(f)
   ....:         yield x
   ....:
In action
In [23]: gen = lazy(df, formulas)

In [24]: gen.next()
Out[24]:
          a         b         c
0 -1.949107 -0.763762 -2.712869
1 -0.382173 -0.970349 -1.352522
2  0.202116  0.094344  0.296459
3 -1.225579 -0.447545 -1.673123
4  1.739508 -0.400829  1.338679

In [25]: gen.next()
Out[25]:
          a         b         c         d
0 -1.949107 -0.763762 -2.712869  5.287670
1 -0.382173 -0.970349 -1.352522  0.516897
2  0.202116  0.094344  0.296459  0.059919
3 -1.225579 -0.447545 -1.673123  2.050545
4  1.739508 -0.400829  1.338679  2.328644
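The same idea as a self-contained script with explicit imports, as a minimal sketch rather than a definitive version. On Python 3 the built-in `next(gen)` replaces a generator's `.next()` method. The variable names `columns_before` and `columns_after_first` are only there to show when each column appears.

```python
import numpy as np
import pandas as pd

def lazy(frame, formulas):
    """Evaluate each formula in order, assign it, and yield the frame."""
    for col, expr in formulas:
        frame[col] = frame.eval(expr)
        yield frame

df = pd.DataFrame(np.random.randn(5, 2), columns=['a', 'b'])
formulas = [('c', 'a + b'), ('d', 'a * c')]

gen = lazy(df, formulas)

# Creating the generator runs none of its body, so df is still unchanged.
columns_before = list(df.columns)

# Each next() call evaluates exactly one formula and adds one column.
step1 = next(gen)
columns_after_first = list(step1.columns)

step2 = next(gen)
```

Note that the generator yields the same frame object each time (it mutates `df`), so the column lists are captured right after each step.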
So the order of evaluation is defined by the user (it is not on-demand). In theory, numba is going to support this, so pandas could possibly support it as a backend for eval (which currently uses numexpr for immediate evaluation).
my 2c.
Lazy evaluation is nice, but it can be achieved easily with Python's own continuation/generator features, so building it into pandas, while possible, would be a rather difficult task, even though it would be generally useful.
Jeff, Oct 26 '13 at 20:39