In-Place Operations on a pandas DataFrame

Question

Suppose I have this:

    >>> x = pandas.DataFrame([[1.0, 2.0, 3.0], [3, 4, 5]], columns=["A", "B", "C"])
    >>> print x
       A  B  C
    0  1  2  3
    1  3  4  5

Now I want to normalize x by row, that is, divide each row by its sum. As described in this question, this can be achieved with x = x.div(x.sum(axis=1), axis=0). However, this creates a new DataFrame. If my DataFrame is large, building that new DataFrame can use a lot of memory, even though I immediately assign it back to the original name.
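For reference, here is a minimal sketch of the out-of-place version described above, written for current Python and pandas. The names row_sums and normalized are only illustrative.

```python
import pandas as pd

x = pd.DataFrame([[1.0, 2.0, 3.0], [3, 4, 5]], columns=["A", "B", "C"])

# Sum across the columns to get one total per row.
row_sums = x.sum(axis=1)

# axis=0 aligns row_sums with the row index, so each row is divided by its own total.
# The result is a new DataFrame; x itself is left unchanged.
normalized = x.div(row_sums, axis=0)
print(normalized)
```

Both x and normalized exist at the same time here, which is the extra memory cost the question is about.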

Is there an efficient way to perform this operation? I need something like x.idiv() that takes an axis option the way div does, but updates x in place. For this particular case I need division, but it would be nice to have similar in-place versions of all the basic operations.

(I can update it in place by iterating row by row and assigning each normalized row back to the original, but this is slower, and I'm looking for a more efficient solution.)
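The row-by-row workaround mentioned above might look roughly like this. It is a sketch for an all-float frame, not a recommendation, since the Python-level loop is what makes it slow.

```python
import pandas as pd

x = pd.DataFrame([[1.0, 2.0, 3.0], [3, 4, 5]], columns=["A", "B", "C"])

# Normalize one row at a time and write it back by position.
for i in range(len(x)):
    row = x.iloc[i]
    x.iloc[i] = (row / row.sum()).to_numpy()

print(x)
```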


python pandas

Brenbarn Nov 08 '13 at 7:18


1 answer

Andy Hayden · Accepted Answer · 2013-11-08T07:36:28+0000

You can do this directly in NumPy, without making a copy, by operating on the array underlying the DataFrame:

    In [11]: x1 = x.values.T

    In [12]: x1
    Out[12]:
    array([[ 1.,  3.],
           [ 2.,  4.],
           [ 3.,  5.]])

    In [13]: x1 /= x1.sum(0)

    In [14]: x
    Out[14]:
              A         B         C
    0  0.166667  0.333333  0.500000
    1  0.250000  0.333333  0.416667

Maybe there should be an inplace flag for div ...?
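The same idea can be sketched without the transpose by using keepdims=True and the out= argument of np.divide. The helper name normalize_rows_inplace is hypothetical. The trick above relies on the array being a writable view of the frame's data, which, as far as I know, depends on the pandas version (copy-on-write makes the array read-only) and on the frame holding a single dtype. The sketch therefore checks for that and otherwise falls back to div.

```python
import numpy as np
import pandas as pd


def normalize_rows_inplace(arr):
    """Hypothetical helper: divide each row of a 2-D float array by its sum, in place."""
    row_sums = arr.sum(axis=1, keepdims=True)  # shape (n_rows, 1), broadcasts over columns
    np.divide(arr, row_sums, out=arr)          # writes the result into arr's own buffer
    return arr


# A plain NumPy array is always modified in place.
data = np.array([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
normalize_rows_inplace(data)

# For a DataFrame, only go through the array if it is a writable view of the frame's data.
x = pd.DataFrame([[1.0, 2.0, 3.0], [3, 4, 5]], columns=["A", "B", "C"])
values = x.to_numpy()
is_view = np.shares_memory(values, x.to_numpy())
if values.flags.writeable and is_view:
    normalize_rows_inplace(values)      # x now reflects the normalized values
else:
    x = x.div(x.sum(axis=1), axis=0)    # fallback: allocates a new DataFrame

print(x)
```

Either branch leaves x with rows that sum to 1, but only the first one avoids allocating a second full-size frame.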
