Python Pandas apply function

I am trying to use apply to avoid an iterrows() iterator in a function.

However, the pandas method is poorly documented, and I cannot find an example of how to use it, apart from the lame .apply(np.sqrt) in the documentation. There is no example of how to pass arguments, etc.

Anyway, here is a toy example of what I'm trying to do.

As I understand it, apply does the same thing as iterrows(), i.e. it iterates (row by row, if axis = 0). At each iteration, the input x of the function should be the row being iterated over. However, the error messages I keep getting contradict this assumption ...

grid = np.random.rand(5,2)
df = pd.DataFrame(grid)

def multiply(x):
    x[3]=x[0]*x[1]

df = df.apply(multiply, axis=0)

The above example returns an empty df. Can anyone shed light on my misunderstanding?

import pandas as pd
import numpy as np

grid = np.random.rand(5,2)
df = pd.DataFrame(grid)

def multiply(x):
    return x[0]*x[1]

df['multiply'] = df.apply(multiply, axis = 1)
print(df)

Output:

          0         1  multiply
0  0.550750  0.713054  0.392715
1  0.061949  0.661614  0.040987
2  0.472134  0.783479  0.369907
3  0.827371  0.277591  0.229670
4  0.961102  0.137510  0.132162

Explanation:

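The function you are applying needs to return a value. It must also be applied to each row, so the axis argument should be 1, not 0.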

Also, the result is stored in a new column named 'multiply'. With df[3] = ... instead, the output is:

          0         1         3
0  0.550750  0.713054  0.392715
1  0.061949  0.661614  0.040987
2  0.472134  0.783479  0.369907
3  0.827371  0.277591  0.229670
4  0.961102  0.137510  0.132162
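
To make the axis argument concrete, here is a minimal sketch on a small hand-made frame (the data and the helper name size are made up for illustration). It only reports how many elements the function receives on each call, which shows whether it was handed a column or a row.

```python
import pandas as pd

# Illustrative 3x2 frame with the same integer column labels as in the question.
df = pd.DataFrame({0: [1.0, 2.0, 3.0], 1: [4.0, 5.0, 6.0]})

def size(x):
    # x is the Series that apply passes in on each call.
    return len(x)

# axis=0: the function is called once per column (each column has 3 values).
col_lengths = df.apply(size, axis=0)

# axis=1: the function is called once per row (each row has 2 values).
row_lengths = df.apply(size, axis=1)

print(col_lengths)
print(row_lengths)
```

So with axis=0 the expression x[0]*x[1] multiplies the first two entries of a column, while with axis=1 it multiplies the two columns within a row.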

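apply calls the function once for each row/column and builds its result from the values the function returns. Your function returns None because multiply has no return statement, so apply has nothing to collect.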

Define multiply so that it returns a value:

def multiply(x):
    return x[0]*x[1]

df[3] = df.apply(multiply, 'columns')

However, apply is not needed here. The columns can be multiplied directly:

df[3] = df[0]*df[1]

In general, avoid apply when a direct column operation like this is available.
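
The role of the return statement can be checked on a single row. This is a small sketch with made-up data; multiply_no_return is a hypothetical name for a variant of the question's function that computes the product but never returns it.

```python
import pandas as pd

# Illustrative data; the column labels 0 and 1 match the question.
df = pd.DataFrame({0: [1.0, 2.0, 3.0], 1: [4.0, 5.0, 6.0]})

def multiply_no_return(x):
    # The product is computed and then discarded: with no return
    # statement, Python returns None implicitly.
    product = x[0] * x[1]

def multiply(x):
    # Returning the product lets apply collect one value per row.
    return x[0] * x[1]

first_row = df.iloc[0]
no_return_result = multiply_no_return(first_row)
returned_result = multiply(first_row)
print(no_return_result, returned_result)

# Same call as in the answer, with the axis spelled out as a keyword.
df[3] = df.apply(multiply, axis="columns")
print(df)
```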


You can also use a lambda function with apply:

df['multiply'] = df.apply(lambda row: row[0] * row[1], axis = 1)

This gives the same result as @Andy's answer.

This can be useful if your function has the form:

def multiply(a,b):
    return a*b

df['multiply'] = df.apply(lambda row: multiply(row[0] ,row[1]), axis = 1)
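
Since the question also asks how to pass arguments, here is a hedged sketch of the other common route: DataFrame.apply forwards extra positional arguments through its args parameter and extra keyword arguments directly to the function. The function scaled_product and its factor and offset parameters are invented for this example.

```python
import pandas as pd

# Illustrative data with integer column labels, as in the question.
df = pd.DataFrame({0: [1.0, 2.0, 3.0], 1: [4.0, 5.0, 6.0]})

def scaled_product(row, factor, offset=0.0):
    # row is the Series for one row; factor and offset are extra
    # arguments supplied through apply.
    return row[0] * row[1] * factor + offset

# args carries extra positional arguments; unknown keyword arguments
# (here offset) are passed through to the function.
df["scaled"] = df.apply(scaled_product, axis=1, args=(10,), offset=1.0)
print(df)
```

A lambda wrapper, as above, achieves the same thing; args is mainly convenient when the function already takes the row as its first parameter.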

Additional examples in Performance Improvement


One of the rules of the Pandas Zen: always try to find a vectorized solution first.

.apply(..., axis=1) is not vectorized!

Consider the alternatives:

In [164]: df.prod(axis=1)
Out[164]:
0    0.770675
1    0.539782
2    0.318027
3    0.597172
4    0.211643
dtype: float64

In [165]: df[0] * df[1]
Out[165]:
0    0.770675
1    0.539782
2    0.318027
3    0.597172
4    0.211643
dtype: float64

Timing against a DataFrame with 50,000 rows:

In [166]: df = pd.concat([df] * 10**4, ignore_index=True)

In [167]: df.shape
Out[167]: (50000, 2)

In [168]: %timeit df.apply(multiply, axis=1)
1 loop, best of 3: 6.12 s per loop

In [169]: %timeit df.prod(axis=1)
100 loops, best of 3: 6.23 ms per loop

In [170]: def multiply_vect(x1, x2):
     ...:     return x1*x2
     ...:

In [171]: %timeit multiply_vect(df[0], df[1])
1000 loops, best of 3: 604 µs per loop

Conclusion: use .apply() as a last resort (i.e. when nothing else works).
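
Before swapping apply for a vectorized expression, it is worth checking that the alternatives really agree. The sketch below does that on seeded random data; the frame size and the seed are arbitrary choices for illustration, not taken from the timings above.

```python
import numpy as np
import pandas as pd

# Seeded random data so the check is repeatable; 1000 rows is arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((1000, 2)))

# Row-by-row product through apply (a Python-level loop over rows).
via_apply = df.apply(lambda row: row[0] * row[1], axis=1)

# Vectorized alternatives: product across each row, and direct
# column-by-column multiplication.
via_prod = df.prod(axis=1)
via_columns = df[0] * df[1]

# All three should match up to floating-point rounding.
print(np.allclose(via_apply, via_prod))
print(np.allclose(via_apply, via_columns))
```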


Source: https://habr.com/ru/post/1675083/

