What is the difference between using .apply or passing a dataframe column

I want to apply a simple function to a column in a pandas DataFrame. I tried two ways:

  • 1. df['column1']=myFunction(df['column1'])
  • 2. df['column1']=df['column1'].apply(lambda x:myFunction(x))

My dataset is not large enough to be able to tell the difference, but I assume it will be related to speed.

Can someone explain what the difference is and which one is preferable?

1 answer
  • 1. df['column1']=myFunction(df['column1'])

Here the function receives the whole pd.Series in a single call, and you let pandas decide how the operation is carried out.

  • 2. df['column1']=df['column1'].apply(lambda x:myFunction(x))

Here the function is called once for each element of the column.
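To make the distinction concrete, here is a minimal sketch that records what the function is called with in each case. The names `my_function` and `seen_types` are hypothetical and only serve the illustration.

```python
import pandas as pd

seen_types = []  # hypothetical log of the argument types my_function receives

def my_function(x):
    # Works for both a whole Series and a single number.
    seen_types.append(type(x).__name__)
    return x ** 2

s = pd.Series([1.0, 2.0, 3.0])

# Approach 1: a single call whose argument is the whole Series.
direct = my_function(s)
calls_direct = len(seen_types)

# Approach 2: .apply calls the function once per element.
elementwise = s.apply(my_function)
calls_apply = len(seen_types) - calls_direct

print(calls_direct, calls_apply)
print(direct.equals(elementwise))
```

Both approaches produce the same values here; the difference is only in how many times the Python function is invoked.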

Option 1 is usually faster than option 2. How large the gap is, though, depends on how myFunction is defined.


Example: take a DataFrame with 100,000 rows of random values and apply a function to column1:

In [1]: 
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(100000,2),
                  columns=['column1','column2'])

def myFunction(s):
    return s**2

In [2]: %%timeit
    ...: myFunction(df.column1)
    ...: 
1000 loops, best of 3: 1.68 ms per loop

In [3]: %%timeit
    ...: df.column1.apply(lambda x: x**2)
    ...: 
10 loops, best of 3: 55.4 ms per loop

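As you can see, the call is about 30 times faster when the function is applied to the whole pd.Series, because s**2 is computed as one vectorized operation. This depends, however, on how myFunction is implemented.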


For instance, if myFunction is defined like this:

In [4]: def myFunction(s):
   ...:     return s.apply(lambda x: x**2)
   ...: 

In [5]: %%timeit
    ...: myFunction(df.column1)
    ...: 
10 loops, best of 3: 53.9 ms per loop

then it is no faster than calling apply directly.
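The timings above come from IPython's %%timeit. Outside IPython, the same three-way comparison can be sketched with the standard timeit module. The function names `square_vectorized` and `square_with_apply` and the repeat count `runs` are assumptions made for this example, and the absolute numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(100000, 2),
                  columns=['column1', 'column2'])

def square_vectorized(s):
    # Operates on the whole Series in one vectorized step.
    return s ** 2

def square_with_apply(s):
    # Also takes a Series, but loops over its elements internally.
    return s.apply(lambda x: x ** 2)

runs = 5  # assumed repeat count; raise it for steadier numbers

timings = {
    'whole Series': timeit.timeit(
        lambda: square_vectorized(df['column1']), number=runs),
    '.apply per element': timeit.timeit(
        lambda: df['column1'].apply(lambda x: x ** 2), number=runs),
    'function wrapping .apply': timeit.timeit(
        lambda: square_with_apply(df['column1']), number=runs),
}

for label, total in timings.items():
    print(f'{label}: {total / runs * 1000:.2f} ms per call')
```

The last two entries should land close to each other, since both go through one Python call per element.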


Source: https://habr.com/ru/post/1663687/

