What is the difference between using .apply or passing a dataframe column

Question

I want to apply a simple function to a column in a pandas DataFrame. I did this in two ways:

1. df['column1'] = myFunction(df['column1'])
2. df['column1'] = df['column1'].apply(lambda x: myFunction(x))

My dataset is not large enough to be able to tell the difference, but I assume it will be related to speed.

Can someone explain what the difference is and which one is preferable?

python pandas

Juan daza Dec 13 '16 at 10:42

1 answer

Julien Marrec · Answer 1 · 2016-12-14T00:34:34+0000

1. df['column1'] = myFunction(df['column1'])

Here the function is called once with the whole pd.Series, and you let pandas handle how the operation is carried out.

2. df['column1'] = df['column1'].apply(lambda x: myFunction(x))

Here the function is called once for each element.
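To make the distinction concrete, here is a minimal sketch that records what myFunction is handed in each case. The tiny three-row frame and the `seen_types` list are assumptions added for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"column1": [1.0, 2.0, 3.0]})

seen_types = []  # illustrative bookkeeping, not part of the original question

def myFunction(x):
    # Record what kind of object was passed in, then square it.
    seen_types.append(type(x).__name__)
    return x ** 2

# Approach 1: the function is called once, with the whole Series.
whole = myFunction(df["column1"])
calls_whole = list(seen_types)

# Approach 2: apply calls the function once per element.
seen_types.clear()
per_element = df["column1"].apply(lambda x: myFunction(x))
calls_per_element = list(seen_types)

print(calls_whole)                # ['Series']
print(len(calls_per_element))     # one call per row
print(whole.equals(per_element))  # both approaches give the same values
```

Both approaches produce the same column here; the difference is how many Python-level calls are made to get there.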

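Option 1 will generally be faster than option 2, although that depends on how myFunction is written.

For example:

Take a DataFrame with 100,000 rows (random values, two columns) and square column1: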

In [1]: import pandas as pd
   ...: import numpy as np
   ...: df = pd.DataFrame(np.random.rand(100000, 2),
   ...:                   columns=['column1', 'column2'])
   ...:
   ...: def myFunction(s):
   ...:     return s**2
   ...:

In [2]: %%timeit
    ...: myFunction(df.column1)
    ...: 
1000 loops, best of 3: 1.68 ms per loop

In [3]: %%timeit
    ...: df.column1.apply(lambda x: x**2)
    ...: 
10 loops, best of 3: 55.4 ms per loop

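As you can see, the first approach is roughly 30 times faster here, because the operation is applied to the whole pd.Series at once rather than element by element. That said, it depends on how myFunction is written.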
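Outside IPython, a comparison like this can be reproduced with the standard timeit module. The following is only a sketch: the helper `time_call` and its repetition counts are assumptions, and absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(100000, 2), columns=["column1", "column2"])

def myFunction(s):
    return s ** 2

def time_call(func, number=5, repeat=3):
    """Best observed time per call, in seconds (hypothetical helper)."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number

# Whole-Series call versus element-wise apply on the same column.
vectorized = time_call(lambda: myFunction(df["column1"]))
element_wise = time_call(lambda: df["column1"].apply(lambda x: x ** 2))

print(f"vectorized:   {vectorized * 1e3:.2f} ms per call")
print(f"element-wise: {element_wise * 1e3:.2f} ms per call")
print(f"ratio:        {element_wise / vectorized:.0f}x")
```

Taking the minimum over several repeats mirrors the "best of 3" figure that %%timeit reports.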

If myFunction itself works element by element, the advantage disappears:

In [4]: def myFunction(s):
   ...:     return s.apply(lambda x: x**2)
   ...:

In [5]: %%timeit
    ...: myFunction(df.column1)
    ...: 
10 loops, best of 3: 53.9 ms per loop

So what matters is whether the work ends up being done element by element with apply.
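One case where the element-wise form tends to appear is a function containing a scalar `if`, which cannot be called on a whole Series. The sketch below shows how such a function might be rewritten with `numpy.where`. The function names and the 0.5 threshold are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"column1": [0.1, 0.6, 0.9]})

def square_if_large(x):
    # Scalar logic: `if` needs a single True/False value.
    return x ** 2 if x > 0.5 else x

def square_if_large_vectorized(s):
    # Same rule applied to the whole Series at once.
    return pd.Series(np.where(s > 0.5, s ** 2, s), index=s.index, name=s.name)

# Passing the Series to the scalar version fails, because the truth
# value of a multi-element Series is ambiguous.
try:
    square_if_large(df["column1"])
    raised = False
except ValueError:
    raised = True

slow = df["column1"].apply(square_if_large)       # one call per element
fast = square_if_large_vectorized(df["column1"])  # one call in total

print(raised)
print(np.allclose(slow, fast))
```

Whether such a rewrite is worthwhile depends on the size of the data; for small columns the apply version may be perfectly adequate.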
