Replace mutate (dplyr package) in python pandas

Question

Is there a pandas function similar to dplyr's mutate that lets me add a new column to grouped data by applying a function to one of its columns? A detailed explanation of the problem follows.

I generated sample data in R using the following code:

x<- data.frame(country = rep(c("US", "UK"), 5), state = c(letters[1:10]), pop=sample(10000:50000,10))

Now I want to add a new column holding the maximum population for each country (US and UK). In R I can do this as follows:

library(dplyr)
x <- group_by(x, country)
x <- mutate(x, max_pop = max(pop))
x <- arrange(x, country)

So my question is: how do I do this in Python using pandas? I tried the following, but it did not work:

x['max_pop'] = x.groupby('country').pop.apply(max)

python pandas r dplyr

saurav shekhar Dec 14 '16 at 16:46

1 answer

piRSquared · Accepted Answer · 2016-12-14T16:50:13+0000

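Use transform. transform returns a result with the same index as the original DataFrame, so it can be assigned directly as a new column.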

x['max_pop'] = x.groupby('country')['pop'].transform('max')
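
To see why an aggregation such as apply(max) does not line up while transform does, here is a minimal sketch. The DataFrame df and the variable names are made up for illustration and are not from the question; the point is only the difference in the index of the two results.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration.
df = pd.DataFrame({
    "country": ["US", "UK", "US", "UK"],
    "pop": [37088, 46987, 17116, 20484],
})

grouped = df.groupby("country")["pop"]

# Aggregation: one value per group, indexed by the group key ("UK", "US").
per_group = grouped.max()

# transform: one value per original row, indexed like df (0..3).
per_row = grouped.transform("max")

# Assignment aligns on index labels. "UK"/"US" do not match 0..3,
# so the aggregated Series yields NaN in every row.
misaligned = df.assign(max_pop=per_group)

# The transformed Series shares df's index, so it lines up row by row.
aligned = df.assign(max_pop=per_row)

print(misaligned)
print(aligned)
```

This is likely what happened with the attempt in the question: the result was indexed by country rather than by row, so nothing matched on assignment.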

import pandas as pd 

x = pd.DataFrame(dict(
    country=['US','UK','US','UK'],
    state=['a','b','c','d'],
    pop=[37088, 46987, 17116, 20484]
))
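
Putting it together on the sample data above, the group_by / mutate / arrange steps from the question could be sketched as one chain. Using assign for the mutate step and sort_values for arrange is my choice of rough equivalents, not something the question prescribes; sorting by state as well is an added assumption to make the row order deterministic.

```python
import pandas as pd

x = pd.DataFrame(dict(
    country=['US', 'UK', 'US', 'UK'],
    state=['a', 'b', 'c', 'd'],
    pop=[37088, 46987, 17116, 20484]
))

result = (
    # Roughly group_by(country) + mutate(max_pop = max(pop))
    x.assign(max_pop=x.groupby('country')['pop'].transform('max'))
    # Roughly arrange(country); 'state' is added only as a tie-breaker
    .sort_values(['country', 'state'])
    .reset_index(drop=True)
)

print(result)
```

The original x is left unchanged here, because assign returns a new DataFrame rather than modifying in place.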
