How to do the equivalent of R dplyr's mutate_each in Python Pandas

In Python Pandas, I want to add columns by applying several aggregate functions to multiple columns, as R dplyr's mutate_each does. For example, can Pandas do the same processing as the following R script?

R dplyr : iris %>% group_by(Species) %>% mutate_each(funs(min, max, mean), starts_with("Sepal")) 

I did manage to reproduce dplyr's mutate in Pandas, though. As shown in the code below, I can apply one aggregate function and add one column.

 R dplyr : iris %>% group_by(Species) %>% mutate(MaxSepalLen = max(Sepal.Length))
 Python Pandas : iris.assign(MaxSepalLen = iris.groupby("Species")["Sepal.Length"].transform('max'))
1 answer

With Pandas, this can be done in a more flexible way.

First, let's prepare the data:

    import pandas as pd
    import numpy as np
    from sklearn.datasets import load_iris

    iris_data = load_iris()
    iris = pd.DataFrame(iris_data.data,
                        columns=[c[0:3] + c[6] for c in iris_data.feature_names])
    iris['Species'] = iris_data.target_names[iris_data.target]
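The list comprehension in the setup shortens the column names, which is easy to misread. As a small sketch, here are the four feature names that load_iris returns, written out by hand so the snippet runs without scikit-learn (the variable names `feature_names` and `short_names` are only for illustration):

```python
# Feature names as returned by sklearn's load_iris(), hard-coded here
# so the snippet runs without scikit-learn installed.
feature_names = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]

# c[0:3] keeps the first three letters ("sep" or "pet");
# c[6] is the first letter of the second word ("l" or "w").
short_names = [c[0:3] + c[6] for c in feature_names]
print(short_names)  # ['sepl', 'sepw', 'petl', 'petw']
```

So the sepal columns end up as sepl and sepw. Any later filter on the column names has to match the prefix "sep", since the full word "sepal" no longer appears in them.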

Now we can simulate the mutate_each pipeline:

    # calculate the aggregates for every column whose name starts with "sep"
    # (the shortened names are "sepl" and "sepw", so 'sepal' would match nothing)
    pivot = iris.groupby("Species")[
        iris.columns[iris.columns.str.startswith('sep')]
    ].aggregate(['min', 'max', 'mean'])

    # name the aggregates by joining the two column levels, e.g. "sepl" + "min"
    pivot.columns = pivot.columns.get_level_values(0) + pivot.columns.get_level_values(1)

    # merge the aggregates with the original dataframe
    new_iris = iris.merge(pivot, left_on='Species', right_index=True)
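To see the aggregate and rename steps in isolation, here is a minimal sketch on a tiny hand-made frame. The frame `toy` and all of its values are made up for illustration and are not iris data:

```python
import pandas as pd

# Hypothetical toy data (not the iris values): two groups, two "sep" columns.
toy = pd.DataFrame({
    "Species": ["a", "a", "b", "b"],
    "sepl": [1.0, 3.0, 5.0, 9.0],
    "sepw": [2.0, 4.0, 6.0, 8.0],
    "petl": [0.1, 0.2, 0.3, 0.4],
})

# Select the columns whose names start with "sep".
sep_cols = list(toy.columns[toy.columns.str.startswith("sep")])

# Aggregating with a list of functions yields two-level column labels,
# e.g. ("sepl", "min"), ("sepl", "max"), ("sepl", "mean"), ...
agg = toy.groupby("Species")[sep_cols].agg(["min", "max", "mean"])
print(agg.columns.nlevels)  # 2

# Flatten the two levels by concatenating them: "sepl" + "min" -> "seplmin".
agg.columns = agg.columns.get_level_values(0) + agg.columns.get_level_values(1)
print(list(agg.columns))
# ['seplmin', 'seplmax', 'seplmean', 'sepwmin', 'sepwmax', 'sepwmean']
```

The result has one row per group and one column per (column, function) pair, which is why it can be merged back onto the original rows by the group key.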

For the iris data, pivot is a small table with one row per species:

                seplmin  seplmax  seplmean  sepwmin  sepwmax  sepwmean
    Species
    setosa          4.3      5.8     5.006      2.3      4.4     3.418
    versicolor      4.9      7.0     5.936      2.0      3.4     2.770
    virginica       4.9      7.9     6.588      2.2      3.8     2.974

And new_iris is a 150x11 table combining all the columns from iris (5) and pivot (6), with the same aggregate values that dplyr produces.
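As an alternative sketch (not part of the answer above), the single-column transform approach from the question can be extended to several columns and functions with a dictionary comprehension, which avoids the merge. The toy frame `df`, its values, and the names `sep_cols` and `new_cols` are made up for illustration:

```python
import pandas as pd

# Hypothetical toy data standing in for iris.
df = pd.DataFrame({
    "Species": ["a", "b", "a", "b"],
    "sepl": [1.0, 5.0, 3.0, 9.0],
    "sepw": [2.0, 6.0, 4.0, 8.0],
})

sep_cols = [c for c in df.columns if c.startswith("sep")]
grouped = df.groupby("Species")

# Build one column per (column, function) pair; transform broadcasts
# each group's aggregate back to every row of that group.
new_cols = {
    f"{col}{func}": grouped[col].transform(func)
    for col in sep_cols
    for func in ["min", "max", "mean"]
}

result = df.assign(**new_cols)
print(result.shape)  # (4, 9)
```

Because transform returns a Series aligned to the original index, the rows of result stay in their original order. Which variant to prefer is a matter of taste: the merge version builds the summary table once, while this one stays closer to the mutate style shown in the question.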


Source: https://habr.com/ru/post/1272714/

