Dplyr - use column names as function arguments

With a data frame, I use dplyr to aggregate some column, as shown below.

> data <- data.frame(a=rep(1:2,3), b=c(6:11))
> data
  a  b
1 1  6
2 2  7
3 1  8
4 2  9
5 1 10
6 2 11
> data %>% group_by(a) %>% summarize(tot=sum(b))
# A tibble: 2 x 2
      a   tot
  <int> <int>
1     1    24
2     2    27

It's fine. However, I want to create a reusable function for it, so that the column name can be passed as an argument.

Looking at answers to related questions such as here , I tried the following.

sumByColumn <- function(df, colName) {
  df %>%
  group_by(a) %>%
  summarize(tot=sum(colName))
  df
}

However, I cannot get it to work.

> sumByColumn(data, "b")

 Error in summarise_impl(.data, dots) : 
  Evaluation error: invalid 'type' (character) of argument. 

> sumByColumn(data, b)

 Error in summarise_impl(.data, dots) : 
  Evaluation error: object 'b' not found. 
> 
+4
source share
2 answers

This can work using the latest syntax dplyr(as seen on github ):

library(dplyr)
library(rlang)
sumByColumn <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!! sym(colName)))
}

sumByColumn(data, "b")
## A tibble: 2 x 2
#      a   tot
#  <int> <int>
#1     1    24
#2     2    27

And an alternative way to specify bas a variable:

library(dplyr)
sumByColumn <- function(df, colName) {
  myenc <- enquo(colName)
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!!myenc))
}

sumByColumn(data, b)
## A tibble: 2 x 2
#      a   tot
#  <int> <int>
#1     1    24
#2     2    27
+4
source

dplyr (summarise_at, vars, funs)

sumByColumn <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize_at(vars(colName), funs(tot = sum))
}

# A tibble: 2 x 2
      # a   tot
  # <int> <int>
# 1     1    24
# 2     2    27
+4

Source: https://habr.com/ru/post/1691618/


All Articles