Creating and using new variables in functions in R: NSE programming error in tidyverse

After reading and re-reading many of the dplyr programming tutorials, I still can't find a way to solve my specific case.

I understand that group_by_, mutate_ and the other "string-friendly" versions of tidyverse functions are being deprecated and that enquo is the way to go.

However, my case is somewhat different, and I am struggling to find a neat way to solve it.

My goal is to create and manipulate data within a function: create (or modify) variables based on existing ones, use them afterwards, and so on.

However, no matter how hard I try, my code either errors or produces notes when checking the package, for example "no visible binding for global variable ...".

Here's a reproducible example of what I want to do:

    library(dplyr)
    library(tidyr)

    df <- data.frame(X=c("A", "B", "C", "D", "E"), Y=c(1, 2, 3, 1, 1))

    new_df <- df %>%
      group_by(Y) %>%
      summarise(N=n()) %>%
      mutate(Y=factor(Y, levels=1:5)) %>%
      complete(Y, fill=list(N = 0)) %>%
      arrange(Y) %>%
      rename(newY=Y) %>%
      mutate(Y=as.integer(newY))

These are common dplyr manipulations, and the expected result is:

    # A tibble: 5 x 3
        newY     N     Y
      <fctr> <dbl> <int>
    1      1     3     1
    2      2     1     2
    3      3     1     3
    4      4     0     4
    5      5     0     5

I would like this piece of code to work quietly inside a function. The following was my best attempt at avoiding the NSE issues:

    myfunction <- function(){
      df <- data.frame(X=c("A", "B", "C", "D", "E"), Y=c(1, 2, 3, 1, 1))

      new_df <- df %>%
        group_by_("Y") %>%
        summarise(!!"N":=n()) %>%
        mutate(!!"Y":=factor(Y, levels=1:5)) %>%
        complete_("Y", fill=list(N = 0)) %>%
        arrange_("Y") %>%
        rename(!!"newY":="Y") %>%
        mutate(!!"Y":=as.integer(newY))
    }

Unfortunately, I still get the following notes:

    myfunction: no visible global function definition for ':='
    myfunction: no visible binding for global variable 'Y'
    myfunction: no visible binding for global variable 'newY'
    Undefined global functions or variables:
      := Y n.Factors n_optimal newY

Is there any way to solve this problem? Thank you very much!

EDIT: I am using R 3.4.1, dplyr_0.7.4, tidyr_0.7.2 and tidyverse_1.1.1


ANSWER

Thanks to the comments I managed to solve it. Here is a working solution:

    myfunction <- function(){
      df <- data.frame(X=c("A", "B", "C", "D", "E"), Y=c(1, 2, 3, 1, 1))

      new_df <- df %>%
        group_by_("Y") %>%
        summarise_("N"=~n()) %>%
        mutate_("Y"= ~factor(Y, levels=1:5)) %>%
        complete_("Y", fill=list(N = 0)) %>%
        arrange_("Y") %>%
        rename_("newY"=~Y) %>%
        mutate_("Y"=~as.integer(newY))
    }

Thanks a LOT :)

+5
2 answers

The answer is not in the "Programming with dplyr" vignettes, because your problem is more general. Although your code deals with non-standard evaluation, your use case does not need it. If you remove the code that deals with non-standard evaluation, you reduce the number of problems you need to fix.

However, some important issues remain: NAMESPACE problems. You deal with NAMESPACE any time you use functions from other packages inside the functions of your own package. NAMESPACE is not an easy topic, but if you write packages, it pays off to learn a little about it. I recommend reading r-pkgs.had.co.nz/namespace.html: find the Imports section and read its introduction as well as the "R functions" subsection. This will help you understand the steps, code, and comments that I post below.

Follow these steps to resolve the issue:
- Add dplyr, magrittr and tidyr to the DESCRIPTION.
- Refer to functions as PACKAGE::FUNCTION().
- Delete every !! and :=, because in this case you do not need them.
- Import and re-export the pipe from magrittr.
- Import .data from rlang.
- Pass the remaining global variables to utils::globalVariables().
- Rebuild, reload, recheck.

    # I make your function shorter to focus on the important details.
    myfunction <- function(){
      df <- data.frame(
        X = c("A", "B", "C", "D", "E"),
        Y = c(1, 2, 3, 1, 1)
      )

      df %>%
        dplyr::group_by(.data$Y) %>%
        dplyr::summarise(N = n())
    }

    # Fix check() notes

    #' @importFrom magrittr %>%
    #' @export
    magrittr::`%>%`

    #' @importFrom rlang .data
    NULL

    utils::globalVariables(c(".data", "n"))
+3

You can use rlang::sym() (or base::as.name()) to convert strings to symbols, so let me add an alternative answer.

Note that I do not want to force you to throw away the deprecated functions. Use whatever is easiest for you to understand. (I find sym() more useful, though.)

Case 1: basic use of rlang::sym()

This code

 group_by_("Y") %>% 

can be written as

 group_by(!! rlang::sym("Y")) 

or you can even assign the symbol to a variable beforehand.

    col_Y <- rlang::sym("Y")
    df %>% group_by(!! col_Y)
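As a minimal sketch of how Case 1 might look inside a function, assuming the column name arrives as a string: the helper name count_by and its arguments are hypothetical and only serve as an illustration.

```r
library(dplyr)

# Hypothetical helper: count rows per group, with the grouping
# column supplied as a string (e.g. "Y").
count_by <- function(data, col_name) {
  col <- rlang::sym(col_name)   # string -> symbol
  data %>%
    group_by(!! col) %>%        # unquote the symbol inside the verb
    summarise(N = n())
}

df <- data.frame(X = c("A", "B", "C", "D", "E"), Y = c(1, 2, 3, 1, 1))
res <- count_by(df, "Y")
```

Because the column is never written as a bare name, R CMD check should have no global variable Y to complain about in this sketch.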

Case 2: symbols on the left-hand side

This code is perfectly fine.

 summarise(!!"N":=n()) 

Both strings and symbols are allowed on the LHS. So this is fine too:

    col_N <- rlang::sym("N")
    # ...
    summarise(!! col_N := n())
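A short sketch of Case 2 inside a function, assuming the name of the output column is passed as a string. The helper count_as and its argument names are hypothetical.

```r
library(dplyr)

# Hypothetical helper: the grouping column and the name of the
# count column are both supplied as strings.
count_as <- function(data, group_name, count_name) {
  data %>%
    group_by(!! rlang::sym(group_name)) %>%
    summarise(!! count_name := n())   # a string is accepted on the LHS of :=
}

df <- data.frame(X = c("A", "B", "C", "D", "E"), Y = c(1, 2, 3, 1, 1))
res <- count_as(df, "Y", "N")
```

On the right-hand side of an expression a string would stay a string, which is why the grouping column is converted with rlang::sym() while the output name is not.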

Case 3: select semantics

select() and rename() have different semantics than other functions such as mutate(): they accept strings in addition to symbols. This may be a slightly advanced topic. You can find a more detailed explanation in the vignette.

More precisely, both of the following are permitted:

    rename(new = old)
    rename(new = "old")

So this code is fine:

    rename(!! "newY" := "Y")
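A minimal sketch of Case 3 inside a function, assuming both the old and the new column name are given as strings. The helper rename_col is hypothetical.

```r
library(dplyr)

# Hypothetical helper: rename one column, with both names as strings.
rename_col <- function(data, old, new) {
  # rename() uses select semantics, so the unquoted string in `old`
  # is read as a column name; no rlang::sym() is needed here.
  rename(data, !! new := !! old)
}

df <- data.frame(X = c("A", "B", "C", "D", "E"), Y = c(1, 2, 3, 1, 1))
res <- rename_col(df, "Y", "newY")
```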

(example)


    reprex::reprex_info()
    #> Created by the reprex package v0.1.1.9000 on 2017-11-12

    library(dplyr, warn.conflicts = FALSE)
    library(tidyr)

    df <- data.frame(X=c("A", "B", "C", "D", "E"), Y=c(1, 2, 3, 1, 1))

    col_Y <- rlang::sym("Y")
    col_N <- rlang::sym("N")
    col_newY <- rlang::sym("newY")

    df %>%
      group_by(!! col_Y) %>%
      summarise(!! col_N := n()) %>%
      mutate(!! col_Y := factor(!! col_Y, levels=1:5)) %>%
      complete(!! col_Y, fill = list(N = 0)) %>%
      arrange(!! col_Y) %>%
      rename(!! col_newY := !! col_Y) %>%
      mutate(!! col_Y := as.integer(!! col_newY))
    #> # A tibble: 5 x 3
    #>     newY     N     Y
    #>   <fctr> <dbl> <int>
    #> 1      1     3     1
    #> 2      2     1     2
    #> 3      3     1     3
    #> 4      4     0     4
    #> 5      5     0     5
+1

Source: https://habr.com/ru/post/1273273/

