Assign intermediate output to a temp variable as part of a dplyr pipeline

Q: In an R dplyr pipeline, how can I assign some intermediate output to a temp variable for use further down the pipeline?

My approach below works, but it assigns into the global environment, which is undesirable. There has to be a better way, right? I figured the commented-out line would give the desired result. No dice. I'm confused about why that did not work.

    library(dplyr)

    df <- data.frame(a = LETTERS[1:3], b=1:3)
    df %>%
      filter(b < 3) %>%
      assign("tmp", ., envir = .GlobalEnv) %>% # works
      #assign("tmp", .) %>% # doesn't work
      mutate(b = b*2) %>%
      bind_rows(tmp)

      a b
    1 A 2
    2 B 4
    3 A 1
    4 B 2
5 answers

pipeR is a package that extends the capabilities of the pipe without adding different pipes (as magrittr does). To assign, you pass the variable name, prefixed with ~ and wrapped in parentheses, as an element of your pipe:

    library(dplyr)
    library(pipeR)

    df %>>%
      filter(b < 3) %>>%
      (~tmp) %>>%
      mutate(b = b*2) %>>%
      bind_rows(tmp)
    ##   a b
    ## 1 A 2
    ## 2 B 4
    ## 3 A 1
    ## 4 B 2

    tmp
    ##   a b
    ## 1 A 1
    ## 2 B 2

Although the syntax is not very descriptive, pipeR is very well documented.


This does not create an object in the global environment:

    df %>%
      filter(b < 3) %>%
      {
        { . -> tmp } %>%
          mutate(b = b*2) %>%
          bind_rows(tmp)
      }

This approach can also be used for debugging if you use . ->> tmp instead of . -> tmp, or if you insert this into the pipeline:

    { browser(); . } %>%
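The difference between the two arrows is ordinary R scoping and can be seen without any pipe. The following is a minimal sketch in plain R; the names `arrow_demo`, `local_copy`, and `super_copy` are made up for illustration and do not come from the answer:

```r
# Hypothetical helper, for illustration only.
arrow_demo <- function(value) {
  value -> local_copy    # ordinary assignment: lives only in this call's frame
  value ->> super_copy   # super-assignment: searches the enclosing environments
                         # and, if no existing binding is found, creates the
                         # variable in the global environment
  invisible(local_copy)
}

arrow_demo(42)

exists("local_copy")   # the local binding is gone once the call returns
exists("super_copy")   # the super-assigned binding is still around
```

This is why `. ->> tmp` leaves `tmp` behind for inspection at the console, and also why it is best kept for debugging rather than regular code.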

You can create the desired object in the place where it is needed. For instance:

    df %>%
      filter(b < 3) %>%
      mutate(b = b*2) %>%
      bind_rows(df %>% filter(b < 3))

This method avoids filtering twice:

    df %>%
      filter(b < 3) %>%
      bind_rows(., mutate(., b = b*2))
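If restructuring the pipe is not appealing, another way to create the object only where it is needed is an ordinary intermediate variable scoped with base R's `local()`. This is a minimal sketch, not part of the answer above, and it assumes the `df` from the question:

```r
library(dplyr)

df <- data.frame(a = LETTERS[1:3], b = 1:3)

# local() evaluates the block in a fresh environment, so `tmp` is
# discarded when the block finishes; only the final value is returned.
result <- local({
  tmp <- df %>% filter(b < 3)
  tmp %>%
    mutate(b = b * 2) %>%
    bind_rows(tmp)
})

result

# In a fresh session, no `tmp` should have been created globally.
exists("tmp", envir = globalenv(), inherits = FALSE)
```

The trade-off is that the pipeline is split in two, which is exactly what the question wanted to avoid, but nothing leaks out of the block.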

I often find the need to save an intermediate result in a pipeline. Although my use case is usually to avoid duplicating filters before later splitting, manipulating, and reassembling the data, the same technique works well here:

    df %>%
      filter(b < 3) %>%
      {. ->> intermediateResult} %>% # this saves intermediate
      mutate(b = b*2) %>%
      bind_rows(intermediateResult)

I was interested in this question for debugging purposes: I wanted to save intermediate results so that I could inspect and manipulate them from the console without breaking the pipeline into two parts, which is cumbersome. For my purposes, then, the only problem with the OP's original solution was that it was slightly verbose.

This can be fixed by defining a helper function:

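    # quos() and quo_name() come from rlang (also available after library(dplyr))
    to_var <- function(., ..., env=.GlobalEnv) {
      var_name <- quo_name(quos(...)[[1]])  # bare name -> string
      assign(var_name, ., envir=env)
      .                                     # pass the data through unchanged
    }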

Which can then be used as follows:

    df <- data.frame(a = LETTERS[1:3], b=1:3)
    df %>%
      filter(b < 3) %>%
      to_var(tmp) %>%
      mutate(b = b*2) %>%
      bind_rows(tmp)

    # tmp still exists here

This still uses the global environment, but you can also explicitly pass in a more local environment, as in the following example:

    f <- function() {
      df <- data.frame(a = LETTERS[1:3], b=1:3)
      env = environment()
      df %>%
        filter(b < 3) %>%
        to_var(tmp, env=env) %>%
        mutate(b = b*2) %>%
        bind_rows(tmp)
    }

    f()

    # tmp does not exist here
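The same idea can be sketched without rlang by capturing the bare name with base R's `substitute()`, and by storing intermediates in a dedicated environment instead of the global one or a function frame. The names `save_as` and `scratch` are hypothetical and only serve the illustration:

```r
library(dplyr)

# Hypothetical base-R variant of to_var(); not part of the answer above.
save_as <- function(data, name, env = globalenv()) {
  var_name <- deparse(substitute(name))  # capture the bare name as a string
  assign(var_name, data, envir = env)
  data                                   # pass the data on unchanged
}

df <- data.frame(a = LETTERS[1:3], b = 1:3)
scratch <- new.env()   # a dedicated environment for intermediates

result <- df %>%
  filter(b < 3) %>%
  save_as(tmp, env = scratch) %>%
  mutate(b = b * 2) %>%
  bind_rows(scratch$tmp)

result
ls(scratch)   # lists the names saved along the way
```

Because the target environment is passed explicitly, this sketch does not depend on which environment the pipe happens to evaluate each stage in.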

The problem with the accepted answer is that it does not work out of the box with the usual dplyr (magrittr) pipes. G. Grothendieck's solution does not work at all for the debugging use case. (Update: see G. Grothendieck's comment below and his updated answer!)

Finally, the reason assign("tmp", .) %>% does not work is that the default envir argument for assign() is the "current environment" (see the documentation for assign), which differs at each stage of the pipeline. To see this, try inserting { print(environment()); . } %>% into the pipeline at different points; a different address is printed each time. (You could probably change the definition of to_var to use the grandparent environment by default instead.)
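The default behaviour of assign() can be checked in isolation, without a pipe. The sketch below uses a made-up function, `assign_default`, standing in for a single pipeline stage that runs in its own environment, as described above:

```r
# Hypothetical helper, for illustration only.
assign_default <- function(value) {
  assign("tmp_demo", value)   # no envir given: binds in this call's frame
  exists("tmp_demo", envir = environment(), inherits = FALSE)
}

assign_default(1:3)   # TRUE: the binding existed inside the call

# In a fresh session the binding never reached the global environment.
exists("tmp_demo", envir = globalenv())
```

A variable created this way disappears together with the frame it was assigned in, which matches the symptom in the question: the assignment succeeds, but `tmp` is not visible to later stages.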


Source: https://habr.com/ru/post/1011926/

