Assign intermediate output to a temp variable as part of a dplyr pipeline

Question

Q: In an R dplyr pipeline, how can I assign an intermediate output to a temp variable for use later in the same pipeline?

My approach below works, but it assigns into the global environment, which is undesirable. There has to be a better way, right? I expected the commented-out line to give the desired result. No dice. I don't understand why it doesn't work.

    df <- data.frame(a = LETTERS[1:3], b = 1:3)

    df %>%
      filter(b < 3) %>%
      assign("tmp", ., envir = .GlobalEnv) %>%  # works
      # assign("tmp", .) %>%                    # doesn't work
      mutate(b = b*2) %>%
      bind_rows(tmp)

      a b
    1 A 2
    2 B 4
    3 A 1
    4 B 2

+8

r pipeline dplyr

lowndrul Nov 01 '16 at 10:51


5 answers

This does not create an object in the global environment:

    df %>%
      filter(b < 3) %>%
      {
        { . -> tmp } %>%
          mutate(b = b*2) %>%
          bind_rows(tmp)
      }

It can also be used for debugging if you use . ->> tmp instead of . -> tmp, or if you insert this into the pipeline:

 { browser(); . } %>%

+12

G. Grothendieck Nov 02 '16 at 13:54


You can create the desired object in the place where it is needed. For instance:

    df %>%
      filter(b < 3) %>%
      mutate(b = b*2) %>%
      bind_rows(df %>% filter(b < 3))

This method avoids filtering twice:

 df %>% filter(b < 3) %>% bind_rows(., mutate(., b = b*2))
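Note that this one-liner puts the unmodified rows first, whereas the output shown in the question lists the doubled rows first. A minimal sketch, assuming dplyr is installed, of swapping the arguments to reproduce the question's row order:

```r
library(dplyr)

df <- data.frame(a = LETTERS[1:3], b = 1:3)

# `.` appears as a direct argument of bind_rows(), so the pipe does not
# also insert it as the first argument.
out <- df %>%
  filter(b < 3) %>%
  bind_rows(mutate(., b = b * 2), .)  # doubled rows first, then the originals

out
```

The filter still runs only once; only the order of the two arguments to bind_rows() changes.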

+4

eipi10 Nov 01 '16 at 10:56


I often need to save an intermediate product in a pipeline. My usual use case is avoiding duplicated filters when the data is later split, manipulated, and reassembled, but the technique works well here too:

    df %>%
      filter(b < 3) %>%
      {. ->> intermediateResult} %>%  # this saves intermediate
      mutate(b = b*2) %>%
      bind_rows(intermediateResult)

+4

GGAnderson Dec 19 '17 at 3:38


I was interested in the debugging use case: I wanted to save intermediate results so that I could inspect and manipulate them from the console without splitting the pipeline in two, which is cumbersome. For my purposes, then, the only problem with the OP's original solution was that it was slightly verbose.

This can be fixed by defining a helper function:

    to_var <- function(., ..., env = .GlobalEnv) {
      var_name = quo_name(quos(...)[[1]])  # name of the first unquoted argument
      assign(var_name, ., envir = env)
      .                                    # return the input so the pipeline continues
    }

Which can then be used as follows:

    df <- data.frame(a = LETTERS[1:3], b = 1:3)

    df %>%
      filter(b < 3) %>%
      to_var(tmp) %>%
      mutate(b = b*2) %>%
      bind_rows(tmp)

    # tmp still exists here

This still uses the global environment, but you can also explicitly pass in a more local environment, as in the following example:

    f <- function() {
      df <- data.frame(a = LETTERS[1:3], b = 1:3)
      env = environment()
      df %>%
        filter(b < 3) %>%
        to_var(tmp, env = env) %>%
        mutate(b = b*2) %>%
        bind_rows(tmp)
    }

    f()
    # tmp does not exist here

The problem with the accepted solution is that it does not work out of the box with the tidyverse pipe (%>%). ~~G. Grothendieck's solution does not work at all for the debugging use case.~~ (update: see G. Grothendieck's comment below and his updated answer!)

Finally, the reason assign("tmp", .) %>% does not work is that the default envir argument of assign() is the "current environment" (see the documentation for assign), which differs at each stage of the pipeline. To see this, try inserting { print(environment()); . } %>% into the pipeline at different points: a different address is printed each time. (You could probably change the definition of to_var to use the grandparent environment by default instead.)
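The default-environment behaviour of assign() can be seen in plain base R, without any pipe. In this minimal sketch, save_default(), save_in_caller() and caller() are hypothetical names invented for illustration. It shows only how assign() picks its target environment, not how any particular pipe implementation evaluates its stages:

```r
# Hypothetical helpers, base R only (no pipe involved).
save_default <- function(value) {
  assign("tmp", value)                          # envir defaults to this function's own frame
  value
}

save_in_caller <- function(value) {
  assign("tmp", value, envir = parent.frame())  # write into the caller's frame instead
  value
}

caller <- function() {
  save_default(1:3)
  seen_default <- exists("tmp", inherits = FALSE)  # is tmp in caller()'s own frame?
  save_in_caller(1:3)
  seen_caller <- exists("tmp", inherits = FALSE)
  c(default = seen_default, caller = seen_caller)
}

result <- caller()
result
```

With the default envir, tmp is created in the frame of the function that called assign() and disappears when that function returns. Passing envir explicitly is what makes the value visible where it is needed.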

0

user3780389 Nov 30 '17 at 19:19


alistaire · Accepted Answer · 2016-11-01T23:49:50+0000

pipeR is a package that extends the capabilities of the pipe without adding different pipes (as magrittr does). To assign, you pass the variable name, quoted with ~ and wrapped in parentheses, as an element in your pipe:

    library(dplyr)
    library(pipeR)

    df %>>%
      filter(b < 3) %>>%
      (~tmp) %>>%
      mutate(b = b*2) %>>%
      bind_rows(tmp)
    ##   a b
    ## 1 A 2
    ## 2 B 4
    ## 3 A 1
    ## 4 B 2

    tmp
    ##   a b
    ## 1 A 1
    ## 2 B 2

Although the syntax is not very descriptive, pipeR is very well documented.
