Strange handling of factors in dplyr::mutate: bug or feature?

The mutate function from the R package "dplyr" recycles factors in a peculiar way: it returns the factor as if as.numeric had been applied to it. In the following example, y becomes what you would expect, while z is c(1,1):

    library(dplyr)
    df <- data_frame(x=1:2)
    glimpse(df %>% mutate(y="A", z=factor("B")))
    # Variables:
    # $ x (int) 1, 2
    # $ y (chr) "A", "A"
    # $ z (int) 1, 1

Is there any rationale behind this, or is it a bug?

(I am using R 3.1.1 and dplyr 0.3.0.1.)


EDIT:

After I posted this as a GitHub issue, Romain Francois fixed it within a few hours! So if you run into the problem above, use devtools::install_github to get the latest version:

    library(devtools)
    install_github("hadley/dplyr")

and then

    library(dplyr)
    df <- data_frame(x=1:2)
    glimpse(df %>% mutate(y="A", z=factor("B")))
    # Variables:
    # $ x (int) 1, 2
    # $ y (chr) "A", "A"
    # $ z (fctr) B, B

Good work Romain!

1 answer

dplyr uses C++ to carry out the actual mutate operation. Following the rabbit hole down, and noting that this is an ungrouped mutate, we can use our trusty debugger to notice the following.

    debugonce(dplyr:::mutate_impl)
    # Inside of mutate_impl we do:
    class(dots[[2]]$expr)
    # which is a "call"!

So now we know the type of our lazy expression. We eval it and notice that it is a supported type (unfortunately, R's TYPEOF macro reports factors as integers; we need Rf_isFactor to tell them apart).
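The point about TYPEOF can be checked from the R side without touching any C++. The following is a minimal base-R sketch (the variable names f and i are only illustrative): typeof() reports the storage type that TYPEOF sees, while is.factor() is roughly the R-level counterpart of Rf_isFactor.

```r
f <- factor("B")

typeof(f)      # storage type only: "integer"
class(f)       # the class attribute: "factor"
is.factor(f)   # TRUE
levels(f)      # "B", kept in the "levels" attribute

# A plain integer has the same storage type but none of the factor attributes
i <- 1L
typeof(i) == typeof(f)  # TRUE
is.factor(i)            # FALSE
```

Any code that dispatches on the storage type alone cannot distinguish f from i, which is the situation described above.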

So what happens next? We return the result, and we are done. If you have already tried (df %>% mutate(y="A", z=factor(c("A","B"))))[[3]], where no recycling is needed, you will know that the problem really is the recycling.
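To see why recycling is the culprit, here is a base-R sketch that mimics the faulty behaviour by hand. It is an illustration only, not dplyr's actual code: recycling just the underlying integer codes reproduces the c(1,1) from the question, while rep() on the factor itself keeps the levels.

```r
z <- factor("B")

# Recycling only the integer codes drops the levels and the class
codes_only <- rep(as.integer(z), 2)
codes_only          # 1 1
class(codes_only)   # "integer"

# Recycling the factor itself keeps both
recycled <- rep(z, 2)
as.character(recycled)  # "B" "B"
is.factor(recycled)     # TRUE
```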

Specifically, the C++ Gatherer object (which should really be checking Rf_isFactor in addition to its current date check on INTSXPs) uses C++ templating to force a Vector<INTSXP> to be constructed (implicitly, through constructor initialization; note the call with arity 2 in ConstantGathererImpl), without carrying over the factor levels.
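The missing check can be sketched in plain R. The helper recycle_constant below is hypothetical and is not part of dplyr; it only mimics, at the R level, what a factor-aware gatherer would have to do. The real fix belongs in dplyr's C++ code.

```r
# Hypothetical helper (not a dplyr function): recycle a length-1 value to n rows
recycle_constant <- function(value, n) {
  stopifnot(length(value) == 1L)
  if (is.factor(value)) {
    # Recycle the integer codes, then restore what makes them a factor
    out <- rep(as.integer(value), n)
    attr(out, "levels") <- levels(value)
    class(out) <- class(value)
    out
  } else {
    rep(value, n)
  }
}

z <- recycle_constant(factor("B"), 2)
is.factor(z)      # TRUE
as.character(z)   # "B" "B"
```

The branch on is.factor() is the important part: without it, only the integer codes survive the recycling.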

TL;DR: In R's C++ internals, integers and factors have the same internal type as far as the TYPEOF macro is concerned, and factors are a weird edge case.

Feel free to submit a pull request to dplyr; it is under active development, and Hadley and Romain are good guys. You will need to add an if statement here.


Source: https://habr.com/ru/post/1205254/

