Knitr vs. R Interactive Behavior

Question


I am reposting my problem here after noticing that this is the approach the package author recommends for getting more help.

I'm a bit puzzled by a .Rmd file that I can run line by line in an interactive R session, and also with R CMD BATCH , but which fails when using knit("test.Rmd") . I'm not sure where the problem lies, and I tried to narrow it down as much as I could. Here is an example (in test.Rmd ):

    ```{r Rinit, include = FALSE, cache = FALSE}
    opts_knit$set(stop_on_error = 2L)
    library(adehabitatLT)
    ```

    The functions to be used later:

    ```{r functions}
    ld <- function(ltraj) {
        if (!inherits(ltraj, "ltraj"))
            stop("ltraj should be of class ltraj")
        inf <- infolocs(ltraj)
        df <- data.frame(
            x = unlist(lapply(ltraj, function(x) x$x)),
            y = unlist(lapply(ltraj, function(x) x$y)),
            date = unlist(lapply(ltraj, function(x) x$date)),
            dx = unlist(lapply(ltraj, function(x) x$dx)),
            dy = unlist(lapply(ltraj, function(x) x$dy)),
            dist = unlist(lapply(ltraj, function(x) x$dist)),
            dt = unlist(lapply(ltraj, function(x) x$dt)),
            R2n = unlist(lapply(ltraj, function(x) x$R2n)),
            abs.angle = unlist(lapply(ltraj, function(x) x$abs.angle)),
            rel.angle = unlist(lapply(ltraj, function(x) x$rel.angle)),
            id = rep(id(ltraj), sapply(ltraj, nrow)),
            burst = rep(burst(ltraj), sapply(ltraj, nrow)))
        class(df$date) <- c("POSIXct", "POSIXt")
        attr(df$date, "tzone") <- attr(ltraj[[1]]$date, "tzone")
        if (!is.null(inf)) {
            nc <- ncol(inf[[1]])
            infdf <- as.data.frame(matrix(nrow = nrow(df), ncol = nc))
            names(infdf) <- names(inf[[1]])
            for (i in 1:nc)
                infdf[[i]] <- unlist(lapply(inf, function(x) x[[i]]))
            df <- cbind(df, infdf)
        }
        return(df)
    }
    ltraj2sldf <- function(ltr, proj4string = CRS(as.character(NA))) {
        if (!inherits(ltr, "ltraj"))
            stop("ltr should be of class ltraj")
        df <- ld(ltr)
        df <- subset(df, !is.na(dist))
        coords <- data.frame(df[, c("x", "y", "dx", "dy")],
            id = as.numeric(row.names(df)))
        res <- apply(coords, 1, function(dfi) Lines(Line(matrix(
            c(dfi["x"], dfi["y"], dfi["x"] + dfi["dx"], dfi["y"] + dfi["dy"]),
            ncol = 2, byrow = TRUE)),
            ID = format(dfi["id"], scientific = FALSE)))
        res <- SpatialLinesDataFrame(SpatialLines(res,
            proj4string = proj4string), data = df)
        return(res)
    }
    ```

    I load the object and apply the `ltraj2sldf` function:

    ```{r fail}
    load("tr.RData")
    juvStp <- ltraj2sldf(trajjuv, proj4string = CRS("+init=epsg:32617"))
    dim(juvStp)
    ```

Using knit("test.Rmd") fails:

    label: fail
    Quitting from lines 66-75 (test.Rmd)
    Error in SpatialLinesDataFrame(SpatialLines(res, proj4string = proj4string),  (from <text>#32) :
      row.names of data and Lines IDs do not match

Running the same call directly in the R console after the error occurs works as expected...

The problem lies in how format renders the identifier (in the apply call inside ltraj2sldf ) just before the identifier reaches 100,000. Called interactively, R gives "99994", "99995", "99996", "99997", "99998", "99999", "100000"; under knitr, R gives "99994", " 99995", " 99996", " 99997", " 99998", " 99999", "100000", with additional leading spaces.

Is there a reason for this behavior? Why does knitr behave differently from a direct call to R? I have to admit that I'm having a hard time with this one, because I can't debug it (it works in an interactive session)!

Any hint would be much appreciated. I can provide the .RData if that helps (a 4.5 MB file), but what interests me most is why such a difference occurs. I tried to come up with a self-contained reproducible example, without success; sorry about that. Thanks in advance for any input!

Following baptiste's comment, here are some details about how the identifiers are generated. Basically, an identifier is generated for each row of the data frame by a call to apply , which in turn uses format as follows: format(dfi["id"], scientific = FALSE) . Here, the id column is simply a series from 1 to the number of rows ( 1:nrow(df) ). scientific = FALSE is only there so that I don't get results like 1e+05 for 100000.

Having looked into how the identifiers are generated, the problem only arises for those listed in the initial post, that is, from 99995 to 99999, for which a leading space is added. This should not happen with this format call, since I did not request a specific number of digits in the output. For instance:

    > format(99994:99999, scientific = FALSE)
    [1] "99994" "99995" "99996" "99997" "99998" "99999"

However, if identifiers are generated in chunks, this can happen:

    > format(99994:100000, scientific = FALSE)
    [1] " 99994" " 99995" " 99996" " 99997" " 99998" " 99999" "100000"

Note that the same values processed one at a time give the expected result:

    > for (i in 99994:100000) print(format(i, scientific = FALSE))
    [1] "99994"
    [1] "99995"
    [1] "99996"
    [1] "99997"
    [1] "99998"
    [1] "99999"
    [1] "100000"

In the end, it looks exactly as if the identifiers were not prepared one at a time (as I would expect from a row-wise apply ), but in this case 6 at a time, and only close to 1e+05... And, of course, only when using knitr, not in interactive or batch R.
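Because the failure only shows up while knitting, one way to confirm which identifiers are affected is to check the generated strings inside the chunk itself. The following is a minimal sketch; `padded_ids` is a hypothetical helper name that is not part of the original post, and the sample vector simply mimics the values reported above.

```r
# Hypothetical helper (not from the original post): return the positions
# of identifiers that carry leading or trailing whitespace.
padded_ids <- function(ids) {
  ids <- as.character(ids)
  stripped <- gsub("^[[:space:]]+|[[:space:]]+$", "", ids)
  which(ids != stripped)
}

# Sample values mimicking the identifiers reported under knitr.
ids <- c("99994", " 99995", " 99999", "100000")
bad <- padded_ids(ids)
bad
```

Printing `ids[bad]` from within the knitted chunk would show exactly which identifiers no longer match the row names.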

Here is my session information:

    > sessionInfo()
    R version 3.0.1 (2013-05-16)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8
     [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] knitr_1.2           adehabitatLT_0.3.12 CircStats_0.2-4
    [4] boot_1.3-9          MASS_7.3-27         adehabitatMA_0.3.6
    [7] ade4_1.5-2          sp_1.0-11           basr_0.5.3

    loaded via a namespace (and not attached):
    [1] digest_0.6.3    evaluate_0.4.4  formatR_0.8     fortunes_1.5-0
    [5] grid_3.0.1      lattice_0.20-15 stringr_0.6.2   tools_3.0.1

+6

r interactive knitr

Mathieu basille Jul 25 '13 at 18:42


3 answers

Both Jeff and baptiste were indeed right! This is an options problem, related to the digits setting. I managed to come up with a minimal working example (for example, in test.Rmd ):

    Simple reproducible example: df1 is a data frame of 110,000 rows,
    with 2 random normal variables + an `id` variable which is a series
    from 1 to the number of rows.

    ```{r example}
    df1 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
    ```

    From this, we create an `id2` variable using `format` and
    `scientific = FALSE` to have results with all digits instead of
    scientific notation (e.g. 100,000 instead of 1e+05):

    ```{r example-continued}
    df1$id2 <- apply(df1, 1, function(dfi) format(dfi["id"], scientific = FALSE))
    df1$id2[99990:100010]
    ```

It works as expected, using R interactively, resulting in:

     [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"
     [8] "99997"  "99998"  "99999"  "100000" "100001" "100002" "100003"
    [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"

However, the results are different with knit :

    > library(knitr)
    > knit("test.Rmd")
    [...]
    ##  [1] "99990"  "99991"  "99992"  "99993"  "99994"  " 99995" " 99996"
    ##  [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
    ## [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"

Note the extra leading spaces after 99994. The difference actually comes from the digits option, as Jeff rightly suggests: R uses 7 by default and knitr uses 4. This difference affects format output, although I don't quite understand what is going on here. R-style:

    > options(digits = 7)
    > format(99999, scientific = FALSE)
    [1] "99999"

knitr style:

    > options(digits = 4)
    > format(99999, scientific = FALSE)
    [1] " 99999"

But it should affect all numbers, and not only those after 99994 (well, to be honest, I don't even understand why it adds leading spaces at all):

    > options(digits = 4)
    > format(c(1:10, 99990:100000), scientific = FALSE)
     [1] "     1" "     2" "     3" "     4" "     5" "     6" "     7"
     [8] "     8" "     9" "    10" " 99990" " 99991" " 99992" " 99993"
    [15] " 99994" " 99995" " 99996" " 99997" " 99998" " 99999" "100000"

From this I have no idea what is to blame: knitr , apply or format ? At least I found a workaround using the trim = TRUE argument in format . It does not address the cause of the problem, but it does remove the leading space from the results...
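The workaround can be sketched as follows. The first variant is the `trim = TRUE` approach described above. The second uses `sprintf("%d", ...)` and is an alternative not mentioned in the thread; it assumes the identifiers are whole numbers. The `ids` vector is illustrative and stored as doubles, which is what a row-wise `apply` over a numeric data frame passes to the function.

```r
# Illustrative identifiers around the 100000 boundary, as doubles.
ids <- as.numeric(99994:100000)

# Lower 'digits' to the value knitr reportedly uses, restoring it afterwards.
old <- options(digits = 4)

# Variant 1 (from the thread): format one value at a time, with trim = TRUE
# so that no padding is added.
trimmed <- vapply(ids, function(id) format(id, scientific = FALSE, trim = TRUE),
                  character(1))

# Variant 2 (an assumption, not from the thread): sprintf with an integer
# conversion, which does not depend on the 'digits' option.
printed <- sprintf("%d", ids)

options(old)

trimmed
printed
```

Both variants are meant to produce identifiers that can be compared directly with `row.names(df)`, whatever the `digits` setting is.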

+3

Mathieu basille Jul 26 '13 at 3:01

I added a comment to your knitr GitHub issue with this information.

format() adds extra spaces when the digits option is too small to display the value but scientific=FALSE is also specified. knitr sets digits to 4 inside code chunks, which causes the behavior you described:

    options(digits=4)
    format(99999, scientific=FALSE)

Produces:

    [1] " 99999"

While:

    options(digits=5)
    format(99999, scientific=FALSE)

Produces:

    [1] "99999"
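Since the outcome depends on a global option, one way to make a formatting call independent of whatever knitr has set is to change `digits` only for the duration of that call and restore it afterwards. This is a minimal sketch; `with_digits` is a hypothetical helper name, not a knitr or base R function.

```r
# Hypothetical helper: evaluate 'expr' with options(digits = digits) in
# effect, then restore the previous setting even if 'expr' fails.
with_digits <- function(digits, expr) {
  old <- options(digits = digits)
  on.exit(options(old), add = TRUE)
  force(expr)  # 'expr' is a promise, so it is evaluated here, after the change
}

before <- getOption("digits")
inside <- with_digits(7, getOption("digits"))
after  <- getOption("digits")

# Usage: format an identifier under R's default of 7 digits.
id_chr <- with_digits(7, format(99999, scientific = FALSE))
```

Wrapping the call this way keeps the rest of the document on knitr's setting, so the printed output of other chunks is unaffected.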

+2

Jeff johnston Jul 26 '13 at 2:28


Mathieu basille · Accepted Answer · 2013-08-27T03:27:47+0000

Thanks to Alexei Vorone and Duncan Murdoch, this bug is now fixed in R-devel!

See: https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15411
