I'm learning the dplyr package in R, but I've hit a wall trying to understand what three functions do: compute, collect and collapse.
I understand that dplyr does not use the data.frame type internally; instead, it stores its data in its own type, tbl or tbl_df.
Then, to convert that custom type back to R's default data.frame, so that the usual set of data.frame functions can be used on it, you should use collect, for example:
batting <- tbl(lahman_sqlite(), "Batting")
dim(collect(batting))
This returned [1] 99846 22 as of 2016, whereas dim(batting) returns [1] NA 22.
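To reproduce that difference without downloading the Lahman data, here is a minimal sketch of the same pattern. It assumes the DBI, RSQLite and dbplyr packages are installed; the in-memory database, the table name "players" and the data frame contents are made up for illustration.

```r
library(dplyr)

# Assumption: a throwaway in-memory SQLite database stands in for lahman_sqlite()
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical toy table with 5 rows and 2 columns
players <- data.frame(id = 1:5, hits = c(10, 25, 40, 55, 70))
DBI::dbWriteTable(con, "players", players)

# A lazy reference to the remote table: no rows have been fetched yet
players_db <- tbl(con, "players")
lazy_dim <- dim(players_db)            # row count is NA, column count is known

# collect() runs the query and pulls the rows into a local tibble
local_dim <- dim(collect(players_db))  # both dimensions are known

print(lazy_dim)
print(local_dim)

DBI::dbDisconnect(con)
```

The row count is NA for the lazy table because finding it out would require running a query against the database, which only happens once something like collect is called.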
However, I'm not sure what the other two functions, compute and collapse, do. If you check ?collect, the documentation says the following:
Description:
'compute' forces computation of lazy tbls, leaving data in the remote source. 'collect' also forces computation, but will bring data back into an R data.frame (stored in a 'tbl_df'). 'collapse' doesn't force computation, but collapses a complex tbl into a form that additional restrictions can be applied.
What does this mean, in particular "forcing computation of lazy tbls"?
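For concreteness, this is how I would call the three functions on the same lazy query. It is only a sketch, assuming DBI, RSQLite and dbplyr are installed; the in-memory database and the "scores" table are hypothetical, and the comments reflect my reading of the documentation quoted above rather than a confirmed explanation.

```r
library(dplyr)

# Assumption: an in-memory SQLite database with a made-up "scores" table
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
scores <- data.frame(player = c("a", "b", "c", "d"),
                     hits   = c(10, 25, 40, 55))
DBI::dbWriteTable(con, "scores", scores)

# Building the query does not run it; dplyr only records the operations
query <- tbl(con, "scores") %>% filter(hits > 20)

# collapse(): per the docs, does not force computation; the result is still lazy
collapsed <- collapse(query)

# compute(): forces computation but leaves the result in the database
# (a temporary table by default); the returned object is still a remote tbl
computed <- compute(query)

# collect(): forces computation and brings the rows back into R
collected <- collect(query)

collapsed_is_lazy  <- inherits(collapsed, "tbl_lazy")
computed_is_lazy   <- inherits(computed, "tbl_lazy")
collected_is_local <- is.data.frame(collected)
collected_rows     <- nrow(collected)

DBI::dbDisconnect(con)
```

In this sketch only collected is an ordinary local data frame; collapsed and computed are both still references to something living in the database, which is exactly the distinction I am trying to understand.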
UPDATE
I would like to know what each of these functions does, and in particular what each one does that the others do not.