I am learning the dplyr package in R, but I have hit a wall trying to understand what three of its functions do: compute, collect and collapse.
I understand that dplyr does not use the data.frame type internally; instead, it stores its data in its own type, tbl or tbl_df.
So, to convert that custom type back to R's default data.frame, and be able to use the usual set of data.frame functions on it, you use collect, for example:
    batting <- tbl(lahman_sqlite(), "Batting")
    dim(collect(batting))
This returns [1] 99846 22 (as of 2016), whereas dim(batting) returns [1] NA 22.
However, I am not sure what the other two functions, compute and collapse, do. If you look at ?collect, the docs say the following:
Description:
'compute' forces computation of lazy tbls, leaving data in the remote source. 'collect' also forces computation, but will bring data back into an R data.frame (stored in a 'tbl_df'). 'collapse' doesn't force computation, but collapses a complex tbl into a form that additional restrictions can be applied.
What does this mean, in particular "forcing computation of lazy tbls"?
UPDATE
I would like to know what each of these functions does, and in particular what each one does that the others do not.
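For anyone who wants to compare the three side by side, here is a minimal sketch. It is not taken from the documentation: it assumes the dbplyr, DBI and RSQLite packages are installed, and it uses an in-memory SQLite copy of the built-in mtcars data (the table name "cars" is arbitrary) instead of the Lahman database, so it runs without any download.

```r
library(dplyr)  # assumes dbplyr, DBI and RSQLite are installed as well

# Copy a built-in data set into a throwaway in-memory SQLite database.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
cars_db <- copy_to(con, mtcars, name = "cars")

# A lazy query: nothing has been run on the database yet.
q <- cars_db %>%
  filter(cyl == 4) %>%
  select(mpg, cyl, wt)

# collect(): run the query and pull the rows into a local tibble.
local_tbl <- collect(q)
dim(local_tbl)    # both dimensions are known, the data is in R

# compute(): run the query, but keep the result in a temporary table
# inside the database; what comes back is still a remote, lazy tbl.
remote_tbl <- compute(q)
dim(remote_tbl)   # the row count is reported as NA

# collapse(): run nothing; wrap the query built so far as a subquery,
# so that later verbs are applied on top of it.
sub_q <- collapse(q)
show_query(sub_q %>% filter(wt < 2))

DBI::dbDisconnect(con)
```

Comparing class(local_tbl) with class(remote_tbl), and reading the SQL printed by show_query(), should make the difference between "forcing computation" and merely reshaping the query visible.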