Suppose I have a data frame with many columns:
ncol = 40
sample_size = 300
my_matrix <- replicate(ncol, runif(sample_size, 0, 3))
my_df <- data.frame(my_matrix)
names(my_df) <- paste0("x", 1:ncol)
epsilon <- rnorm(sample_size, 0, 0.2)
my_df$y <- 1+3*my_df$x1 + epsilon
I pass the data frame to a function that needs only three of its columns (in my real code a function may use more than three, but I am keeping things simple here):
library(ggplot2)
idle_plotter <- function(dataframe, x_string, y_string, color_string) {
  p <- ggplot(dataframe, aes_string(x = x_string, y = y_string, color = color_string)) +
    geom_point()
  print(p)
}
Does it matter in terms of speed whether I pass the whole my_df to idle_plotter or only the three columns that idle_plotter needs? If the entire data frame is copied during the call, I assume it does, but if R passes by reference, it should not. In my tests it does not seem to matter, but I need to know whether:
- This is the general rule, in which case I can keep passing whole data frames to functions
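For reference, here is a minimal sketch of how I could check for copies myself, assuming base R's tracemem() is available (it needs an R build with memory profiling, which I guard with capabilities("profmem")). The helpers read_only and modifies are hypothetical names I made up for this illustration. My understanding is that R uses copy-on-modify semantics, so a function that only reads its argument should not trigger a copy, while one that modifies it works on a local copy and leaves the caller's object untouched.

```r
# Toy data frame mirroring the setup above (40 numeric columns, 300 rows).
set.seed(1)
my_df <- data.frame(replicate(40, runif(300, 0, 3)))
names(my_df) <- paste0("x", 1:40)

# Hypothetical helper: only reads one column, never modifies its argument.
read_only <- function(dataframe) mean(dataframe$x1)

# Hypothetical helper: modifies its argument, so R has to work on a copy.
modifies <- function(dataframe) {
  dataframe$x1 <- 0
  dataframe
}

# tracemem() reports duplications of my_df, if this R build supports it.
if (capabilities("profmem")) invisible(tracemem(my_df))

mean_x1 <- read_only(my_df)  # I would expect no tracemem message here
changed <- modifies(my_df)   # I would expect a tracemem message here

if (capabilities("profmem")) untracemem(my_df)

# The caller's data frame is unchanged either way.
print(all(changed$x1 == 0))
print(any(my_df$x1 != 0))
```

If the read-only call really produces no tracemem message, then passing the whole data frame should cost about the same as passing three columns, at least as far as copying is concerned.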