I plan to do some data analysis with R; my datasets are stored in PostgreSQL tables, and some of them contain up to 2 million records. I assumed this would not be a problem for R and that loading the records would be reasonably fast, but it turned out otherwise.
Doing something like this may take a minute or more, which I did not expect:
library(RPostgreSQL)

drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname = "mydb", user = "me", password = "my_password")

records <- dbGetQuery(con, statement = paste(
  "SELECT *",
  "FROM my_table",
  "ORDER BY id"))
An alternative approach is even slower:

res <- dbSendQuery(con, "SELECT * FROM my_table ORDER BY id")
records2 <- fetch(res, n = -1)   # fetch all rows at once
dbClearResult(res)
I can’t say that my hardware is the most advanced in the world, but it’s a pretty decent MacBook Pro with 8 GB of RAM and an SSD. When I pull the same data into, say, QGIS, everything finishes much faster.
What can be done to improve performance in this case? Alternative libraries? Tricks and hacks? Anything else?
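For instance, one thing I was considering is switching to the newer RPostgres driver (also DBI-based) instead of RPostgreSQL; a minimal sketch of what I have in mind, assuming the same database and credentials as above:

library(DBI)

# Same query, but through the RPostgres backend instead of RPostgreSQL
con2 <- dbConnect(RPostgres::Postgres(),
                  dbname = "mydb",
                  user = "me",
                  password = "my_password")
records3 <- dbGetQuery(con2, "SELECT * FROM my_table ORDER BY id")
dbDisconnect(con2)

I have not benchmarked this yet, so I don’t know whether it actually helps.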