The fastest way to load data from PostgreSQL into R

I plan to do some data analysis in R; the datasets are stored in PostgreSQL tables, and some of them contain up to 2 million records. I assumed that this would not be a big problem for R and that loading the records would be pretty fast, but it turned out otherwise.

Doing something like this may take a minute or more, which I did not expect:

library(RPostgreSQL)

drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname = "mydb", user = "me", password = "my_password")

records <- dbGetQuery(con, statement = paste(
  "SELECT *",
  "FROM my_table",
  "ORDER BY id"))  # 1.5M records

Alternative code is even slower:

records2 <- dbSendQuery(con, "SELECT * FROM my_table ORDER BY id")
data2 <- fetch(records2, n = -1)  # fetch all rows at once
dbClearResult(records2)           # release the result set
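
To pin down where the time goes, either variant can be wrapped in system.time(); a minimal sketch reusing the first query:

# Time the full round trip: query execution on the server plus the
# transfer and conversion of all rows into an R data frame.
system.time(
  records <- dbGetQuery(con, "SELECT * FROM my_table ORDER BY id")
)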

I can't say that my hardware is the most advanced in the world, but it's a pretty decent MacBook Pro with 8 GB of RAM and an SSD. When I load the same data into, say, QGIS, everything finishes much faster.

What can be done to improve performance in this case? Alternative libraries? Tricks and hacks? Anything else?
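
One possible direction is the newer RPostgres package, a different DBI backend for PostgreSQL that is often reported to be faster at fetching large result sets; the sketch below simply reuses the database name and credentials from the snippets above:

library(DBI)
library(RPostgres)

# Same database and credentials as above, via the RPostgres DBI backend.
con <- dbConnect(RPostgres::Postgres(),
                 dbname = "mydb", user = "me", password = "my_password")

# Pull everything into a data frame in a single call.
records <- dbGetQuery(con, "SELECT * FROM my_table ORDER BY id")

dbDisconnect(con)

Other commonly mentioned tricks are dropping the ORDER BY and sorting in R after loading, and exporting the table with PostgreSQL's COPY to a CSV file that data.table::fread can read quickly.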


Source: https://habr.com/ru/post/1525222/

