How to get over 100,000 rows from Redshift using R and dplyr

I am analyzing data from a Redshift database in R, using a dplyr connection, which works:

my_db <- src_postgres(host = 'my-cluster-blahblah.redshift.amazonaws.com',
                      port = '5439', dbname = 'dev',
                      user = 'me', password = 'mypw')
mytable <- tbl(my_db, "mytable")

viewstation <- mytable %>%
    filter(stationname == "something")

When I try to turn this output into a data frame, like this:

thisdata <- data.frame(viewstation)

I get a warning:

Only first 100,000 results retrieved. Use n = -1 to retrieve all. 

Where am I supposed to set n?

2 answers

Instead of

thisdata <- data.frame(viewstation)

use

thisdata <- collect(viewstation)

collect() pulls all of the data from the database back into R. As the dplyr databases vignette explains:

When working with databases, dplyr tries to be as lazy as possible. It's lazy in two ways:

It never pulls data back to R unless you explicitly ask for it.

It delays doing any work until the last possible moment: it collects together everything you want to do and then sends it to the database in one step.
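
To see this laziness in action, a minimal sketch, reusing the my_db connection and mytable object from the question:

library(dplyr)

# Building the query does not touch the database;
# dplyr only records the operations to perform.
viewstation <- mytable %>%
    filter(stationname == "something")

# show_query() prints the SQL dplyr has assembled,
# still without fetching any rows.
show_query(viewstation)

# Only collect() actually executes the query and brings
# the result back into R as a local data frame.
thisdata <- collect(viewstation)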


This changed in dplyr 0.5 (if I remember correctly).

You now pass n as an argument to collect():

my_db <- src_postgres(host = 'my-cluster-blahblah.redshift.amazonaws.com',
                      port = '5439', dbname = 'dev',
                      user = 'me', password = 'mypw')
mytable <- tbl(my_db, "mytable") %>% collect(n = Inf)

This retrieves all rows, not just the first 100,000.
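
Combining the two answers, a sketch that assumes the same cluster, table, and stationname filter as in the question: the filter is translated to SQL and runs in Redshift, and collect(n = Inf) then fetches every matching row instead of stopping at the 100,000-row default.

library(dplyr)

# Connection details copied from the question.
my_db <- src_postgres(host = 'my-cluster-blahblah.redshift.amazonaws.com',
                      port = '5439', dbname = 'dev',
                      user = 'me', password = 'mypw')

# Filter in the database, then pull back all matching rows.
thisdata <- tbl(my_db, "mytable") %>%
    filter(stationname == "something") %>%
    collect(n = Inf)

Filtering before collecting is usually preferable with Redshift, since only the rows you need travel over the network.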

