Is there a specific way to handle timestamp columns in R when retrieving data using RPostgreSQL?

I am trying to pull data from a PostgreSQL database, and the values returned for the timestamp field are inconsistent. I am not sure whether I am processing the POSIXct results correctly; otherwise, I think I have found a bug in the RPostgreSQL package. Here is a way to reproduce the problem:

Suppose there is a table with a single field in the Postgres database (run this in PostgreSQL):

    CREATE DATABASE mydb;

    CREATE TABLE test_table
    (
      "DateTime" timestamp without time zone NOT NULL,
      CONSTRAINT "pk_test_table" PRIMARY KEY ("DateTime")
    )
    WITH (
      OIDS = FALSE
    );

    ALTER TABLE test_table OWNER TO postgres;

And let's say there are several hundred entries, which I write from R. Here is the code:

    library(RPostgreSQL)
    library(chron)  # provides as.chron(), years(), months(), days(), hours(), minutes()

    # Feed the table with a sequence of date/time values
    date_values <- as.chron(seq(10000, 10500, 1/24))

    # Formats a chron value as "YYYY-MM-DD HH:MM:00"
    format.chron <- function(z) {
      sprintf("%04.0f-%02.0f-%02.0f %02.0f:%02.0f:00",
              as.numeric(as.character(years(z))),
              months(z),
              as.numeric(as.character(days(z))),
              as.numeric(as.character(hours(z))),
              as.numeric(as.character(minutes(z))))
    }

    .generateInsertQuery <- function(date_values, field_name, table_name) {
      insert_val <- paste(paste0("(", sQuote(format(date_values)), ")"), collapse = ',')
      qry <- paste("INSERT INTO", dQuote(table_name),
                   paste0("(", dQuote(field_name), ")"),
                   "VALUES", insert_val)
      qry
    }

    drv <- dbDriver('PostgreSQL')
    con <- dbConnect(drv, user = 'postgres', dbname = 'mydb')
    qry <- .generateInsertQuery(date_values, "DateTime", "test_table")
    dbSendQuery(con, qry)

If I try to get the values back, the time component is stripped from the resulting data:

    res <- dbGetQuery(con, "SELECT * FROM test_table")
    res[1:20, 1]

The result class, however, is POSIXct.

    class(res[, 1])

If the results are fetched one record at a time, only the values whose hour:minute component equals 00:00 lose the time component:

    rs <- dbSendQuery(con, "SELECT \"DateTime\" FROM test_table")
    res_list <- list()
    for (i in 1:100) res_list[i] <- fetch(rs, 1)  # fetch one record at a time
    res_list

As a workaround, I fetch the result one record at a time, capturing the values and aggregating them into a data.frame (a rough sketch follows below). But this is very time consuming, especially for large data sets. Any ideas on why this is happening and how to deal with it?
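For reference, my row-at-a-time workaround looks roughly like this (reusing the connection con from above; dbHasCompleted comes from DBI):

    rs <- dbSendQuery(con, "SELECT \"DateTime\" FROM test_table")
    chunks <- list()
    i <- 1
    while (!dbHasCompleted(rs)) {
      chunks[[i]] <- fetch(rs, 1)    # one-row data.frame per call
      i <- i + 1
    }
    dbClearResult(rs)
    res <- do.call(rbind, chunks)    # reassemble into a single data.frame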

Thanks in advance!

2 answers

First, the RPostgreSQL project has a mailing list; I suggest you post there.

PostgreSQL has two datetime types: with and without a time zone. As far as I remember, R maps only to the latter. I wrote some early regression tests for this (see the package source), but have not been involved with the project recently. I do remember, though, that POSIXct maps back and forth to the PostgreSQL datetime type.
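If you want to check which of the two types your column actually has on the server side, a standard information_schema query works (nothing RPostgreSQL-specific here, and it assumes the connection con from the question):

    dbGetQuery(con,
               "SELECT column_name, data_type
                  FROM information_schema.columns
                 WHERE table_name = 'test_table'")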


RPostgreSQL's dbWriteTable writes any POSIXct field as a database column of type timestamp with time zone, always with tz +00, no matter which time zone the POSIXct values carry. I believe it would be more correct for it to create a timestamp without time zone column.
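A quick way to verify that behavior (tz_check is just a throwaway table name for this check):

    df <- data.frame(t = as.POSIXct("2013-01-01 12:00:00", tz = "Europe/Moscow"))
    dbWriteTable(con, "tz_check", df)
    dbGetQuery(con,
               "SELECT data_type FROM information_schema.columns
                 WHERE table_name = 'tz_check' AND column_name = 't'")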

The simplest workaround for dbReadTable and dbWriteTable is to use Sys.setenv(TZ = "UTC") (a minimal sketch follows below). In my opinion, though, this reaches too deep into the session, because many other processes in the R session may require the correct time zone setting.
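A minimal sketch of that approach, restoring the previous setting afterwards so the rest of the session is unaffected:

    old_tz <- Sys.getenv("TZ", unset = NA)
    Sys.setenv(TZ = "UTC")
    res <- dbReadTable(con, "test_table")   # timestamps now come back in UTC
    if (is.na(old_tz)) Sys.unsetenv("TZ") else Sys.setenv(TZ = old_tz)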

A more targeted, less invasive option is to define your own dbReadTable and dbWriteTable that wrap the DBI versions with the appropriate pre-/post-processing of POSIXct columns. But that is still not an option if you are developing DBI-compliant code or a package (not tied to Postgres only).
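A rough sketch of the read side, assuming you want every POSIXct column normalized to UTC (readTableUTC is my own name, not part of DBI):

    readTableUTC <- function(con, name) {
      df <- DBI::dbReadTable(con, name)
      is_time <- vapply(df, inherits, logical(1), what = "POSIXct")
      # Re-express each POSIXct column in UTC, independent of the session TZ
      df[is_time] <- lapply(df[is_time], function(x)
        as.POSIXct(format(x, tz = "UTC"), tz = "UTC"))
      df
    }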

It would be great if RPostgreSQL were moved to GitHub to make contributing easier.
