How to send a data.frame from R to Q/KDB?

I have a large data.frame (15 columns and 100,000 rows) in an existing R session that I want to send to a Q/KDB instance. The KDB cookbook suggests some possible solutions:

RServer for Q: use KDB to start a new R instance that shares its memory space. This does not work because my data is in an existing R instance.

RServe: start an R server and use TCP/IP to communicate with the Q/KDB client. This does not work because, according to the RServe documentation, "each connection has a separate workspace and working directory", so I assume it would not see my existing data.

R Math Library: access R functions through a math library without needing an R instance. This does not work because my data is already in an R instance.

So, any other ideas on how to send data from R to Q/KDB?

1 answer

Open a port in Q. I run Q with a batch file:

    @echo off
    c:\q\w32\q -p 5001
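If q is already running, the listening port can equally be set from inside the q session (this is the standard q system command, not something from the original batch file):

    q)\p 5001    / tell this q process to listen on port 5001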

Load qserver.dll:

    tryCatch({
      dyn.load("c:/q/qserver.dll")
    }, error = function(f) {
      print("can't load qserver.dll")
    })
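If the load fails, a hedged pre-check can make the error message clearer (the install path c:/q/qserver.dll is an assumption; adjust it to your setup):

    dllPath <- "c:/q/qserver.dll"   # assumed install location
    if (!file.exists(dllPath)) stop("qserver.dll not found at ", dllPath)
    dyn.load(dllPath)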

Then use these functions:

    # Open a connection handle to the Q process; the handle is also stored
    # in the global environment as .kh.
    open_connection <- function(host="localhost", port=5001, user=NULL) {
      parameters <- list(host, as.integer(port), user)
      h <- .Call("kx_r_open_connection", parameters)
      assign(".kh", h, envir = .GlobalEnv)
      return(h)
    }

    close_connection <- function(connection) {
      .Call("kx_r_close_connection", as.integer(connection))
    }

    # Send a query string to Q and return the result.
    execute <- function(connection, query) {
      .Call("kx_r_execute", as.integer(connection), query)
    }

    thePort <- 5001   # the port opened by the batch file above
    d <<- open_connection(host="localhost", port=thePort)

    # Paste any number of arguments into one query string and run it.
    ex2 <- function(...) {
      query <- list(...)
      theResult <- NULL
      for(i in query) theResult <- paste0(theResult, i)
      return(execute(d, paste0(theResult)))
    }

ex2 can then take multiple arguments, so you can build queries from R variables and strings; see the sketch below.
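A minimal usage sketch (the queries here are illustrative, not from the original answer):

    execute(d, "2+2")    # sanity check: returns 4
    n <- 10
    ex2("til ", n)       # pastes the pieces into "til 10" and runs it in Q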

Edit: the above is for querying Q from R; here's R to Q.

2nd Edit: improved algorithm:

    library(stringr)

    RToQTable <- function(Rtable, Qname, withColNames=TRUE, withRowNames=TRUE, colSuffix=NULL) {
      # Use the matrix's column names, or generate col1, col2, ... if absent.
      theColnames <- if(!withColNames || length(colnames(Rtable))==0)
        paste0("col", as.character(1:length(Rtable[1,])), colSuffix)
      else
        colnames(Rtable)
      if(!withRowNames || length(rownames(Rtable))==0) withRowNames <- FALSE
      # Append a sentinel row so each column's values can be split back out
      # of the single flattened string built below.
      Rtable <- rbind(Rtable, "linesep")
      charnum <- as.integer(nchar(thestr <- paste(paste0(theColnames, ':("',
        str_split(paste(Rtable, collapse='";"'), ';\"linesep\";\"')[[1]], ');'),
        collapse="")) - 11)
      # Build the table in Q as Qname:([] col1:(...); col2:(...); ...)
      if(withRowNames)
        ex2(Qname, ":([]", Qname,
            str_replace_all(paste0("`", paste(rownames(Rtable), collapse="`")), " ", "_"),
            ";", .Internal(substr(thestr, 1L, charnum)), "))")
      else
        ex2(Qname, ":([]", .Internal(substr(thestr, 1L, charnum)), "))")
    }

    > library(microbenchmark)
    > bigMat <- matrix(runif(1500000), nrow=100000, ncol=15)
    > microbenchmark(RToQTable(bigMat, "Qmat"), times=3)
    Unit: seconds
                          expr      min    lq     mean   median       uq      max neval
     RToQTable(bigMat, "Qmat") 10.29171 10.315 10.32766 10.33829 10.34563 10.35298     3

This works for a matrix; for a data.frame, save a vector recording each column's type, convert the data to a matrix, import the matrix into Q, and then cast the types back on the Q side, roughly as sketched below.
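A hedged sketch of that workaround (the column names, the table name "Qdf", and the q-side casts are illustrative assumptions, not part of the original answer):

    df <- data.frame(px = runif(5), qty = 1:5, sym = letters[1:5],
                     stringsAsFactors = FALSE)
    types <- sapply(df, class)                           # remember each column's type
    RToQTable(as.matrix(df), "Qdf", withRowNames=FALSE)  # all columns arrive in Q as strings

Then cast the columns back on the Q side according to the saved types, e.g.:

    q)update "F"$px, "J"$qty from `Qdf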

Note that this algorithm scales at roughly O(rows * cols^1.1), so if you have more than about 20 columns you will need to cut them into multiple matrices to stay near O(rows * cols); a chunking sketch follows.
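A minimal chunking sketch under that assumption (the sendWide name and the "Qmat1", "Qmat2", ... naming scheme are made up for illustration):

    # Send a wide matrix as several <=20-column tables: Qmat1, Qmat2, ...
    sendWide <- function(mat, Qname, maxCols = 20) {
      starts <- seq(1, ncol(mat), by = maxCols)
      for (i in seq_along(starts)) {
        cols <- starts[i]:min(starts[i] + maxCols - 1, ncol(mat))
        RToQTable(mat[, cols, drop = FALSE], paste0(Qname, i), withRowNames = FALSE)
      }
    }

The pieces can then be stitched back together on the Q side with a column join (,').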

But for your example, 100,000 rows and 15 columns take about 10 seconds, so further optimization may not be necessary.
