SQL query into numeric frames inside a list

Given frame data

df1 <- data.frame(CustomerId=c(1:6),Product=c(rep("Toaster",3),rep("Radio",3))) df2 <- data.frame(CustomerId=c(2,4,6),State=c(rep("Alabama",2),rep("Ohio",1))) 

saved to list

 dflist <- c(df1,df2) 

How do I execute sqldf queries (joins) on these data frames?

Failed Attempts:

 test <- sqldf("select a.CustomerId, a.Product, b.State from dflist[1] a inner join dflist[2] b on b.id = a.id") test <- sqldf("select a.CustomerId, a.Product, b.State from dflist$df1 a inner join dflist$df2 b on b.CustomerId = a.CustomerId") 
+4
source share
1 answer

If you copy your data.frames from a list to a new environment, you can use the envir argument for sqldf or by naming the list items and using with .

Pay attention to a few things:

  • I create dflist using list not c .

note the difference

 str(c(df1,df2)) ##List of 4 ## $ CustomerId: int [1:6] 1 2 3 4 5 6 ## $ Product : Factor w/ 2 levels "Radio","Toaster": 2 2 2 1 1 1 ## $ CustomerId: num [1:3] 2 4 6 ## $ State : Factor w/ 2 levels "Alabama","Ohio": 1 1 2 str(list(df1,df2)) ##List of 2 ## $ :'data.frame': 6 obs. of 2 variables: ## ..$ CustomerId: int [1:6] 1 2 3 4 5 6 ## ..$ Product : Factor w/ 2 levels "Radio","Toaster": 2 2 2 1 1 1 ## $ :'data.frame': 3 obs. of 2 variables: ## ..$ CustomerId: num [1:3] 2 4 6 ## ..$ State : Factor w/ 2 levels "Alabama","Ohio": 1 1 2 
  • I configured sql queries to reflect the names in data.frames (according to your second approach)

named data

 dflist <- list(df1,df2) names(dflist) <- c('df1','df2') 

Create a new environment to work in

 # create a new environment e <- new.env() # assign the elements of dflist to this new environment for(.x in names(dflist)){ assign(value = dflist[[.x]], x=.x, envir = e) } # this could also be done using mapply / lapply # eg # invisible(mapply(assign, value = dflist, x = names(dflist), MoreArgs =list(envir = e))) # run the sql query sqldf("select a.CustomerId, a.Product, b.State from df1 a inner join df2 b on b.CustomerId = a.CustomerId", envir = e) ## CustomerId Product State ## 1 2 Toaster Alabama ## 2 4 Radio Alabama ## 3 6 Radio Ohio 

A simpler approach using with

you can just use with , which is evaluated locally (it is important that dflist is a named list)

 # this is far simpler!! with(dflist,sqldf("select a.CustomerId, a.Product, b.State from df1 a inner join df2 b on b.CustomerId = a.CustomerId")) 

Another simple approach using proto

  • Thanks @ G.Grothendieck (see comments

In this case, the proto package is used, which is downloaded using sqldf

 dflist <- list(a = df1, b = df2) sqldf( "select a.CustomerId, a.Product, b.State from df1 a inner join df2 b on b.CustomerId = a.CustomerId", envir = as.proto(dflist)) 

Using data.table

Or you can use data.table , which gives sql-like approaches (see FAQ 2.16 )

 library(data.table) dflist <- list(data.table(df1),data.table(df2)) names(dflist) <- c('df1','df2') invisible(lapply(dflist, setkeyv, 'CustomerId')) with(dflist, df1[df2]) ## CustomerId Product State ## 1: 2 Toaster Alabama ## 2: 4 Radio Alabama ## 3: 6 Radio Ohio 
+11
source

Source: https://habr.com/ru/post/1433295/


All Articles