If you copy your data.frames from a list to a new environment, you can use the envir argument for sqldf or by naming the list items and using with .
Pay attention to a few things:
- I create
dflist using list not c .
note the difference
str(c(df1,df2)) ##List of 4 ## $ CustomerId: int [1:6] 1 2 3 4 5 6 ## $ Product : Factor w/ 2 levels "Radio","Toaster": 2 2 2 1 1 1 ## $ CustomerId: num [1:3] 2 4 6 ## $ State : Factor w/ 2 levels "Alabama","Ohio": 1 1 2 str(list(df1,df2)) ##List of 2 ## $ :'data.frame': 6 obs. of 2 variables: ## ..$ CustomerId: int [1:6] 1 2 3 4 5 6 ## ..$ Product : Factor w/ 2 levels "Radio","Toaster": 2 2 2 1 1 1 ## $ :'data.frame': 3 obs. of 2 variables: ## ..$ CustomerId: num [1:3] 2 4 6 ## ..$ State : Factor w/ 2 levels "Alabama","Ohio": 1 1 2
- I configured sql queries to reflect the names in data.frames (according to your second approach)
named data
dflist <- list(df1,df2) names(dflist) <- c('df1','df2')
Create a new environment to work in
# create a new environment e <- new.env() # assign the elements of dflist to this new environment for(.x in names(dflist)){ assign(value = dflist[[.x]], x=.x, envir = e) } # this could also be done using mapply / lapply # eg # invisible(mapply(assign, value = dflist, x = names(dflist), MoreArgs =list(envir = e))) # run the sql query sqldf("select a.CustomerId, a.Product, b.State from df1 a inner join df2 b on b.CustomerId = a.CustomerId", envir = e) ## CustomerId Product State ## 1 2 Toaster Alabama ## 2 4 Radio Alabama ## 3 6 Radio Ohio
A simpler approach using with
you can just use with , which is evaluated locally (it is important that dflist is a named list)
Another simple approach using proto
- Thanks @ G.Grothendieck (see comments
In this case, the proto package is used, which is downloaded using sqldf
dflist <- list(a = df1, b = df2) sqldf( "select a.CustomerId, a.Product, b.State from df1 a inner join df2 b on b.CustomerId = a.CustomerId", envir = as.proto(dflist))
Using data.table
Or you can use data.table , which gives sql-like approaches (see FAQ 2.16 )
library(data.table) dflist <- list(data.table(df1),data.table(df2)) names(dflist) <- c('df1','df2') invisible(lapply(dflist, setkeyv, 'CustomerId')) with(dflist, df1[df2])