The original question dates from 2013. In the meantime, in February 2015, an answer was given to a duplicate or similar question:
How to connect to PCorpus in R tm package? That answer is very important, although quite minimalist, so I will try to expand on it here.
Here are some notes I made while working on a similar problem:
Note that the dbInit() function is not part of the tm package.
You first need to install the filehash package, which the tm documentation only suggests installing. This means it is not a hard dependency of tm.
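Since filehash is only a suggested package, it may simply be missing on a fresh machine. A minimal sketch of how I would check for it before going further; backend_available() is a hypothetical helper of mine, not something tm or filehash provides:

```r
# Hypothetical helper (not part of tm or filehash): report which of the
# optional storage backends are installed, without attaching them.
backend_available <- function(pkgs = c("filehash", "filehashSQLite")) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

status <- backend_available()
print(status)

# Tell the user what still has to be installed, if anything.
not_installed <- names(status)[!status]
if (length(not_installed) > 0) {
  message("Missing, install with: install.packages(c(",
          paste0('"', not_installed, '"', collapse = ", "), "))")
}
```

requireNamespace() only checks whether a package can be loaded; it does not install anything.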
Supposedly, you can also use the filehashSQLite package with library("filehashSQLite") instead of library("filehash"); both packages have the same interface and work together seamlessly thanks to their object-oriented design. So install "filehashSQLite" as well (edit 2016: some functions, such as tm::content_transformer(), are not implemented for filehashSQLite).
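To illustrate the shared interface, here is a small sketch that uses plain filehash with its "DB1" on-disk format instead of SQLite. It assumes filehash is installed (the block does nothing otherwise); db_path and the keys are made up for the example:

```r
# Sketch of the generic filehash key/value interface, using the "DB1" format.
# Assumption: the filehash package is installed; otherwise nothing runs.
if (requireNamespace("filehash", quietly = TRUE)) {
  library(filehash)

  db_path <- tempfile("filehash_demo_")   # hypothetical database name
  dbCreate(db_path, "DB1")                # create an empty database file
  db <- dbInit(db_path, "DB1")            # open a handle to it

  dbInsert(db, "greeting", "hi there")    # store values under keys
  dbInsert(db, "numbers", 1:5)

  keys    <- dbList(db)                   # all keys currently stored
  has_key <- dbExists(db, "greeting")     # TRUE if the key is present
  fetched <- dbFetch(db, "numbers")       # read a value back from disk
  print(keys)
  print(fetched)

  dbDelete(db, "greeting")                # remove a single key
  keys_after <- dbList(db)
  dbUnlink(db)                            # delete the database from disk
}
```

The same dbCreate / dbInit / dbInsert / dbFetch calls appear in the SQLite example, only with "SQLite" as the type.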
Then this works:

```r
library(tm)  # provides PCorpus() and DataframeSource()
suppressMessages(library(filehashSQLite))

# this string becomes the filename, must not contain dots.
# Example: "mydata.sqlite" is not permitted.
s <- "sqldb_pcorpus_mydata"  # replace mydata with something more descriptive

if (!file.exists(s)) {
  # csv is a data frame of 900 documents, 18 cols/features
  pc <- PCorpus(DataframeSource(csv),
                readerControl = list(language = "en"),
                dbControl = list(dbName = s, dbType = "SQLite"))
  dbCreate(s, "SQLite")
  db <- dbInit(s, "SQLite")
  set.seed(234)
  # add another record, just to show we can.
  # key = "test", value = "hi there"
  dbInsert(db, "test", "hi there")
} else {
  db <- dbInit(s, "SQLite")
  pc <- dbLoad(db)
}

show(pc)
# <<PCorpus>>
# Metadata:  corpus specific: 0, document level (indexed): 0
# Content:  documents: 900

dbFetch(db, "test")

# remove it
rm(db)
rm(pc)

# reload it
db <- dbInit(s, "SQLite")
pc <- dbLoad(db)

# the corpus entries are now accessible, but not loaded into memory.
# now 900 documents are bound via "Active Bindings",
# created by makeActiveBinding() from the base package
show(pc)
# [1] "1" "2" "3" "4" "5" "6" "7" "8" "9"
# ...
# [883] "883" "884" "885" "886" "887" "888" "889" "890" "891" "892"
# "893" "894" "895" "896" "897" "898" "899" "900"
# [901] "test"

dbFetch(db, "900")
# <<PlainTextDocument>>
# Metadata:  7
# Content:  chars: 33

dbFetch(db, "test")
# [1] "hi there"
```
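The "Active Bindings" remark in the code comments deserves a small illustration. This is a base-R toy sketch of the idea only, not filehash's actual code: a named list (store) stands in for the database, and fetch_count is a counter I added to show that nothing is read until a name is accessed.

```r
# Toy stand-in for a database: a named list (hypothetical, illustration only).
store <- list(doc1 = "first document", doc2 = "second document")
fetch_count <- 0  # counts how often a value is actually looked up

env <- new.env()
for (key in names(store)) {
  local({
    k <- key  # capture the current key for this binding
    makeActiveBinding(k, function() {
      fetch_count <<- fetch_count + 1  # record the lookup
      store[[k]]                       # fetch the value on demand
    }, env)
  })
}

print(ls(env))       # the names exist in the environment ...
print(fetch_count)   # ... but nothing has been fetched yet: 0
print(env$doc2)      # accessing the name triggers the lookup
print(fetch_count)   # now 1
```

This is why the corpus entries look like ordinary objects after loading while their content stays on disk until it is requested.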
This is what the database backend looks like. You can see that the documents from the data frame have somehow been encoded inside the SQLite table.

This is what the RStudio IDE shows me: 