This is a follow-up to an answer given on Stack Overflow that uses sqldf().
In my particular case, I have a tab-delimited file with over 110 million lines. I would like to select the rows corresponding to 4.6 million tag identifiers.
In the following code, the tag identifiers are in tag.query.
Although this code works with a smaller query, it cannot handle a query of the size described above:
sql.query <- paste('select * from f where v2 in (', tag.query, ')', sep='')
selected.df <- sqldf(sql.query,
                     dbname = tempfile(),
                     file.format = list(header = FALSE, row.names = FALSE,
                                        sep = "\t", skip = line.where.header.is))
Any suggestions for alternative approaches?
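
For illustration, one direction that might avoid the very large IN (...) clause is to stream the file in chunks with base R and filter each chunk with %in%. This is only a sketch and is not the sqldf() method from the earlier answer. The function name filter_by_tags, its arguments, and the chunk size are hypothetical. It assumes that the tag identifiers are held in a plain vector, that the tag sits in the second column, and that reading every column as character is acceptable.

```r
# Hypothetical helper: read a tab-delimited file chunk by chunk and keep
# only the rows whose second column matches one of the tags.
filter_by_tags <- function(path, tags, skip = 0, chunk.size = 1e6) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Discard the leading lines (e.g. a header block), if any.
  if (skip > 0) readLines(con, n = skip)

  tags <- as.character(tags)
  kept <- list()
  repeat {
    lines <- readLines(con, n = chunk.size)
    if (length(lines) == 0) break          # end of file reached

    # Parse the chunk; all columns are read as character so that
    # every chunk has the same column types.
    chunk <- read.table(text = lines, sep = "\t", header = FALSE,
                        quote = "", comment.char = "",
                        colClasses = "character")

    # Keep only rows whose second column is one of the requested tags.
    kept[[length(kept) + 1]] <- chunk[chunk$V2 %in% tags, , drop = FALSE]
  }

  # Combine the matching rows from all chunks (NULL if the file was empty).
  do.call(rbind, kept)
}
```

With the names from the question, a call might look like filter_by_tags("myfile.txt", tag.ids, skip = line.where.header.is), where "myfile.txt" and tag.ids (a vector of identifiers rather than a pasted query string) are placeholders. Only one chunk plus the matching rows are held in memory at a time, so chunk.size can be lowered if memory is tight.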