Here's a big(ger) dataset:
big = iris[rep(seq_len(nrow(iris)), 1000),]
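For reference, that is the 150 rows of iris replicated 1,000 times:

> dim(big)
[1] 150000      5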
A for loop over toJSON(df[i,]) produces a flat file of key-value pairs representing each row, whereas toJSON(df) produces column vectors; the two are very different. We aim for the equivalent of toJSON(df[i,]), but formatted as a single JSON string.
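A toy example (a made-up two-column frame, not part of the benchmark) shows the two shapes; exact number formatting depends on the rjson version:

library(rjson)
toy <- data.frame(x = c(1, 2), y = c("a", "b"), stringsAsFactors = FALSE)
toJSON(toy[1, ])   # row-wise: one object per row, e.g. {"x":1,"y":"a"}
toJSON(toy)        # column-wise: column vectors, e.g. {"x":[1,2],"y":["a","b"]}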
Start by munging big into a list-of-lists lol, with each inner element named (and the factor turned into a character, so as not to confuse the JSON further); lol looks like list(big[1,], big[2,], ...) but with names on each element.
big1 <- Map(function(x, nm) setNames(x, rep(nm, length(x))), big, names(big))
big1$Species <- as.character(big1$Species)
lol <- unname(do.call(Map, c(list, big1)))
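A quick sanity check on the structure (shapes only; the exact printing of str() is not important):

length(lol)    # one element per row of big
str(lol[[1]])  # row 1's values, each labelled with its column name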
Then we turn that into a vector of JSON chunks, using the rjson library and splitIndices() from the parallel package (other ways of generating the splits are possible):
chunks <- 10
json <- sapply(splitIndices(length(lol), chunks), function(idx) toJSON(lol[idx]))
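At this point json is a character vector with one string per chunk, each a complete JSON array for its block of rows:

length(json)                  # 10, one string per chunk
substring(json, 1, 1)         # every chunk currently starts with "["
substring(json, nchar(json))  # ... and ends with "]"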
We could write the json chunks to a file, but they are not quite legal JSON: all but the last should end with "," but instead end with "]", and all but the first should start with nothing (whitespace is fine) but instead start with "[".
substring(json[-length(json)], nchar(json)[-length(json)]) <- ","
substring(json[-1], 1, 1) <- " "   # a space; JSON ignores the leading whitespace
These are then ready to be written to the file.
fl <- tempfile()
writeLines(json, fl)
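As a sanity check (using rjson's fromJSON(), which can read straight from a file), the patched chunks should now parse back as a single JSON array with one element per row:

stopifnot(length(fromJSON(file = fl)) == nrow(big))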
All of this combined, and of course without handling the many special cases of column-type coercion:
library(parallel)   ## just for splitIndices; no parallel processing here...
library(rjson)

fastJson <- function(df, fl, chunks = 10) {
    df1 <- Map(function(x, nm) setNames(x, rep(nm, length(x))), df, names(df))
    df1 <- lapply(df1, function(x) {
        if (is(x, "factor")) as.character(x) else x
    })
    lol <- unname(do.call(Map, c(list, df1)))
    idx <- splitIndices(length(lol), chunks)
    json <- sapply(idx, function(i) toJSON(lol[i]))
    substring(json[-length(json)], nchar(json)[-length(json)]) <- ","
    substring(json[-1], 1, 1) <- " "   # a space; JSON ignores the whitespace
    writeLines(json, fl)
}
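As one illustration of the column-type coercion left unhandled above, a Date column could be treated the same way as the factor; coerce_column below is only a hypothetical sketch, not part of the timings that follow:

coerce_column <- function(x) {
    if (is(x, "factor")) as.character(x)      # as in fastJson()
    else if (inherits(x, "Date")) format(x)   # e.g. "2024-01-31"
    else x
}
## inside fastJson() this would replace the factor-only lapply():
## df1 <- lapply(df1, coerce_column)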
With this:
> fastJson(big, tempfile())
> system.time(fastJson(big, fl <- tempfile()))
   user  system elapsed
  2.340   0.008   2.352
> system(sprintf("wc %s", fl))
      10       10 14458011 /tmp/RtmpjLEh5h/file3fa75d00a57c
In contrast, simply subsetting big row by row (without any JSON conversion or writing to a file) takes a long time:
> system.time(for (i in seq_len(nrow(big))) big[i,])
   user  system elapsed
 57.632   0.088  57.835
Opening the file for appending, once for each row, does not take much time compared to that subsetting:
> system.time(for (i in seq_len(nrow(big))) { con <- file(fl, "a"); close(con) })
   user  system elapsed
  2.320   0.580   2.919