Why can't you retrieve one row at a time from the database, process it as needed, convert it to JSON, and then emit it with a trailing/delimiting comma?
If you started with a file containing only [, appended each record's JSON followed by a comma, and then on the final record omitted the comma and appended a closing ] instead, you would have a JSON array of hashes, and you would only need to process one row's worth of data at a time.
It will be a little slower (maybe), but it won't crush your system. And DB I/O can be very fast if you use blocking/paging to retrieve a reasonable number of records at a time.
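As a minimal illustration of that pattern (not part of the original example, and using made-up stand-in rows), this sketch streams a collection of hashes to a file as one JSON array, emitting a single element at a time:

    require 'json'

    # Stand-in records; in practice each would come from a database query.
    rows = [
      { 'name' => 'abc', 'price' => 1.0 },
      { 'name' => 'def', 'price' => 2.0 },
      { 'name' => 'ghi', 'price' => 3.0 }
    ]

    File.open('output.json', 'w') do |out|
      out.print '['
      rows.each_with_index do |row, i|
        out.print ',' unless i.zero?          # comma before every element but the first
        out.print "\n#{JSON.generate(row)}"   # one record's worth of JSON at a time
      end
      out.print "\n]\n"
    end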
For example, here is a combination of some Sequel example code and code that extracts the rows as JSON and builds a larger JSON structure:
    require 'json'
    require 'sequel'

    DB = Sequel.sqlite
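Only the opening lines of that example survived the formatting; the rest below is a sketch of what the body most likely looked like, reconstructed from the output that follows. The items table and its name and price columns are inferred from that output, so treat them as assumptions:

    # Reconstructed sketch -- table and column names inferred from the output below.
    DB.create_table :items do
      primary_key :id
      String :name
      Float :price
    end

    items = DB[:items]

    # Populate the table with a few rows at random prices.
    items.insert(name: 'abc', price: rand * 100)
    items.insert(name: 'def', price: rand * 100)
    items.insert(name: 'ghi', price: rand * 100)

    # Emit the rows as one JSON array, one row at a time,
    # using the leading-[ / trailing-] and comma technique described above.
    add_comma = false
    puts '['
    items.order(:price).each do |item|
      puts ',' if add_comma
      add_comma = true
      print JSON[item]
    end
    puts "\n]"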
Which outputs:
[ {"id":2,"name":"def","price":3.714714089426208}, {"id":3,"name":"ghi","price":27.0179624376119}, {"id":1,"name":"abc","price":52.51248221170203} ]
Notice that the order is now by "price".
Validation is simple:
    require 'json'
    require 'pp'

    pp JSON[<<EOT]
    [
    {"id":2,"name":"def","price":3.714714089426208},
    {"id":3,"name":"ghi","price":27.0179624376119},
    {"id":1,"name":"abc","price":52.51248221170203}
    ]
    EOT
Result:
[{"id"=>2, "name"=>"def", "price"=>3.714714089426208}, {"id"=>3, "name"=>"ghi", "price"=>27.0179624376119}, {"id"=>1, "name"=>"abc", "price"=>52.51248221170203}]
This verifies the JSON and demonstrates that the original data can be recovered. Each row retrieved from the database should be the smallest "bite-sized" piece of the overall JSON structure you want to build.
Building on that, here's how to read JSON stored in the database, manipulate it, then emit it as a JSON file:
    require 'json'
    require 'sequel'

    DB = Sequel.sqlite
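Again, only the opening lines of that example survived; the rest below is a sketch of what it presumably did, inferred from the output that follows: each row stores one JSON string, which is parsed, has a couple of fields ("foo" and a per-row timestamp) merged in, and is emitted as part of one JSON array. The json column and the inserted values are assumptions:

    # Reconstructed sketch -- a table holding one JSON string per row;
    # the column name and sample values are assumptions.
    DB.create_table :items do
      primary_key :id
      String :json
    end

    items = DB[:items]

    # Store each incoming record as a JSON string.
    %w[abc def ghi jkl mno pqr stu vwx yz_].each do |name|
      items.insert(json: JSON[{ 'name' => name, 'price' => rand * 100 }])
    end

    # Read each row back, manipulate it, and emit it as part of one JSON array.
    add_comma = false
    puts '['
    items.each do |row|
      puts ',' if add_comma
      add_comma = true
      record = JSON[row[:json]]        # parse the stored JSON
      record['foo']  = 'bar'           # example manipulation
      record['time'] = Time.now.to_f   # timestamp added per row
      print JSON[record]
    end
    puts "\n]"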
Which generates:
[ {"name":"abc","price":3.268814929005337,"foo":"bar","time":1379688093.124606}, {"name":"def","price":13.871147312377719,"foo":"bar","time":1379688093.124664}, {"name":"ghi","price":52.720984131655676,"foo":"bar","time":1379688093.124702}, {"name":"jkl","price":53.21477190840114,"foo":"bar","time":1379688093.124732}, {"name":"mno","price":40.99364022416619,"foo":"bar","time":1379688093.124758}, {"name":"pqr","price":5.918738444452265,"foo":"bar","time":1379688093.124803}, {"name":"stu","price":45.09391752439902,"foo":"bar","time":1379688093.124831}, {"name":"vwx","price":63.08947792357426,"foo":"bar","time":1379688093.124862}, {"name":"yz_","price":94.04921035056373,"foo":"bar","time":1379688093.124894} ]
I added the timestamp so you can see that each row is processed individually, and to give you an idea of how quickly the rows are processed. Granted, this is a tiny in-memory database with no network I/O to contend with, but a normal network connection through a switch to a database on a reasonable DB host should be quite fast too. Telling the ORM to read the database in chunks can speed up the processing, because the DBM will be able to return larger blocks and fill the packets more efficiently. You will have to experiment to determine what chunk size you need, because it will vary with your network, your hosts, and the size of your records.
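For instance, a sketch of chunked reading with Sequel could use Dataset#paged_each, which walks an ordered dataset one page of rows at a time; the rows_per_fetch value here is only a starting point for that experimentation:

    require 'json'
    require 'sequel'

    DB = Sequel.sqlite
    DB.create_table :items do
      primary_key :id
      String :name
      Float :price
    end

    items = DB[:items]
    100.times { |i| items.insert(name: "item#{i}", price: rand * 100) }

    # paged_each needs an ordered dataset and issues one query per chunk
    # of :rows_per_fetch rows, instead of one query per row or one huge query.
    add_comma = false
    puts '['
    items.order(:id).paged_each(rows_per_fetch: 25) do |row|
      puts ',' if add_comma
      add_comma = true
      print JSON[row]
    end
    puts "\n]"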
Your original design isn't suitable for working with enterprise-sized databases, especially when your hardware resources are limited. Over the years we have learned how to parse BIG databases, ones that make 20,000-row tables look minuscule. VM slices are common these days and we use them for the crunching, so they are often the PCs of yesteryear: a single CPU with a small memory footprint and dinky drives. We can't beat them up or they become bottlenecks, so we have to break the data into the smallest atomic pieces we can.
A note on DB design: Storing JSON in a database is a questionable practice. DBMs these days can store JSON, YAML, and XML string representations, but forcing the DBM to search inside stored JSON, YAML, or XML strings is a major hit to processing speed, so avoid it at all costs unless you also have the equivalent lookup data indexed in separate fields, so your search queries run at the highest possible speed. If the data is available in separate fields, then doing good old database queries, tweaking the data in the DBM or your scripting language of choice, and emitting the massaged data becomes a lot easier.
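To illustrate the point, here is a sketch (the table and column names are made up) of keeping the searchable values in their own indexed columns alongside the raw JSON string, so queries hit the indexes instead of scanning inside the JSON:

    require 'json'
    require 'sequel'

    DB = Sequel.sqlite

    # Hypothetical table: the raw JSON is kept for fidelity, while the
    # searchable values are duplicated into indexed columns.
    DB.create_table :events do
      primary_key :id
      String :payload, text: true   # the stored JSON string
      String :name, index: true     # extracted lookup fields...
      Float  :price, index: true    # ...indexed for fast queries
    end

    events = DB[:events]

    record = { 'name' => 'abc', 'price' => 12.34, 'details' => { 'color' => 'red' } }
    events.insert(payload: JSON[record], name: record['name'], price: record['price'])

    # The query uses the indexed columns; the JSON is only parsed after the hit.
    events.where { price < 50.0 }.each do |row|
      puts JSON[row[:payload]]
    end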