Python Shelve Module Memory Consumption

Question

I have been asked to read a .txt file, which is a log of various events, and write some of these events to a dictionary.

The problem is that the file can sometimes be larger than 3 GB, which means the dictionary becomes too large to fit into main memory. shelve seems to be a good way to solve this problem. However, since I will be constantly modifying the dictionary, I must enable the writeback option. This is what worries me: the documentation says that it will slow down reads and writes and use more memory, but I can't find any figures on how much speed and memory are affected.

Can anyone clarify how much read/write speed and memory usage are affected, so that I can decide whether to use the writeback option or sacrifice some readability for the sake of efficiency?

Thanks.

performance python code-readability shelve tradeoff

inspectorG4dget May 24, '11 at 18:30

1 answer

Michael R. Hines · Answer 1 · 2015-05-30T03:03:34+0000

For databases of this size, shelve really is the wrong tool. If you don't need a highly available client/server architecture, and you just want to convert your TXT file into a local database that can be accessed like in-memory objects, you really should be using ZODB.

If you need something highly available, you will of course have to switch to a full-fledged NoSQL database, of which there are many to choose from.

Here is a simple example of how to convert your shelve database into a ZODB database, which should solve your memory usage / performance problems.

    #!/usr/bin/env python3
    """Copy every entry of a shelve database into a ZODB FileStorage database."""
    import argparse
    import shelve

    import transaction
    import ZODB
    import ZODB.FileStorage

    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--input", dest="in_file", required=True,
                        help="original shelve database filename")
    parser.add_argument("-o", "--output", dest="out_file", required=True,
                        help="new ZODB database filename")
    options = parser.parse_args()

    # Open the shelf read-only. writeback=True is not needed for a copy and
    # would keep every entry that is read cached in memory.
    db = shelve.open(options.in_file, flag="r")

    zstorage = ZODB.FileStorage.FileStorage(options.out_file)
    zdb = ZODB.DB(zstorage)
    zconnection = zdb.open()
    newdb = zconnection.root()

    for key, value in db.items():
        print("Copying key: " + str(key))
        newdb[key] = value

    # Persist everything that was copied into the ZODB root.
    transaction.commit()

    zconnection.close()
    zdb.close()
    db.close()
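As a side note on the writeback trade-off raised in the question: if you stay with shelve, one common way to avoid writeback=True is to fetch the value, modify the copy, and assign it back under the same key. The sketch below is only an illustration under assumptions: the helper names record_event and demo, the keys, and the temporary file name are all made up for this example, and it does not measure speed or memory.

```python
import os
import shelve
import tempfile


def record_event(db, key, event):
    """Append one event under key without relying on writeback=True."""
    events = db.get(key, [])   # unpickles a fresh copy of the stored list
    events.append(event)       # changes only that in-memory copy
    db[key] = events           # explicit reassignment pickles it back to disk


def demo():
    # Hypothetical database name inside a temporary directory (illustration only).
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "events")
        with shelve.open(path) as db:      # writeback defaults to False
            record_event(db, "login", "alice")
            record_event(db, "login", "bob")
            db["error"] = []
            db["error"].append("lost")     # mutates a temporary copy; never stored
        with shelve.open(path, flag="r") as db:
            return db["login"], db["error"]


if __name__ == "__main__":
    print(demo())
```

With this pattern only the entry being updated is held in memory at any time, at the cost of one extra line per update. With writeback=True, every entry that has been accessed stays cached until sync() or close() is called.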
