I need to maintain a large list of pickleable Python objects. The list is too large to fit in RAM, so some database \ paging mechanism is required. The mechanism should support fast access to nearby (close together) areas of the list.
The list should implement all the Python list functions, but most of the time I will work sequentially: scan a certain range of the list and, during the scan, decide whether I want to insert \ pop some nodes at the scan point.
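Concretely, the scan pattern looks something like this (a sketch only: needs_expansion, obsolete and new_node are hypothetical stand-ins for my real logic, and big_list is the disk-backed list):

index = start
while index < stop:
    node = big_list[index]
    if needs_expansion(node):                  # hypothetical predicate
        big_list.insert(index + 1, new_node)   # insert at the scan point
        stop += 1                              # the list grew under the scan
        index += 2
    elif obsolete(node):                       # hypothetical predicate
        big_list.pop(index)                    # pop at the scan point
        stop -= 1
    else:
        index += 1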
The list can be very large (2-3 GB), so it cannot all be held in RAM at once. The nodes are small (100-200 bytes) but can contain various types of data.
A good solution for this would be a BTree of buckets, where only the most recently used buckets are loaded into RAM.
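To show what I mean, here is a toy sketch of that idea (this is not zc.blist: it uses shelve as the disk store, lookup walks the bucket sizes linearly where a real BTree would take O(log n), and bucket splitting \ merging is left out):

import shelve
from collections import OrderedDict

class PagedList(object):
    """Bucketed list: items live in fixed-size buckets pickled to disk,
    and only the most recently touched buckets stay in RAM."""

    def __init__(self, path, bucket_size=1000, max_cached=8):
        self.store = shelve.open(path)
        self.bucket_size = bucket_size
        self.max_cached = max_cached
        self.sizes = []              # item count per bucket, in list order
        self.cache = OrderedDict()   # bucket number -> items, in LRU order

    def _load(self, b):
        if b in self.cache:
            self.cache[b] = self.cache.pop(b)                # mark most recently used
        else:
            while len(self.cache) >= self.max_cached:
                old, items = self.cache.popitem(last=False)  # evict oldest bucket
                self.store[str(old)] = items
            self.cache[b] = self.store.get(str(b), [])
        return self.cache[b]

    def _locate(self, index):
        # map a flat list index to (bucket number, offset inside the bucket)
        for b, size in enumerate(self.sizes):
            if index < size:
                return b, index
            index -= size
        raise IndexError(index)

    def __len__(self):
        return sum(self.sizes)

    def __getitem__(self, index):
        b, offset = self._locate(index)
        return self._load(b)[offset]

    def __setitem__(self, index, value):
        b, offset = self._locate(index)
        self._load(b)[offset] = value

    def insert(self, index, value):
        if index >= len(self):
            # append: open a new bucket when the last one is full
            if not self.sizes or self.sizes[-1] >= self.bucket_size:
                self.sizes.append(0)
            b, offset = len(self.sizes) - 1, self.sizes[-1]
        else:
            b, offset = self._locate(index)
        self._load(b).insert(offset, value)
        self.sizes[b] += 1           # a real BTree would split oversized buckets

    def pop(self, index):
        b, offset = self._locate(index)
        value = self._load(b).pop(offset)
        self.sizes[b] -= 1           # a real BTree would merge emptied buckets
        return value

    def close(self):
        for b, items in self.cache.items():
            self.store[str(b)] = items   # flush cached buckets to disk
        self.store.close()

Even this naive version keeps at most max_cached * bucket_size items in RAM at a time, which is the access behavior I am after.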
Using SQL tables is not a good fit, since I would need to implement a complex index-key mechanism. My data is not a table; it's a simple Python list, with the ability to insert items at specific indexes and pop items from specific positions.
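For example, with sqlite3 and an integer position key (table and column names here are made up for illustration), every insert in the middle has to renumber the whole tail, which is exactly the complex key mechanism I want to avoid:

import sqlite3

conn = sqlite3.connect('nodes.db')
conn.execute('CREATE TABLE IF NOT EXISTS nodes (pos INTEGER PRIMARY KEY, data BLOB)')

def insert_at(pos, data):
    # shift the tail up by one to make room -- O(n) work per insert; the
    # two-step negation avoids transient UNIQUE violations on the key
    conn.execute('UPDATE nodes SET pos = -(pos + 1) WHERE pos >= ?', (pos,))
    conn.execute('UPDATE nodes SET pos = -pos WHERE pos < 0')
    conn.execute('INSERT INTO nodes VALUES (?, ?)', (pos, sqlite3.Binary(data)))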
I tried ZODB and zc.blist, which implement a BTree-based list that can be stored in a ZODB database file, but I don't know how to configure them so that the operations above finish in a reasonable amount of time. I do not need any of the multi-threading \ transaction features; nothing touches the database file except my single-threaded program.
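For concreteness, a minimal ZODB \ zc.blist setup looks roughly like this (a sketch: the cache_size value and savepoint interval are guesses, and some_node is a placeholder for real node data):

import transaction
from ZODB import DB
from ZODB.FileStorage import FileStorage
from zc.blist import BList

# cache_size caps how many objects the connection keeps in RAM, which
# should bound memory usage to the recently used buckets
db = DB(FileStorage('data.fs'), cache_size=10000)
conn = db.open()
root = conn.root()

if 'my_list' not in root:
    root['my_list'] = BList()
    transaction.commit()
my_list = root['my_list']

for i in xrange(1000000):
    my_list.append(some_node(i))   # some_node: placeholder for real data
    if i % 100000 == 0:
        # an optimistic savepoint writes modified objects out to disk so
        # the cache can evict them mid-transaction
        transaction.savepoint(True)
transaction.commit()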
- So, the question: how can ZODB \ zc.blist be configured so that the operations above run fast, or is there a different implementation that would?

- For reference, here is what a plain in-RAM list does:
import time
import random

NODE_JUMP = 50000    # nodes appended per round
NODE_ACCESS = 10000  # nodes rewritten per round

print 'STARTING'
random_bytes = open('/dev/urandom', 'rb')
my_list = list()
nodes_no = 0
while True:
    # grow the list by NODE_JUMP random 100-byte nodes
    nodes_no += NODE_JUMP
    start = time.time()
    my_list.extend(random_bytes.read(100) for i in xrange(NODE_JUMP))
    print 'extending to %s nodes took %.2f seconds' % (nodes_no, time.time() - start)

    # rewrite a random contiguous section of NODE_ACCESS nodes
    section_start = random.randint(0, nodes_no - NODE_ACCESS - 1)
    start = time.time()
    for index in xrange(section_start, section_start + NODE_ACCESS):
        my_list[index] = my_list[index][1:] + my_list[index][0]
    print 'access to %s nodes took %.2f seconds' % (NODE_ACCESS, time.time() - start)
The results:
extending to 5000000 nodes took 3.49 seconds
access to 10000 nodes took 0.02 seconds
extending to 5050000 nodes took 3.98 seconds
access to 10000 nodes took 0.01 seconds
extending to 5100000 nodes took 2.54 seconds
access to 10000 nodes took 0.01 seconds
extending to 5150000 nodes took 2.19 seconds
access to 10000 nodes took 0.11 seconds
extending to 5200000 nodes took 2.49 seconds
access to 10000 nodes took 0.01 seconds
extending to 5250000 nodes took 3.13 seconds
access to 10000 nodes took 0.05 seconds
Killed (not by me)