Best way to store and use a large text file in Python

I am creating a network server for a Boggle clone I wrote in Python, which accepts users, solves boards, and evaluates player input. The dictionary file I use is 1.8 MB (the ENABLE2K word list), and I need it to be available to several classes of the game solver. Right now, each class iterates through the file line by line and builds its own hash table (associative array), but the more solver classes I create, the more memory this takes.

What I would like to do is load the dictionary file once and pass it to each solver instance as needed. What is the best way to do this? Should I load the dictionary into the global namespace and then access it in the solver class as globals()['dictionary']? Or should I load the dictionary and pass it as an argument to the class constructor? Is one of these better than the other? Is there a third option?

+4
4 answers

If you create a dictionary.py module containing code that reads the file and builds the dictionary, that code will be executed only on the first import. Subsequent imports return a reference to the existing module instance. So your classes can do:

 import dictionary
 dictionary.words[whatever]

where dictionary.py has:

 words = {}
 # read file and add entries to 'words'
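A minimal sketch of such a loader (function and file names are my assumptions, not from the answer). In a real dictionary.py the load would run once at module import time; later imports reuse the cached module object from sys.modules, so the 1.8 MB file is parsed only once per process:

```python
import io

# Hypothetical loader for dictionary.py. In a real module this would run at
# import time; subsequent imports reuse the cached module, so the file is
# read only once.
def load_words(fileobj):
    """Build a set of lowercase words from a file-like object."""
    return {line.strip().lower() for line in fileobj if line.strip()}

# Stand-in for the ENABLE2K file:
sample = io.StringIO("CAT\ndog\nqi\n")
words = load_words(sample)
```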
+10

Even though at the moment it is essentially a singleton, the usual arguments against global variables apply. For a Pythonic singleton substitute, look up the "Borg" pattern.
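For reference, a minimal Borg sketch (illustrative; the class name is mine): instances are distinct objects, but they all share one __dict__, so they share all state, which sidesteps a true singleton while keeping a single shared dictionary.

```python
# Borg pattern: every instance's attribute dict is the same shared dict.
class BorgDictionary:
    _shared_state = {}

    def __init__(self):
        self.__dict__ = self._shared_state

a = BorgDictionary()
a.words = {"cat", "dog"}
b = BorgDictionary()  # a new instance, but it sees the same state
```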

That is really the only difference. Once the dictionary object is created, passing it around only binds new references to it; no copy is made unless you explicitly perform a deep copy. It makes sense to construct it once and only once, unless each solver instance requires a private copy to modify.

+1

Adam, remember that in Python, when you write:

 a = read_dict_from_file()
 b = a

... you are not actually copying a and thus using more memory; you are just making b another reference to the same object.

So basically, either of the solutions you propose will be much better in terms of memory usage than what you have now. The key point: read the dictionary once, then hold onto a reference to it. Whether you do this with a global variable, by passing it to each instance, or something else, you will be referencing the same object, not duplicating it.
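This point can be demonstrated directly: assignment binds another name to the same object, while a deep copy creates independent storage (the variable names here are just for illustration):

```python
import copy

words = {"cat": True, "dog": True}

alias = words                  # no copy: another name for the same dict
clone = copy.deepcopy(words)   # a real, independent copy

alias["qi"] = True             # mutation is visible through 'words' too
```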

Which one is the most Pythonic? That is a whole can of worms, but here is what I would do personally:

 def main(args):
     run_initialization_stuff()
     dictionary = read_dictionary_from_file()
     solvers = [Solver(dictionary=dictionary)
                for _ in range(number_of_solvers)]

HTH.

+1

Depending on what your dict contains, you may be interested in the shelve or anydbm modules. They give you dict-like interfaces (just strings as keys and items for anydbm; strings as keys and any picklable Python object as items for shelve), but the data actually lives in a DBM file (gdbm, ndbm, dbhash, or bsddb, depending on what is available on the platform). You probably still want to share the actual database between the classes as you describe, but this avoids the step of parsing the text file and saves a bit of memory to boot.
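A small shelve sketch (the file path and keys are my own, for illustration). shelve gives a dict-like interface backed by a DBM file on disk: entries are read on demand rather than re-parsed from a text file on every startup.

```python
import os
import shelve
import tempfile

# Write the word list to a shelve database once.
path = os.path.join(tempfile.mkdtemp(), "words")

with shelve.open(path) as db:
    for w in ("cat", "dog", "qi"):
        db[w] = True  # keys must be str; values can be any picklable object

# Reopening is cheap: lookups hit the DBM file, no text parsing needed.
with shelve.open(path) as db:
    found_cat = "cat" in db
    found_xyzzy = "xyzzy" in db
```

Note that anydbm is a Python 2 name; in Python 3 the equivalent lives in the dbm package, while shelve is unchanged.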

0

Source: https://habr.com/ru/post/1277367/
