Flask: using a global variable to load data files into memory

I have a large XML file that is opened, loaded into memory, and then closed by a Python class. A simplified example would look like this:

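    class Dictionary():
        def __init__(self, filename):
            # Read the whole file into memory once; the file is closed on exit.
            with open(filename) as f:
                self.contents = f.readlines()

        def getDefinitionForWord(self, word):
            # looks up the word using the etree parser and returns its definition
            pass  # body omitted in this simplified example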

And in my Flask application:

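    from flask import Flask
    from dictionary import Dictionary

    app = Flask(__name__)

    # Module-level global: created when this module is imported.
    dictionary = Dictionary('dictionary.xml')
    print('dictionary object created')

    @app.route('/')
    def home():
        word = dictionary.getDefinitionForWord('help')
        return word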

I understand that in an ideal world, I would use a database instead of XML and create a new connection to this database for each request.

From the docs, I understood that the application context in Flask means that each request will re-create dictionary = Dictionary('dictionary.xml'), opening the file on disk and re-reading all of it into memory. However, when I look at the debug output, I see the 'dictionary object created' line printed exactly once, despite connecting from several sources (different sessions?).

My first question is:

It seems to me that the application only loads the XML file once. Can I then assume that it sits in memory globally and can be safely read by a large number of simultaneous requests, limited only by the RAM on my server? If the XML is 50 MB, then it would take approximately 50 MB of memory and be served to simultaneous requests at high speed. I assume that it is not that simple.

And my second question:

If this is not the case, what limits am I going to hit in my ability to handle large volumes of traffic? How many requests can I process if I have 50 MB of XML that is repeatedly opened, read from disk, and closed? I guess one at a time.

I understand that this is vague and hardware dependent, but I'm new to Flask, Python, and programming for the web, and I'm just looking for guidance.

Thanks!

1 answer

It is safe to keep it this way as long as the global object is not modified. This is a feature of WSGI, as described in the Werkzeug docs (Werkzeug is the library on which Flask is built).
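As a minimal sketch of that read-only pattern, assuming a hypothetical XML layout with <entry word="..."> elements (the real structure of dictionary.xml is not shown in the question), the data can be parsed once at import time and exposed through a read-only mapping. The names SAMPLE_XML, load_definitions, get_definition and read_concurrently are illustrative only:

```python
import threading
import xml.etree.ElementTree as ET
from types import MappingProxyType

# Hypothetical stand-in for the contents of dictionary.xml.
SAMPLE_XML = """
<dictionary>
  <entry word="help">to give assistance</entry>
  <entry word="flask">a small container for liquids</entry>
</dictionary>
"""

def load_definitions(xml_text):
    """Parse the XML once and return a read-only word -> definition mapping."""
    root = ET.fromstring(xml_text)
    table = {e.get("word"): (e.text or "").strip() for e in root.iter("entry")}
    # MappingProxyType rejects item assignment, so accidental writes fail loudly.
    return MappingProxyType(table)

# Module-level global: built once, when the worker process imports this module.
DEFINITIONS = load_definitions(SAMPLE_XML)

def get_definition(word):
    # Pure read; nothing is mutated, so concurrent callers do not interfere.
    return DEFINITIONS.get(word)

def read_concurrently(word, n_threads=8):
    """Look the same word up from several threads and collect the results."""
    results = []

    def worker():
        results.append(get_definition(word))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Wrapping the dict in a read-only view is a design choice, not a requirement: it just turns an accidental write from a request handler into an immediate TypeError instead of a silent change shared by later requests.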

This data will be kept in the memory of each worker process of the WSGI application server. So it is loaded not just once but once per process; however, the number of processes (workers) is small and constant (it does not depend on the number of sessions or on the traffic).

So, it is possible to keep it this way.

However, in your place I would use a proper database. If you have 16 workers, your data will occupy at least 800 MB of RAM (the number of workers is usually twice the number of processors). And if the XML grows and you eventually decide to move to a database service, you will need to rewrite the code.
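The arithmetic behind that figure can be sketched as follows. The function names are hypothetical, and the result is only a lower bound, since it counts the raw file size and ignores the overhead of the Python objects holding it:

```python
import os

def rule_of_thumb_workers(cpus=None):
    """Worker count using the 'twice the number of processors' rule above."""
    cpus = cpus or os.cpu_count() or 1
    return 2 * cpus

def estimate_resident_mb(data_mb, workers):
    """Lower bound on RAM used when every worker holds its own copy."""
    return data_mb * workers
```

With the numbers from this answer, rule_of_thumb_workers(8) gives 16 workers and estimate_resident_mb(50, 16) gives 800 MB.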

If the reason for keeping the data in memory is that PostgreSQL and MySQL are too slow, you can use SQLite stored on an in-memory file system such as RAMFS or TMPFS. That gives you speed and an SQL interface, and you will probably save on RAM. Migrating to PostgreSQL or MySQL later would then be much simpler (in terms of code).
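A minimal sketch of the SQLite approach, using only the standard sqlite3 module. The path /dev/shm/dictionary.db is an assumption (many Linux systems mount tmpfs at /dev/shm, but check yours), and the table layout and function names are hypothetical:

```python
import sqlite3

# Assumption: /dev/shm is a tmpfs mount on this machine; adjust as needed.
DB_PATH = "/dev/shm/dictionary.db"

def build_db(path, entries):
    """Create the table and load (word, definition) pairs; run once at deploy time."""
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS definitions ("
                "word TEXT PRIMARY KEY, definition TEXT NOT NULL)"
            )
            conn.executemany(
                "INSERT OR REPLACE INTO definitions VALUES (?, ?)", entries
            )
    finally:
        conn.close()

def get_definition(path, word):
    """Open a connection per call, look up one word, and close the connection."""
    conn = sqlite3.connect(path)
    try:
        row = conn.execute(
            "SELECT definition FROM definitions WHERE word = ?", (word,)
        ).fetchone()
        return row[0] if row else None
    finally:
        conn.close()
```

A request handler would call get_definition(DB_PATH, 'help'). Because all workers read the same file, the data exists once on the in-memory file system instead of once per worker, and the parameterized SQL carries over to PostgreSQL or MySQL with mostly driver-level changes.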

Source: https://habr.com/ru/post/981524/