I may be misunderstanding the problem, but to me a 4 GB table that needs to be quickly accessible to users shouldn't be a big deal. Is there something wrong with just loading the data once, the way you described? 4 GB is not that much RAM these days.
Personally, I would recommend that you simply use the database itself instead of loading everything into memory and crunching it with Python. If you structure the data correctly, you can run many thousands of queries in seconds. Pandas is largely written to mimic SQL, so most of the code you use can probably be translated directly into SQL. Recently I ran into a situation at work where a large pandas job combined a couple of hundred files (~4 GB in total, 600 thousand lines per file). The total run time was around 72 hours, and it was an operation that was supposed to run once an hour. A coworker rewrote the same Python/pandas code as a fairly simple SQL query that completed in 5 minutes instead of 72 hours.
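As a rough illustration of how directly pandas operations map to SQL (the table and column names here are made up for the example, and the database is assumed to already exist):

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect("example.db")

# pandas version: pull all rows into memory, then aggregate in Python.
df = pd.read_sql("SELECT region, amount FROM sales", conn)
totals_pandas = df.groupby("region")["amount"].sum()

# SQL version: let the database do the aggregation and return only the result.
totals_sql = pd.read_sql(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region",
    conn,
)
```

For large tables the second form avoids ever materializing the full dataset in Python, which is usually where the 72-hours-vs-5-minutes difference comes from.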
In any case, I would recommend that you store your pandas data in an actual database table. Django is built on top of a database (usually MySQL or Postgres), and pandas can insert your DataFrame directly into it with dataframe.to_sql(table_name, connection). From there, you can write Django code so that each response makes a single query against the database, selects just the values it needs, and returns the data.
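A minimal sketch of that workflow. The connection string, table name, and column names are placeholders for this example; adjust them to your setup:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string and table name -- replace with your own.
engine = create_engine("postgresql://user:password@localhost:5432/mydb")

df = pd.read_csv("big_file.csv")  # or however you currently build the DataFrame

# Write the DataFrame into a real database table, replacing it if it exists.
df.to_sql("my_table", engine, if_exists="replace", index=False)
```

Then, in a Django view, you can hit the database once per request instead of keeping a 4 GB DataFrame in memory (again, the table and column names are placeholders):

```python
from django.db import connection
from django.http import JsonResponse

def lookup(request, key):
    # One small query per request; the database does the lookup work.
    with connection.cursor() as cursor:
        cursor.execute("SELECT value FROM my_table WHERE key = %s", [key])
        row = cursor.fetchone()
    return JsonResponse({"value": row[0] if row else None})
```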