Improving query performance

I need to read and combine many rows (~ 500k) from a PostgreSQL database and write them to a MySQL database.

My naive approach looks like this:

    entrys = Entry.query.yield_per(500)
    for entry in entrys:
        for location in entry.locations:
            mysql_location = MySQLLocation(entry.url)
            mysql_location.id = location.id
            mysql_location.entry_id = entry.id
            [...]
            mysql_location.city = location.city.name
            mysql_location.county = location.county.name
            mysql_location.state = location.state.name
            mysql_location.country = location.country.name
            db.session.add(mysql_location)
    db.session.commit()

Each Entry has 1 to 100 Locations.

This script has now been running for about 20 hours and already consumes more than 4 GB of memory, since everything is kept in memory until the session is committed.

When I tried to commit earlier, I ran into problems like this.

How can I improve the query performance? This needs to run much faster, since the number of rows will grow to about 2,500,000 over the coming months.

1 answer

Your naive approach is flawed for the reason you already suspect: what is filling your memory is the model objects, which sit in the session waiting to be flushed to MySQL.

The easiest fix is not to use the ORM for the conversion at all. Use SQLAlchemy Table objects directly (the Core layer), which is also much faster.
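
As an illustration, here is a minimal sketch of the Core-based approach, assuming SQLAlchemy 1.4+. The connection URLs and the table names (entries, locations, mysql_locations) are assumptions about your schema, and the column mapping is abbreviated:

    # Hedged sketch: table/column names and connection URLs are placeholders.
    from sqlalchemy import MetaData, Table, create_engine, select

    pg_engine = create_engine("postgresql://user:pass@pg-host/source_db")
    my_engine = create_engine("mysql+pymysql://user:pass@mysql-host/target_db")

    metadata = MetaData()
    entries = Table("entries", metadata, autoload_with=pg_engine)
    locations = Table("locations", metadata, autoload_with=pg_engine)
    mysql_locations = Table("mysql_locations", MetaData(), autoload_with=my_engine)

    BATCH_SIZE = 1000

    with pg_engine.connect() as pg_conn, my_engine.begin() as my_conn:
        # Join on the database side instead of lazy-loading per object,
        # and stream the rows so they are never all in memory at once.
        rows = pg_conn.execution_options(stream_results=True).execute(
            select(locations, entries.c.url).join(
                entries, locations.c.entry_id == entries.c.id
            )
        )
        batch = []
        for row in rows:
            batch.append(
                {
                    "id": row.id,
                    "entry_id": row.entry_id,
                    "url": row.url,
                    # city/county/state/country names would need further
                    # joins to their lookup tables; omitted here.
                }
            )
            if len(batch) >= BATCH_SIZE:
                my_conn.execute(mysql_locations.insert(), batch)  # executemany
                batch = []
        if batch:
            my_conn.execute(mysql_locations.insert(), batch)

Streaming the source rows and inserting them in executemany batches keeps memory flat and avoids building half a million ORM objects on either side.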

In addition, you can create two sessions and bind the two engines to the separate sessions. Then you can commit the MySQL session after each batch.
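
A sketch of how that two-session variant might look, reusing the Entry and MySQLLocation models from the question (the import path shown is hypothetical) and committing the MySQL session after every batch:

    # Hedged sketch: the models come from the question; the import path
    # and connection URLs are placeholders.
    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker

    from myapp.models import Entry, MySQLLocation  # hypothetical path

    pg_engine = create_engine("postgresql://user:pass@pg-host/source_db")
    my_engine = create_engine("mysql+pymysql://user:pass@mysql-host/target_db")

    pg_session = sessionmaker(bind=pg_engine)()
    my_session = sessionmaker(bind=my_engine)()

    BATCH_SIZE = 500
    pending = 0

    for entry in pg_session.query(Entry).yield_per(BATCH_SIZE):
        for location in entry.locations:
            mysql_location = MySQLLocation(entry.url)
            mysql_location.id = location.id
            mysql_location.entry_id = entry.id
            mysql_location.city = location.city.name
            # ... copy the remaining columns as in the original loop
            my_session.add(mysql_location)
            pending += 1
        if pending >= BATCH_SIZE:
            # Write this batch to MySQL and drop the flushed objects so the
            # MySQL session never holds more than one batch in memory.
            my_session.commit()
            my_session.expunge_all()
            pending = 0

    my_session.commit()

This way only one batch of MySQLLocation objects is ever pending in the MySQL session, instead of all of them accumulating until a single commit at the end.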


Source: https://habr.com/ru/post/1494932/

