I need to read and combine many rows (~500k) from a PostgreSQL database and write them to a MySQL database.
My naive approach looks like this:
entrys = Entry.query.yield_per(500)
for entry in entrys:
    for location in entry.locations:
        mysql_location = MySQLLocation(entry.url)
        mysql_location.id = location.id
        mysql_location.entry_id = entry.id
        [...]
        mysql_location.city = location.city.name
        mysql_location.county = location.county.name
        mysql_location.state = location.state.name
        mysql_location.country = location.country.name
        db.session.add(mysql_location)
db.session.commit()
Each Entry has 1 to 100 Locations.
This script has been running for about 20 hours now and already consumes > 4 GB of memory, since everything is kept in memory until the session is committed.
When I tried committing earlier, I ran into problems like this.
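For reference, this is roughly what I mean by committing earlier (a minimal sketch only; the batch size of 500 and the expunge_all() call are illustrations on my part, not code I have actually run):

entrys = Entry.query.yield_per(500)
for i, entry in enumerate(entrys, start=1):
    for location in entry.locations:
        mysql_location = MySQLLocation(entry.url)
        # ... copy id, entry_id, city, county, state, country as above ...
        db.session.add(mysql_location)
    if i % 500 == 0:
        db.session.commit()       # flush this batch to MySQL
        db.session.expunge_all()  # drop flushed objects so memory stays bounded
db.session.commit()  # flush whatever is left after the last full batch

My understanding is that committing in the middle of a yield_per() iteration expires the streamed Entry objects, which is presumably where the linked problems come from.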
How can I improve the query performance? This needs to run much faster, since the number of rows will grow to about 2.5 million in the coming months.