How to update 400k rows on a MySQL server and not kill it

On our production server, we need to split 900 thousand images across different servers and update 400 thousand rows (MySQL with the InnoDB engine). I wrote a Python script that goes through the following steps:

  • Select a small piece of data from db (10 rows)
  • Make new dirs
  • Copy files to created directories and rename them
  • Update the db (there are some update triggers that will put load on the server)
  • Repeat

My code is:


import os, shutil
import database # database.py from tornado

LIMIT_START_OFFSET = 0
LIMIT_ROW_COUNT = 10
SRC_PATHS = ('/var/www/site/public/upload/images/',)
DST_PATH = '/var/www/site/public/upload/new_images/'

def main():
    offset = LIMIT_START_OFFSET
    while True:
        db = database.Connection(DB_HOST, DB_NAME, DB_USER, DB_PASSWD)
        db_data = db.query('''
            SELECT id AS news_id, image AS src_filename
            FROM emd_news
            ORDER BY id ASC
            LIMIT %s, %s''', offset, LIMIT_ROW_COUNT)
        offset = offset + LIMIT_ROW_COUNT
        news_images = get_news_images(db_data) # convert data to easy-to-use list
        make_dst_dirs(DST_PATH, [i['dst_dirname'] for i in news_images]) # make news dirs
        news_to_update = copy_news_images(SRC_PATHS, DST_PATH, news_images) # list of moved files
        db.executemany('''
            UPDATE emd_news
            SET image = %s
            WHERE id = %s
            LIMIT 1''', [(i['filename'], i['news_id']) for i in news_to_update])
        db.close()
        if not db_data: break

if __name__ == '__main__':
    main()

Pretty simple task, but I'm a little nervous about performance.

How can I make this script more efficient?

UPD: In the end, I used the original script without any changes. It took about 5 hours; it was fast at the beginning and very slow toward the end.

+3
3

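To which I would add:

Why do you create a new connection on every loop iteration and then close it, heh!!!

Also, you could use db.autocommit(False), especially for the UPDATE, and call db.commit() every 100 rows or so;

And, as Alin Purcaru said, you should do some benchmarking as well.

Hope this helps :)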
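The db.autocommit(False) / db.commit() suggestion above could be sketched roughly as follows. This is only an illustration under some assumptions: `update_in_batches` is a hypothetical helper, not part of the asker's script, and it is written against a generic DB-API connection so it can be tried with the standard-library `sqlite3` module (which uses `?` placeholders). With a MySQL driver such as MySQLdb you would keep the `%s` placeholders from the question and turn autocommit off on the connection first.

```python
import sqlite3  # stand-in driver for trying the sketch; the question uses MySQL


def update_in_batches(conn, sql, rows, batch_size=100):
    """Run `sql` once per parameter tuple, committing every `batch_size` rows.

    `conn` is assumed to be a DB-API connection that is NOT in autocommit
    mode. Returns the number of commits issued.
    """
    cur = conn.cursor()
    commits = 0
    pending = 0
    for params in rows:
        cur.execute(sql, params)
        pending += 1
        if pending >= batch_size:
            conn.commit()  # one commit per batch instead of one per row
            commits += 1
            pending = 0
    if pending:
        conn.commit()  # flush the last, partially filled batch
        commits += 1
    return commits
```

The connection is opened once by the caller and reused for every batch, which also addresses the point about reconnecting inside the loop. The batch size of 100 is just the figure mentioned in the answer; it would need benchmarking.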

+3

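Here is what I recommend.

  • Add an isProcessed column to your table.
  • Make your script work on a chunk of, say, 1k rows on the first run (and, of course, select only rows that have not been processed yet).
  • Benchmark it.
  • Adjust the chunk size if needed.
  • Build another script that calls this one at intervals.

Don't forget to add some sleep time to both of your scripts!

This will work if your change does not need to be continuous (and I don't think it does). If you have to do it all at once, you should take your site offline while the script runs.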
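A minimal sketch of the isProcessed approach with 1k-row chunks might look like the code below. Several things here are assumptions for illustration: the `process_chunk` and `run` helpers, the `handle_row` callback (standing in for the file copy and rename step), the pause between chunks, and the use of the standard-library `sqlite3` module in place of MySQL. The table and column names `emd_news`, `id`, and `image` come from the question.

```python
import sqlite3  # stand-in driver for trying the sketch; the question uses MySQL
import time


def process_chunk(conn, handle_row, chunk_size=1000):
    """Process up to `chunk_size` unprocessed rows and mark them as done.

    `handle_row(news_id, image)` is a hypothetical callback that moves the
    file and returns the new file name. Returns the number of rows handled.
    """
    rows = conn.execute(
        "SELECT id, image FROM emd_news "
        "WHERE isProcessed = 0 ORDER BY id LIMIT ?",
        (chunk_size,),
    ).fetchall()
    for news_id, image in rows:
        new_name = handle_row(news_id, image)
        conn.execute(
            "UPDATE emd_news SET image = ?, isProcessed = 1 WHERE id = ?",
            (new_name, news_id),
        )
    conn.commit()  # one commit per chunk
    return len(rows)


def run(conn, handle_row, chunk_size=1000, pause=0.5):
    """Call process_chunk until nothing is left, sleeping between chunks."""
    total = 0
    while True:
        done = process_chunk(conn, handle_row, chunk_size)
        if done == 0:
            break
        total += done
        time.sleep(pause)  # give the server room to serve regular traffic
    return total
```

Because each run selects only rows that are still unprocessed, the script can be stopped and restarted at any point without redoing work, and no growing offset is needed.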

+2
    db_data = db.query('''
        SELECT id AS news_id, image AS src_filename
        FROM emd_news
        ORDER BY id ASC
        LIMIT %s, %s''', offset, LIMIT_ROW_COUNT)
    # Why is there any code here at all? If there is no data, why proceed?
    if not db_data: break
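
One way to apply this, shown here only as a sketch, is to wrap the paging in a generator that stops as soon as a query comes back empty, so none of the copy or update code runs for an empty batch. The `iter_batches` name is hypothetical, and the standard-library `sqlite3` module is used as a stand-in for MySQL (hence `LIMIT ? OFFSET ?` instead of the question's `LIMIT %s, %s`).

```python
import sqlite3  # stand-in driver for trying the sketch; the question uses MySQL


def iter_batches(conn, batch_size=10):
    """Yield successive pages of (id, image) rows, ordered by id.

    Stops immediately when a page is empty, before any further work
    is attempted on it.
    """
    offset = 0
    while True:
        rows = conn.execute(
            "SELECT id, image FROM emd_news ORDER BY id ASC LIMIT ? OFFSET ?",
            (batch_size, offset),
        ).fetchall()
        if not rows:  # check right after the query, as the comment suggests
            return
        yield rows
        offset += batch_size
```

The caller would then loop with `for rows in iter_batches(conn): ...` and do the directory creation, file copy, and UPDATE inside that loop.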
+1

Source: https://habr.com/ru/post/1770253/

