Progress bar for pandas.DataFrame.to_sql

I want to transfer data from a large CSV file into a sqlite3 database.

My Python 3.5 code using pandas:

import sqlite3
import pandas as pd

con = sqlite3.connect(DB_FILENAME)
df = pd.read_csv(MLS_FULLPATH)
df.to_sql(con=con, name="MLS", if_exists="replace", index=False)

Is it possible to print the current status (progress bar) of the execution of the to_sql method?

I looked through an article about tqdm but did not find how to do this.

1 answer

Unfortunately, DataFrame.to_sql does not provide a chunk-by-chunk callback, which tqdm needs in order to update its status. However, you can process the DataFrame chunk by chunk:

import sqlite3
import pandas as pd
from tqdm import tqdm

DB_FILENAME='/tmp/test.sqlite'

def chunker(seq, size):
    # Yield consecutive slices of at most `size` rows.
    # from http://stackoverflow.com/a/434328
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

def insert_with_progress(df, dbfile):
    con = sqlite3.connect(dbfile)
    chunksize = max(1, len(df) // 10)  # about 10% per chunk, at least 1 row
    with tqdm(total=len(df)) as pbar:
        for i, cdf in enumerate(chunker(df, chunksize)):
            # The first chunk recreates the table; later chunks append to it.
            replace = "replace" if i == 0 else "append"
            cdf.to_sql(con=con, name="MLS", if_exists=replace, index=False)
            pbar.update(len(cdf))  # the last chunk may be shorter
    con.close()

df = pd.DataFrame({'a': range(0,100000)})
insert_with_progress(df, DB_FILENAME)

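The chunker function slices the DataFrame into consecutive blocks of rows, and the progress bar advances after each block has been written.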
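Since the question starts from a large CSV file, the same idea can be applied while reading, so the whole file never has to sit in memory at once. The following is only a sketch under a few assumptions: the helper names count_data_rows and csv_to_sqlite_with_progress are made up for illustration, the file has one header line, and no quoted field contains a newline (otherwise the row total shown by tqdm will be off, although the inserted data is unaffected). It relies on the chunksize parameter of pd.read_csv, which makes it return an iterator of DataFrames.

```python
import sqlite3

import pandas as pd
from tqdm import tqdm


def count_data_rows(csv_path):
    # Assumption: one record per physical line plus a single header line
    # (no quoted fields that contain newlines).
    with open(csv_path, "rb") as f:
        return max(0, sum(1 for _ in f) - 1)


def csv_to_sqlite_with_progress(csv_path, db_path, table="MLS", chunksize=10000):
    """Hypothetical helper: stream a CSV into SQLite, updating tqdm per chunk."""
    total = count_data_rows(csv_path)
    con = sqlite3.connect(db_path)
    written = 0
    try:
        with tqdm(total=total, unit="rows") as pbar:
            # With chunksize set, read_csv yields DataFrames of up to
            # `chunksize` rows instead of loading the whole file.
            for i, chunk in enumerate(pd.read_csv(csv_path, chunksize=chunksize)):
                # Recreate the table on the first chunk, append afterwards.
                mode = "replace" if i == 0 else "append"
                chunk.to_sql(name=table, con=con, if_exists=mode, index=False)
                written += len(chunk)
                pbar.update(len(chunk))
    finally:
        con.close()
    return written
```

With the names from the question, a call would look like csv_to_sqlite_with_progress(MLS_FULLPATH, DB_FILENAME). Counting the lines first costs one extra pass over the file; if that is too slow, dropping total= still gives a running row counter, just without a percentage.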



Source: https://habr.com/ru/post/1654595/

