Why do I get a memory error with fast_executemany on a tiny df?

I was looking for ways to speed up pushing a DataFrame to SQL Server and came across an approach here. This approach blew me away in terms of speed. Using the normal to_sql took almost 2 hours, whereas this script finished in 12.54 seconds when pushing a df of 10,000 rows X 100 columns.

So, after testing the code below with a sample df, I tried it on a df with many different data types (int, string, float, Boolean). However, I was sad to see a memory error. I started reducing the size of my df to find out what the limitations are, and noticed that if my df had any strings in it, I could not load it to SQL Server. I am having trouble isolating the issue any further. The script below is taken from the question in the link, except that I added a tiny df with strings. Any suggestions on how to fix this problem would be great!

import pandas as pd
import numpy as np
import time
from sqlalchemy import create_engine, event
from urllib.parse import quote_plus
import pyodbc

conn =  "DRIVER={SQL Server};SERVER=SERVER_IP;DATABASE=DB_NAME;UID=USER_ID;PWD=PWD"
quoted = quote_plus(conn)
new_con = 'mssql+pyodbc:///?odbc_connect={}'.format(quoted)
engine = create_engine(new_con)


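# Turn on pyodbc's fast_executemany for every bulk (executemany) statement.
@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    print("FUNC call")
    if executemany:
        # cursor is the raw pyodbc cursor, so the flag is set on it directly
        cursor.fast_executemany = True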


table_name = 'fast_executemany_test'
df1 = pd.DataFrame({'col1':['tyrefdg','ertyreg','efdgfdg'],
                   'col2':['tydfggfdgrefdg','erdfgfdgfdgfdgtyreg','edfgfdgdfgdffdgfdg']
                   })



s = time.time()
df1.to_sql(table_name, engine, if_exists = 'replace', chunksize = None)
print(time.time() - s)
1 answer

I was able to reproduce your problem using pyodbc 4.0.23. The MemoryError was related to your use of the ancient

DRIVER={SQL Server}

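Further testing using

DRIVER=ODBC Driver 11 for SQL Server

also failed, this time with an error reported as (0) (SQLParamData), which relates to an issue on the pyodbc GitHub.

In the meantime, you may be able to proceed by:

  • using a newer ODBC driver, such as DRIVER=ODBC Driver 13 for SQL Server, and
  • running pip install pyodbc==4.0.22 to use an earlier version of pyodbc.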
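As a rough sketch of the first suggestion, the connection URL from the question could be rebuilt with the newer driver name. The helper build_url is a hypothetical name introduced here for illustration, and SERVER_IP, DB_NAME, USER_ID and PWD are the same placeholders the question uses. This only builds the URL string and assumes "ODBC Driver 13 for SQL Server" is actually installed on the machine.

```python
from urllib.parse import quote_plus


def build_url(driver, server, database, uid, pwd):
    """Build an mssql+pyodbc URL from a raw ODBC connection string.

    build_url is a hypothetical helper, not part of pyodbc or SQLAlchemy.
    """
    # Doubled braces are literal, so the driver name ends up wrapped in {}.
    odbc = "DRIVER={{{}}};SERVER={};DATABASE={};UID={};PWD={}".format(
        driver, server, database, uid, pwd)
    # URL-encode the whole ODBC string so it can be passed as odbc_connect.
    return "mssql+pyodbc:///?odbc_connect={}".format(quote_plus(odbc))


# Same placeholders as in the question; only the DRIVER value changes.
url = build_url("ODBC Driver 13 for SQL Server",
                "SERVER_IP", "DB_NAME", "USER_ID", "PWD")
print(url)
```

The resulting url would then be passed to create_engine in place of new_con, with the before_cursor_execute listener from the question left unchanged. If you are unsure which drivers are available, pyodbc.drivers() returns the names of the installed ODBC drivers.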

Source: https://habr.com/ru/post/1691452/

