How to convert a pandas DataFrame for insertion using the executemany() statement?

Question


I have a rather large pandas DataFrame (50 or so columns and several hundred thousand rows of data), and I want to transfer this data to a database using the ceODBC module. I used to use pyodbc with a simple execute statement in a for loop, but it was ridiculously slow (1000 rows in 10 minutes)...

Now I'm trying a new module and want to use executemany(), although I'm not quite sure what is meant by a sequence of parameters in:

  cursor.executemany("""insert into table.name(a, b, c, d, e) values(?, ?, ?, ?, ?)""", sequence_of_parameters)

Should it be one flat list that runs through every column of each row in turn, e.g.

  ['asdas', '1', '2014-12-01', 'true', 'asdasd', 'asdas', '2', '2014-12-02', 'true', 'asfasd', 'asdfs', '3', '2014-12-03', 'false', 'asdasd']

(where this is a three-row example)?

Or is some other format needed?

As a related question, how can I convert a regular pandas DataFrame to this format?

Thanks!


python database pandas executemany

Colin o'brien Apr 29 '15 at 8:44


3 answers

You can try the following:

 cursor.executemany(sql_str, your_dataframe.values.tolist())

Hope this helps.
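A minimal sketch of that answer, assuming Python's built-in sqlite3 module as a stand-in for the ODBC driver and a small made-up DataFrame with the question's five columns (the table name `t` is hypothetical). `df.values.tolist()` yields one inner list per row, which is the nested shape executemany() expects, rather than one flat list:

```python
import sqlite3

import pandas as pd

# Hypothetical three-row frame mirroring the question's example data.
df = pd.DataFrame(
    {
        "a": ["asdas", "asdas", "asdfs"],
        "b": [1, 2, 3],
        "c": ["2014-12-01", "2014-12-02", "2014-12-03"],
        "d": ["true", "true", "false"],
        "e": ["asdasd", "asfasd", "asdasd"],
    }
)

# One inner list per row: this is the "sequence of parameters".
rows = df.values.tolist()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b INTEGER, c TEXT, d TEXT, e TEXT)")
# Each inner list fills the five ? placeholders once.
conn.executemany("INSERT INTO t (a, b, c, d, e) VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()
```

The same `rows` list could be handed to a pyodbc or ceODBC cursor, assuming the query uses `?` placeholders as in the question.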


ansen May 12, '15 at 8:37


It may be a little late to answer this question, but perhaps it can help someone. executemany() is not implemented by many ODBC drivers; one that does implement it is MySQL. When the docs refer to a sequence of parameters, they mean something like:

 parameters=[{'name':'Jorge', 'age':22, 'sex':'M'}, {'name':'Karen', 'age':25, 'sex':'F'}, {'name':'James', 'age':29, 'sex':'M'}]

and the query statement looks something like this:

 SQL = "INSERT IGNORE INTO WORKERS (NAME, AGE, SEX) VALUES (%(name)s, %(age)s, %(sex)s)"
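A runnable sketch of the same idea, swapping in Python's built-in sqlite3 module for MySQL (an assumption for illustration only). sqlite3 does not accept MySQL's `%(name)s` placeholders; it takes the same list of dicts with `:name` named placeholders, and spells `INSERT IGNORE` as `INSERT OR IGNORE`:

```python
import sqlite3

# The answer's list of dicts: one mapping per row, keys match the placeholders.
parameters = [
    {"name": "Jorge", "age": 22, "sex": "M"},
    {"name": "Karen", "age": 25, "sex": "F"},
    {"name": "James", "age": 29, "sex": "M"},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE workers (name TEXT, age INTEGER, sex TEXT)")
# sqlite3 "named" paramstyle: :name instead of MySQL's %(name)s.
conn.executemany(
    "INSERT OR IGNORE INTO workers (name, age, sex) VALUES (:name, :age, :sex)",
    parameters,
)
conn.commit()
```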

It looks like you're almost there. A few things I want to point out, in case they help: pandas has a to_sql function that inserts into a database if you provide it with a connection object, and it can also write the data in chunks.
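A small sketch of the to_sql route, assuming a sqlite3 connection (which pandas accepts directly; for most other databases it expects an SQLAlchemy engine or connection). The `chunksize` argument sets how many rows are written per batch; the table name `workers` and the tiny frame are made up:

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"name": ["Jorge", "Karen", "James"], "age": [22, 25, 29]})

conn = sqlite3.connect(":memory:")
# Creates the table if it does not exist, then inserts 2 rows per batch.
df.to_sql("workers", conn, index=False, if_exists="append", chunksize=2)
```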

To quickly create a sequence of parameters from a pandas DataFrame, I found the following two methods useful:

 # creates list of dict, list of parameters
 # REF: https://groups.google.com/forum/#!topic/pydata/qna3Z3WmVpM
 parameters = [df.iloc[line, :].to_dict() for line in range(len(df))]

 # Cleaner Way
 parameters = df.to_dict(orient='records')
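A quick check on a made-up two-row frame that both methods produce the same list of dicts; `to_dict(orient='records')` simply avoids the per-row `iloc` lookup:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Jorge", "Karen"], "age": [22, 25], "sex": ["M", "F"]})

# Row-by-row version from the answer.
by_row = [df.iloc[line, :].to_dict() for line in range(len(df))]

# Built-in version: one dict per row, keyed by column name.
records = df.to_dict(orient="records")
```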


Victor Uriarte Aug 9 '15 at 17:20


Colin o'brien · Accepted Answer · 2015-05-13T11:09:04+0000

I managed to figure it out in the end. If you have a pandas DataFrame that you want to write to a database using ceODBC (the module I used), the code is as follows.

With all_data as the DataFrame, map every value to a stripped string and store each row as a tuple in a list of tuples:

 for r in all_data.columns.values:
     all_data[r] = all_data[r].map(str)
     all_data[r] = all_data[r].map(str.strip)
 tuples = [tuple(x) for x in all_data.values]
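To see why a null-cleanup step is needed, here is a sketch that applies the same stringify-and-strip step to a tiny hypothetical frame: a missing float comes out as the literal string 'nan'. As an alternative (not the author's method), casting to object and using `where(..., None)` keeps native types and turns missing values into None directly:

```python
import numpy as np
import pandas as pd

all_data = pd.DataFrame({"a": [" x ", "y"], "b": [1.5, np.nan]})

# The answer's approach on a copy: stringify and strip each column.
as_text = all_data.copy()
for r in as_text.columns.values:
    as_text[r] = as_text[r].map(str).map(str.strip)
text_tuples = [tuple(x) for x in as_text.values]
# The missing float is now the string 'nan', not a real null.

# Alternative: keep native values, replace missing entries with None.
native = all_data.astype(object).where(all_data.notna(), None)
native_tuples = [tuple(x) for x in native.values]
```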

In the list of tuples, replace the null markers that the conversion above wrote out as strings ('NaT', 'nan', 'NaN', 'None') with None, which the driver can pass to the destination database as NULL. This was a problem for me; it may not be for you.

 string_list = ['NaT', 'nan', 'NaN', 'None']

 def remove_wrong_nulls(x):
     for r in range(len(x)):
         for i, e in enumerate(tuples):
             for j, k in enumerate(e):
                 if k == x[r]:
                     temp = list(tuples[i])
                     temp[j] = None
                     tuples[i] = tuple(temp)

 remove_wrong_nulls(string_list)
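The loop above rescans the whole list once per null string and rebuilds a tuple for every match. A single-pass sketch with the same effect, using a set for the lookup (the function name `replace_null_strings` and the sample rows are hypothetical). Like the original, it would also turn a genuine text value 'None' into NULL:

```python
# Hypothetical rows, as produced by the stringify step.
tuples = [
    ("x", "1.5", "2015-01-01"),
    ("y", "nan", "NaT"),
    ("None", "2", "2015-01-02"),
]

# Strings that stand for a missing value after str() conversion.
NULL_STRINGS = {"NaT", "nan", "NaN", "None"}


def replace_null_strings(rows, null_strings=NULL_STRINGS):
    """Return a new list of tuples with null marker strings replaced by None."""
    return [
        tuple(None if value in null_strings else value for value in row)
        for row in rows
    ]


tuples = replace_null_strings(tuples)
```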

Create the database connection:

 import ceODBC

 cnxn = ceODBC.connect('DRIVER={SOMEODBCDRIVER};DBCName=XXXXXXXXXXX;UID=XXXXXXX;PWD=XXXXXXX;QUIETMODE=YES;', autocommit=False)
 cursor = cnxn.cursor()

Define a function that splits the list of tuples into new_list, a list of chunks of 1000 tuples each. I needed this because the target database rejected any single SQL request larger than 1 MB.

 def chunks(l, n):
     n = max(1, n)
     return [l[i:i + n] for i in range(0, len(l), n)]

 new_list = chunks(tuples, 1000)
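If you would rather not build every slice up front, a generator version works on any iterable. This is a sketch, not the author's code; on Python 3.12 and later, `itertools.batched` does the same job but yields tuples instead of lists:

```python
from itertools import islice


def chunks(iterable, n):
    """Yield successive lists of at most n items from any iterable."""
    n = max(1, n)
    it = iter(iterable)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch


batches = list(chunks(range(7), 3))
```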

Define your query:

 query = """insert into XXXXXXXXXXXX("XXXXXXXXXX", "XXXXXXXXX", "XXXXXXXXXXX") values(?,?,?)"""
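With around 50 columns, typing the placeholders by hand is error-prone. A sketch that builds the qmark-style statement from a column list (`build_insert`, `my_table` and the column names are hypothetical). Because identifiers are interpolated into the string rather than bound as parameters, only use trusted names:

```python
def build_insert(table, columns):
    """Build an INSERT with one quoted identifier and one ? per column."""
    column_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" * len(columns))
    return f"insert into {table}({column_list}) values({placeholders})"


query = build_insert("my_table", ["col_a", "col_b", "col_c"])
```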

Loop over new_list, passing each group of 1000 tuples to executemany. Then commit and close the connection, and that's it :)

 for batch in new_list:
     cursor.executemany(query, batch)
 cnxn.commit()
 cnxn.close()
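An end-to-end sketch of the same batch-then-commit pattern, assuming the built-in sqlite3 module in place of ceODBC so it runs anywhere (`insert_in_batches`, `my_table` and the generated rows are hypothetical). Committing once at the end means that if an error is raised part-way, the uncommitted batches can be rolled back instead of leaving a partial load:

```python
import sqlite3
from contextlib import closing


def insert_in_batches(conn, query, rows, batch_size=1000):
    """Run executemany over rows in slices of batch_size, then commit once."""
    cursor = conn.cursor()
    for start in range(0, len(rows), batch_size):
        cursor.executemany(query, rows[start:start + batch_size])
    conn.commit()


# Hypothetical data: every fifth row has a NULL in the last column.
rows = [(f"name{i}", i, None if i % 5 == 0 else "x") for i in range(2500)]

with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute('create table my_table("col_a" text, "col_b" integer, "col_c" text)')
    insert_in_batches(
        conn,
        'insert into my_table("col_a", "col_b", "col_c") values(?,?,?)',
        rows,
        batch_size=1000,
    )
    total = conn.execute("select count(*) from my_table").fetchone()[0]
    nulls = conn.execute("select count(*) from my_table where col_c is null").fetchone()[0]
```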
