How to get auto-increment values for a column after loading a Pandas data frame into a MySQL database

Question

I have a Pandas DataFrame (called df) that I would like to load into a MySQL database. The DataFrame has the columns [A, B, C], and the table in the database has the columns [ID, A, B, C]. The ID column in the database is an auto-incrementing primary key.

I can load the DataFrame into the database using df.to_sql('table_name', engine). However, this does not tell me which ID values the database assigned to the inserted rows. The only way I know to get this information is to query the database using the values of columns A, B, C:

 select ID, A, B, C from db_table where (A, B, C) in ((x1, y1, z1), (x2, y2, z2), ...)

However, this query takes a very long time when I insert a lot of data.

Is there an easier and faster way to get the ID values that the database assigned to the inserted rows?

Edit 1: I can assign the ID column myself, as suggested in the answer by user 3364098 below. However, my job is part of a pipeline that runs in parallel. If I assign the ID column myself, there is a chance that I assign the same ID values to different DataFrames that are loaded at the same time. That is why I would like to leave the task of assigning IDs to the database.

Solution: In the end, I assigned the ID column myself and locked the table while loading the data, to ensure that no other process loads data with the same ID values. Basically:

 try:
     engine.execute('lock tables `table_name` write')
     max_id_query = 'select max(ID) FROM `table_name`'
     max_id = int(pd.read_sql_query(max_id_query, engine).values)
     df['ID'] = range(max_id + 1, max_id + len(df) + 1)
     df.to_sql('table_name', engine, if_exists='append', index=False)
 finally:
     engine.execute('unlock tables')
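
One detail worth keeping in mind with the locking approach above: MySQL table locks belong to the session that took them, so the lock, the max(ID) query, the insert and the unlock should all go through the same connection. Here is a minimal sketch of that pattern as a context manager. The names table_write_lock and RecordingConnection are hypothetical, and the connection is assumed to be any object with an execute(sql) method bound to one session. The stand-in below only records the statements, so the snippet runs without a database.

```python
from contextlib import contextmanager


@contextmanager
def table_write_lock(conn, table_name):
    """Hold a MySQL WRITE lock on table_name for the duration of the block.

    conn is assumed (hypothetical interface) to expose execute(sql) and to
    stay bound to a single database session for the whole block.
    """
    conn.execute("LOCK TABLES `{}` WRITE".format(table_name))
    try:
        yield conn
    finally:
        # Runs even if the body raises, so the lock is not left behind.
        conn.execute("UNLOCK TABLES")


class RecordingConnection:
    """Stand-in for a real connection: it only records the SQL it receives."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


conn = RecordingConnection()
with table_write_lock(conn, "table_name") as locked:
    # Placeholder for: read max(ID), assign df['ID'], append the rows.
    locked.execute("SELECT max(ID) FROM `table_name`")
```

With a real connection, the body of the with block would hold the max(ID) query, the ID assignment and the to_sql call, all issued on that one connection.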

+2

python pandas mysql sqlalchemy

ostrokach Nov 06 '14 at 1:55


2 answers

 import pandas as pd
 df['ID'] = pd.read_sql_query('select MAX(ID)+1 from db_table', cnx).iloc[0,0] + range(len(df))

-1

trader Aug 08 '16 at 8:04


hvedrung · Accepted Answer · 2014-11-06T13:08:05+0000

You can assign an identifier yourself:

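 import pandas as pd
 df['ID'] = pd.read_sql_query('select ifnull(max(id),0)+1 from db_table', cnx).iloc[0,0] + range(len(df))

where cnx is your database connection. Then load df into the table as usual. The ifnull(..., 0) makes the numbering start at 1 when the table is still empty.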
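
If you want to try the idea end to end without a MySQL server, here is a minimal sketch. It makes two loud assumptions: an in-memory SQLite database stands in for MySQL, and a plain list of tuples stands in for the rows of df. Only the ID arithmetic is the point; with pandas you would assign the resulting ids to df['ID'] before calling to_sql.

```python
import sqlite3

# Assumption: SQLite in memory stands in for the MySQL table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE db_table (ID INTEGER PRIMARY KEY, A TEXT, B TEXT, C TEXT)"
)
conn.execute("INSERT INTO db_table (ID, A, B, C) VALUES (7, 'a0', 'b0', 'c0')")

# Assumption: these tuples stand in for the rows of df (columns A, B, C).
rows = [("a1", "b1", "c1"), ("a2", "b2", "c2")]

# Same query as in the answer: the next free ID, or 1 for an empty table.
first_id = conn.execute(
    "SELECT ifnull(max(ID), 0) + 1 FROM db_table"
).fetchone()[0]

# One consecutive ID per row, known before the insert happens.
ids = list(range(first_id, first_id + len(rows)))

conn.executemany(
    "INSERT INTO db_table (ID, A, B, C) VALUES (?, ?, ?, ?)",
    [(new_id,) + row for new_id, row in zip(ids, rows)],
)

stored = conn.execute("SELECT ID, A FROM db_table ORDER BY ID").fetchall()
```

Note that nothing here stops a second loader from reading the same max(ID) between the SELECT and the INSERT, which is exactly the concern raised in Edit 1 of the question.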
