I have a Pandas DataFrame (called `df`) that I would like to load into a MySQL database. The DataFrame has the columns [A, B, C], and the table in the database has the columns [ID, A, B, C]. The ID column in the database is an auto-incrementing primary key.
I can load the DataFrame into the database with `df.to_sql('table_name', engine, if_exists='append', index=False)`. However, this gives me no information about the values that the database assigned to the ID column. The only way I have found to get them is to query the database using the values of columns A, B and C:
```sql
select ID, A, B, C from db_table where (A, B, C) in ((x1, y1, z1), (x2, y2, z2), ...)
```
However, this query takes a very long time when I insert a lot of data.
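To make the shape of that lookup concrete, here is a small sketch that builds it as a parameterized statement. `build_lookup_query` is a hypothetical helper, not part of pandas or SQLAlchemy, and the `%s` placeholders assume a MySQL driver such as PyMySQL or mysqlclient. It shows that the SQL text and the parameter list both grow linearly with the number of inserted rows, and that the lookup can only map rows back to IDs unambiguously if each (A, B, C) combination is unique in the table.

```python
def build_lookup_query(table, columns, rows):
    """Build the row-value IN lookup from the question (hypothetical helper).

    Uses %s placeholders, the style accepted by MySQL drivers such as PyMySQL.
    """
    col_list = ", ".join(columns)
    # One "(%s, %s, %s)" group per row that has to be looked up
    group = "(" + ", ".join(["%s"] * len(columns)) + ")"
    groups = ", ".join([group] * len(rows))
    sql = f"SELECT ID, {col_list} FROM {table} WHERE ({col_list}) IN ({groups})"
    # Flatten the row tuples so the values line up with the placeholders in order
    params = [value for row in rows for value in row]
    return sql, params


sql, params = build_lookup_query("db_table", ["A", "B", "C"], [(1, 2, 3), (4, 5, 6)])
print(sql)
# SELECT ID, A, B, C FROM db_table WHERE (A, B, C) IN ((%s, %s, %s), (%s, %s, %s))
print(params)
# [1, 2, 3, 4, 5, 6]
```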
Is there a simpler and faster way to get the values that the database assigned to the ID column?
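For comparison, a DB-API cursor reports the key generated by a single-row INSERT in its `lastrowid` attribute, so inserting row by row does reveal the assigned IDs, at the cost of one round trip per row. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL so it runs without a server; the table definition and sample rows are hypothetical. MySQL drivers such as PyMySQL expose the same `lastrowid` attribute but use `%s` rather than `?` as the placeholder.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the table mirrors [ID, A, B, C]
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE db_table (ID INTEGER PRIMARY KEY AUTOINCREMENT, A INTEGER, B INTEGER, C INTEGER)"
)

rows = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]  # hypothetical values for columns A, B, C

assigned_ids = []
cur = conn.cursor()
for row in rows:
    cur.execute("INSERT INTO db_table (A, B, C) VALUES (?, ?, ?)", row)
    # lastrowid holds the key the database generated for this single-row INSERT
    assigned_ids.append(cur.lastrowid)
conn.commit()

print(assigned_ids)  # [1, 2, 3]
```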
Edit 1: I can assign the ID column myself, as suggested in user3364098's answer below. However, my work is part of a pipeline that runs in parallel. If I assign the IDs myself, DataFrames that are loaded at the same time may end up with the same ID values. That is why I would like to leave the task of assigning IDs to the database.
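The collision described above is a check-then-act race: two workers read the same `MAX(ID)` before either one writes. The sketch below simulates it in plain Python. A `threading.Lock` plays the role of the table lock here, which is only an in-process analogy; separate processes in a real pipeline would need the lock to live in the database. `next_ids` is a hypothetical helper.

```python
import threading


def next_ids(current_max, n):
    # IDs a worker would assign after reading MAX(ID); None models an empty table
    start = (current_max or 0) + 1
    return list(range(start, start + n))


# Unsynchronized: both workers read MAX(ID) before either one writes
table_ids = [1, 2, 3]
seen_by_a = max(table_ids)
seen_by_b = max(table_ids)
table_ids += next_ids(seen_by_a, 2)
table_ids += next_ids(seen_by_b, 2)
print(sorted(table_ids))  # [1, 2, 3, 4, 4, 5, 5]: IDs 4 and 5 are assigned twice

# Synchronized: reading the maximum and writing happen in one critical section
lock = threading.Lock()
locked_ids = [1, 2, 3]


def worker(n):
    with lock:  # stands in for LOCK TABLES ... WRITE
        locked_ids.extend(next_ids(max(locked_ids), n))


threads = [threading.Thread(target=worker, args=(2,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(locked_ids))  # [1, 2, 3, 4, 5, 6, 7]
```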
Solution: In the end, I assigned the ID column myself and locked the table while loading the data, so that no other process can insert rows with the same ID values. Roughly:
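```python
from sqlalchemy import text

# LOCK TABLES applies only to the current session, so the lock, the MAX(ID)
# lookup, the insert and the unlock must all run on the same connection
# (SQLAlchemy 2.x style; `engine` is the existing SQLAlchemy engine).
with engine.connect() as conn:
    conn.execute(text("LOCK TABLES `table_name` WRITE"))
    try:
        # MAX(ID) returns NULL (None) when the table is empty
        max_id = conn.execute(text("SELECT MAX(ID) FROM `table_name`")).scalar() or 0
        df['ID'] = range(max_id + 1, max_id + len(df) + 1)
        df.to_sql('table_name', conn, if_exists='append', index=False)
        conn.commit()
    finally:
        conn.execute(text("UNLOCK TABLES"))
```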
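The same lock, read `MAX(ID)`, insert pattern can be tried locally with Python's built-in `sqlite3` module, used here purely as a stand-in for MySQL. SQLite has no `LOCK TABLES`; `BEGIN IMMEDIATE` takes the database-wide write lock instead, which is a swapped-in technique rather than the MySQL statement. `append_with_ids` is a hypothetical helper that also returns the IDs it assigned, and `COALESCE` covers the empty-table case.

```python
import sqlite3


def append_with_ids(conn, rows):
    """Assign IDs MAX(ID)+1, MAX(ID)+2, ... under a write lock, insert, return the IDs."""
    conn.execute("BEGIN IMMEDIATE")  # SQLite's write lock, analogous to LOCK TABLES ... WRITE
    try:
        # COALESCE turns the NULL that MAX() returns on an empty table into 0
        (max_id,) = conn.execute("SELECT COALESCE(MAX(ID), 0) FROM db_table").fetchone()
        ids = list(range(max_id + 1, max_id + len(rows) + 1))
        conn.executemany(
            "INSERT INTO db_table (ID, A, B, C) VALUES (?, ?, ?, ?)",
            [(i, *row) for i, row in zip(ids, rows)],
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return ids


# isolation_level=None stops sqlite3 from opening transactions implicitly,
# so the explicit BEGIN IMMEDIATE / COMMIT above control the transaction
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE db_table (ID INTEGER PRIMARY KEY, A INTEGER, B INTEGER, C INTEGER)")

print(append_with_ids(conn, [(1, 2, 3), (4, 5, 6)]))  # [1, 2]
print(append_with_ids(conn, [(7, 8, 9)]))             # [3]
```

One difference worth keeping in mind: SQLite's lock covers the whole database file, while MySQL's `LOCK TABLES` locks only the listed tables.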