I also ran into this problem when I parsed ASCII data files to import them into a table. The problem was that I instinctively and intuitively wanted SQLAlchemy to skip the duplicate rows while letting the unique data through. Or a random error may be raised for a particular row by the underlying SQL engine, for example when unicode strings are not allowed.
However, this behavior is beyond the scope of the SQL interface definition. The SQL API, and therefore SQLAlchemy, only understands transactions and commits and does not account for this kind of selective behavior. Moreover, it sounds dangerous to rely on the autocommit feature, since the insertion stops after an exception and the rest of the data is left behind.
My solution (which I am not sure is the most elegant) is to process each row in a loop, catch and log the exceptions, and commit the changes at the very end.
Assuming you have somehow acquired the data as a list of lists, i.e. a list of rows that are themselves lists of column values, you then insert each row in a loop:
```python
# Python 3.5
from sqlalchemy import Table, create_engine
import logging

# Create the engine
# Create the table
# Parse the data file and save data in `rows`

conn = engine.connect()
trans = conn.begin()  # Disables autocommit

exceptions = {}
totalRows = 0
importedRows = 0

ins = table.insert()

for currentRowIdx, cols in enumerate(rows):
    try:
        conn.execute(ins.values(cols))  # try to insert the column values
        importedRows += 1
    except Exception as e:
        exc_name = type(e).__name__  # save the exception name
        if exc_name not in exceptions:
            exceptions[exc_name] = []
        exceptions[exc_name].append(currentRowIdx)
    totalRows += 1

for key, val in exceptions.items():
    logging.warning("%d out of %d lines were not imported due to %s." % (len(val), totalRows, key))

logging.info("%d rows were imported." % importedRows)

trans.commit()  # Commit at the very end
conn.close()
```
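The three setup steps above are only placeholder comments. If it helps, here is a minimal sketch of what they could look like; the database URL, table name, column layout, and file name are assumptions for illustration, not part of the code above:

```python
# Hypothetical setup for the three commented steps (all names are assumptions).
from sqlalchemy import Table, Column, Integer, String, MetaData, create_engine

engine = create_engine("sqlite:///data.db")            # Create the engine

metadata = MetaData()
table = Table("measurements", metadata,                # Create the table
              Column("name", String, primary_key=True),
              Column("value", Integer))
metadata.create_all(engine)

# Parse the data file and save data in `rows`: each row is a list of
# column values in the same order as the table columns.
rows = []
with open("data.txt") as f:
    for line in f:
        name, value = line.split()
        rows.append([name, int(value)])
```

With a primary key on `name`, a duplicate row raises an `IntegrityError`, which the loop above catches, counts, and logs instead of aborting the whole import.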
For this operation to run at a reasonable speed, you must turn off autocommit. I use this code with SQLite, and it is still 3-5 times slower than my old version that used only sqlite3
directly, even with autocommit disabled. (The reason I switched to SQLAlchemy was to be able to use it with MySQL as well.)
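For reference, a direct sqlite3 version of the same per-row insert looks roughly like this (the table name and schema are assumed for illustration; this is only a sketch, not the exact code being benchmarked):

```python
# Rough sqlite3-only sketch (assumed names, same [name, value] `rows` as above).
# The sqlite3 module defers commits by default, so all inserts end up in one
# transaction that is committed at the end, mirroring the SQLAlchemy version.
import sqlite3

con = sqlite3.connect("data.db")
cur = con.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS measurements (name TEXT PRIMARY KEY, value INTEGER)")

imported = 0
for cols in rows:
    try:
        cur.execute("INSERT INTO measurements (name, value) VALUES (?, ?)", cols)
        imported += 1
    except sqlite3.Error:
        pass  # skip rows that violate constraints or fail to convert

con.commit()  # single commit at the very end
con.close()
```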
This is not the most elegant solution in the sense that it is not as fast as the direct SQLite interface. If I profile the code and find the bottleneck in the near future, I will update this answer with the solution.