Query SQLite for multiple arguments simultaneously and handle missing values

Is there any way to do something like this in a single SQL query? Maybe with a list as an input argument? The dates I want are consecutive, but not all of them exist in the database. If a date does not exist, the result should be None.

    dates = [dt.datetime(2008, 1, 1), dt.datetime(2008, 1, 2), dt.datetime(2008, 1, 3),
             dt.datetime(2008, 1, 4), dt.datetime(2008, 1, 5)]
    id = "361-442"
    result = []
    for date in dates:
        curs.execute('''SELECT price, date FROM prices
                        WHERE date = ? AND id = ?''', (date, id))
        query = curs.fetchall()
        if query == []:
            result.append([None, date])
        else:
            result.append(query)
1 answer

To do all the work in sqlite, you can use a LEFT JOIN to fill in the missing prices with None:

    sql = '''
        SELECT p.price, t.date
        FROM ( {t} ) t
        LEFT JOIN prices p
            ON p.date = t.date AND p.id = ?
    '''.format(t=' UNION ALL '.join('SELECT {d!r} date'.format(d=d) for d in dates))
    cursor.execute(sql, [id])
    result = cursor.fetchall()
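As a runnable sketch of this approach (the table name `prices` and the sample rows here are illustrative), note that the id filter belongs in the ON clause: putting it in a WHERE clause would discard the NULL rows the LEFT JOIN produces for missing dates.

```python
# Runnable sketch of the UNION ALL + LEFT JOIN idea, using an in-memory
# database and made-up sample rows.
import sqlite3

conn = sqlite3.connect(':memory:')
curs = conn.cursor()
curs.execute('CREATE TABLE prices (id TEXT, date TEXT, price REAL)')
curs.executemany('INSERT INTO prices VALUES (?, ?, ?)', [
    ('361-442', '2008-01-01', 10.0),
    ('361-442', '2008-01-03', 12.5),   # 2008-01-02 is deliberately missing
])

dates = ['2008-01-01', '2008-01-02', '2008-01-03']
sql = '''
    SELECT p.price, t.date
    FROM ( {t} ) t
    LEFT JOIN prices p ON p.date = t.date AND p.id = ?
'''.format(t=' UNION ALL '.join('SELECT {d!r} date'.format(d=d) for d in dates))
curs.execute(sql, ['361-442'])
result = curs.fetchall()   # missing dates come back with price None
```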

However, this solution requires building a (potentially) huge SQL string in Python in order to create a temporary table of all the desired dates. This is not only slow (including the time sqlite takes to build the temporary table), but also fragile: if len(dates) is more than about 500, sqlite raises

 OperationalError: too many terms in compound SELECT 

You may be able to get around this if you already have all the dates you need in another table. Then you can replace the ugly UNION ALL SQL above with something like

    SELECT p.price, t.date
    FROM (SELECT date FROM dates) t
    LEFT JOIN prices p ON p.date = t.date
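A minimal sketch of this variant, assuming a `dates` table that holds the wanted dates (table names and sample rows are illustrative):

```python
# Sketch of the "dates table" variant: keep the wanted dates in their own
# table so no giant UNION ALL string is needed.
import sqlite3

conn = sqlite3.connect(':memory:')
curs = conn.cursor()
curs.execute('CREATE TABLE prices (date TEXT, price REAL)')
curs.execute('CREATE TABLE dates (date TEXT)')
curs.executemany('INSERT INTO prices VALUES (?, ?)',
                 [('2008-01-01', 10.0), ('2008-01-03', 12.5)])
curs.executemany('INSERT INTO dates VALUES (?)',
                 [('2008-01-01',), ('2008-01-02',), ('2008-01-03',)])

curs.execute('''
    SELECT p.price, t.date
    FROM (SELECT date FROM dates) t
    LEFT JOIN prices p ON p.date = t.date
''')
result = curs.fetchall()
```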

Although this is an improvement, my timeit tests (see below) show that doing part of the work in Python is even faster:


Doing some of the work in Python :

If you know that dates are consecutive and therefore can be expressed as a range, then:

    curs.execute('''
        SELECT date, price FROM prices
        WHERE date <= ? AND date >= ? AND id = ?''',
        (max(dates), min(dates), id))

Otherwise, if the dates are arbitrary, then:

    sql = '''
        SELECT date, price FROM prices
        WHERE date IN ({s}) AND id = ?'''.format(s=','.join(['?'] * len(dates)))
    curs.execute(sql, dates + [id])
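A runnable sketch of the IN (...) placeholder trick, with illustrative table contents: one "?" is generated per date, so the values are still bound safely by the driver rather than interpolated into the SQL string.

```python
# One placeholder per date, joined into an IN (...) list; dates absent
# from the table are simply not returned.
import sqlite3

conn = sqlite3.connect(':memory:')
curs = conn.cursor()
curs.execute('CREATE TABLE prices (id TEXT, date TEXT, price REAL)')
curs.executemany('INSERT INTO prices VALUES (?, ?, ?)', [
    ('361-442', '2008-01-01', 10.0),
    ('361-442', '2008-01-05', 12.0),
])

dates = ['2008-01-01', '2008-01-03', '2008-01-05']
sql = '''SELECT date, price FROM prices
         WHERE date IN ({s}) AND id = ?'''.format(s=','.join(['?'] * len(dates)))
curs.execute(sql, dates + ['361-442'])
result = curs.fetchall()
```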

To build the result list with None inserted for the missing prices, you can make a dict out of the (date, price) pairs and use the dict.get() method, which returns the default None when a date key is missing:

    result = dict(curs.fetchall())
    result = [(result.get(d, None), d) for d in dates]

Note that in order to form the dict as a mapping from dates to prices, I swapped the order of date and price in the SQL queries.
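Putting the two pieces together, a self-contained sketch of this approach (sample rows and names are illustrative) looks like:

```python
# End-to-end sketch: one range query over consecutive dates, then
# dict.get() in Python to fill in None for the missing ones.
import sqlite3

conn = sqlite3.connect(':memory:')
curs = conn.cursor()
curs.execute('CREATE TABLE prices (id TEXT, date TEXT, price REAL)')
curs.executemany('INSERT INTO prices VALUES (?, ?, ?)', [
    ('361-442', '2008-01-01', 10.0),
    ('361-442', '2008-01-03', 11.0),
    ('361-442', '2008-01-05', 12.0),   # 01-02 and 01-04 are missing
])

dates = ['2008-01-0%d' % i for i in range(1, 6)]
curs.execute('''
    SELECT date, price FROM prices
    WHERE date <= ? AND date >= ? AND id = ?''',
    (max(dates), min(dates), '361-442'))
prices = dict(curs.fetchall())          # {date: price}
result = [(prices.get(d), d) for d in dates]
```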


Timeit tests:

I compared these three functions:

    import datetime
    import random
    import sqlite3

    def using_sqlite_union():
        sql = '''
            SELECT p.price, t.date
            FROM ( {t} ) t
            LEFT JOIN prices p ON p.date = t.date
        '''.format(t=' UNION ALL '.join(
            'SELECT {d!r} date'.format(d=str(d)) for d in dates))
        cursor.execute(sql)
        return cursor.fetchall()

    def using_sqlite_dates():
        sql = '''
            SELECT p.price, t.date
            FROM (SELECT date FROM dates) t
            LEFT JOIN prices p ON p.date = t.date
        '''
        cursor.execute(sql)
        return cursor.fetchall()

    def using_python_dict():
        cursor.execute('''
            SELECT date, price FROM prices
            WHERE date <= ? AND date >= ?
        ''', (max(dates), min(dates)))
        result = dict(cursor.fetchall())
        result = [(result.get(d, None), d) for d in dates]
        return result

    N = 500   # total number of dates
    m = 10    # number of dates deliberately missing from the prices table
    omit = random.sample(range(N), m)
    dates = [datetime.date(2000, 1, 1) + datetime.timedelta(days=i)
             for i in range(N)]
    rows = [(d, random.random()) for i, d in enumerate(dates) if i not in omit]

rows holds the data that was INSERTed into the prices table.


Timeit test results:

Timing each function with, for example:

 python -mtimeit -s'import timeit_sqlite_union as t' 't.using_python_dict()' 

gave the following results:

    ·────────────────────·────────────────────·
    │ using_python_dict  │ 1.47 msec per loop │
    │ using_sqlite_dates │ 3.39 msec per loop │
    │ using_sqlite_union │ 5.69 msec per loop │
    ·────────────────────·────────────────────·

using_python_dict is about 2.3 times faster than using_sqlite_dates. Even if we increase the total number of dates to 10,000, the speed ratio remains roughly the same:

    ·────────────────────·────────────────────·
    │ using_python_dict  │ 32.5 msec per loop │
    │ using_sqlite_dates │ 81.5 msec per loop │
    ·────────────────────·────────────────────·

Conclusion: moving all the work into sqlite is not necessarily faster.


Source: https://habr.com/ru/post/1389566/

