To do all the work in sqlite, you can use a LEFT JOIN to fill in the missing prices with None:
    sql = '''
        SELECT p.price, t.date
        FROM ({t}) t
        LEFT JOIN price p
               ON p.date = t.date
              AND p.id = ?
    '''.format(t=' UNION ALL '.join('SELECT {d!r} date'.format(d=str(d)) for d in dates))
    # Note: the id filter has to live in the ON clause, not in a WHERE clause;
    # otherwise the NULL rows produced by the LEFT JOIN would be filtered out
    # and the missing dates would disappear from the result.
    cursor.execute(sql, [id])
    result = cursor.fetchall()
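For illustration, with three dates the generated SQL would look roughly like this (the exact date literals depend on how the dates are stringified):

    SELECT p.price, t.date
    FROM (SELECT '2000-01-01' date UNION ALL
          SELECT '2000-01-02' date UNION ALL
          SELECT '2000-01-03' date) t
    LEFT JOIN price p
           ON p.date = t.date
          AND p.id = ?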
However, this solution requires building a (potentially) huge string in Python in order to create a temporary table of all the desired dates. This is not only slow (including the time sqlite takes to build the temporary table), but also fragile: if len(dates) is more than about 500, sqlite raises
OperationalError: too many terms in compound SELECT
You may be able to get around this if you already have all the dates you need in another table. In that case you can replace the ugly UNION ALL SQL above with something like
    SELECT p.price, t.date
    FROM (SELECT date FROM dates) t
    LEFT JOIN price p ON p.date = t.date
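If such a table does not already exist, a minimal sketch of creating and filling it from the Python list of dates might look like this (the table and column names dates/date are just the ones used above; storing the dates as ISO strings is my assumption):

    cursor.execute('CREATE TEMP TABLE IF NOT EXISTS dates (date TEXT)')
    cursor.executemany('INSERT INTO dates (date) VALUES (?)',
                       [(str(d),) for d in dates])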
Although this is an improvement, my timeit tests (see below) show that doing part of the work in Python is even faster:
Doing some of the work in Python:
If you know that dates are consecutive and therefore can be expressed as a range, then:
    curs.execute('''
        SELECT date, price
        FROM price
        WHERE date <= ? AND date >= ? AND id = ?''',
        (max(dates), min(dates), id))
Otherwise, if the dates are arbitrary, then:
    sql = '''
        SELECT date, price
        FROM price
        WHERE date IN ({s}) AND id = ?'''.format(s=','.join(['?'] * len(dates)))
    curs.execute(sql, dates + [id])
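For example, with three dates the formatted statement is equivalent to the following (the actual date values are passed separately as parameters):

    SELECT date, price
    FROM price
    WHERE date IN (?,?,?) AND id = ?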
To build a result list with None filled in for the missing prices, you can load the (date, price) rows into a dict and use the dict.get() method, which returns the default value None when a date key is missing:
    result = dict(curs.fetchall())
    result = [(result.get(d, None), d) for d in dates]
Note that, in order to form a dict mapping dates to prices, I swapped the order of date and price in the SQL queries.
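Putting the range query and the dict.get() fill together, here is a minimal self-contained sketch of the approach (in-memory database and made-up sample rows of my own; the dates are kept as ISO strings so that the dict keys match the entries of the dates list):

    import sqlite3
    import datetime

    conn = sqlite3.connect(':memory:')
    curs = conn.cursor()
    curs.execute('CREATE TABLE price (id INTEGER, date TEXT, price REAL)')
    curs.executemany('INSERT INTO price VALUES (?, ?, ?)',
                     [(1, '2000-01-01', 10.0), (1, '2000-01-03', 12.5)])

    id = 1
    dates = [str(datetime.date(2000, 1, 1) + datetime.timedelta(days=i)) for i in range(4)]

    # Fetch only the rows that exist in the requested date range...
    curs.execute('''
        SELECT date, price
        FROM price
        WHERE date <= ? AND date >= ? AND id = ?''',
        (max(dates), min(dates), id))
    result = dict(curs.fetchall())
    # ...then fill in None for the missing dates in Python.
    result = [(result.get(d, None), d) for d in dates]
    print(result)
    # [(10.0, '2000-01-01'), (None, '2000-01-02'), (12.5, '2000-01-03'), (None, '2000-01-04')]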
Timeit Tests:
I compared these three functions:
    import datetime
    import random

    def using_sqlite_union():
        sql = '''
            SELECT p.price, t.date
            FROM ({t}) t
            LEFT JOIN price p ON p.date = t.date
        '''.format(t=' UNION ALL '.join('SELECT {d!r} date'.format(d=str(d)) for d in dates))
        cursor.execute(sql)
        return cursor.fetchall()

    def using_sqlite_dates():
        sql = '''
            SELECT p.price, t.date
            FROM (SELECT date FROM dates) t
            LEFT JOIN price p ON p.date = t.date
        '''
        cursor.execute(sql)
        return cursor.fetchall()

    def using_python_dict():
        cursor.execute('''
            SELECT date, price
            FROM price
            WHERE date <= ? AND date >= ?
        ''', (max(dates), min(dates)))
        result = dict(cursor.fetchall())
        result = [(result.get(d, None), d) for d in dates]
        return result

    N = 500
    m = 10
    omit = random.sample(range(N), m)
    dates = [datetime.date(2000, 1, 1) + datetime.timedelta(days=i) for i in range(N)]
    rows = [(d, random.random()) for i, d in enumerate(dates) if i not in omit]
rows defines the data that was INSERTed into the price table; the m dates at the omit indices have no price row.
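The setup of the tables themselves is not shown above; one way it could look, reusing the rows and dates lists from the snippet above (the schema and the detect_types setting are my assumptions, chosen so that the stored dates round-trip as datetime.date objects and compare equal to the entries of dates), is:

    import sqlite3

    connection = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES)
    cursor = connection.cursor()

    # price holds only the dates that actually have a price (the omitted dates are the gaps)
    cursor.execute('CREATE TABLE price (date DATE, price REAL)')
    cursor.executemany('INSERT INTO price VALUES (?, ?)', rows)

    # dates holds every desired date, for the using_sqlite_dates variant
    cursor.execute('CREATE TABLE dates (date DATE)')
    cursor.executemany('INSERT INTO dates VALUES (?)', [(d,) for d in dates])
    connection.commit()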
Timeit test results:
Running

    python -mtimeit -s'import timeit_sqlite_union as t' 't.using_python_dict()'

(and the analogous commands for the other two functions) gave the following results:
    using_python_dict     1.47 msec per loop
    using_sqlite_dates    3.39 msec per loop
    using_sqlite_union    5.69 msec per loop
using_python_dict is about 2.3 times faster than using_sqlite_dates. Even if we increase the total number of dates to 10,000, the speed ratio stays roughly the same:
    using_python_dict     32.5 msec per loop
    using_sqlite_dates    81.5 msec per loop
Conclusion: moving all of the work into sqlite is not necessarily faster.