I hear different opinions about when to use Pandas versus when to use SQL.
I tried to do the following in Pandas on 19,150,869 rows of data:
for idx, row in df.iterrows():
    tmp = int(int(row['M']) / PeriodGranularity) + 1
    row['TimeSlot'] = str(row["D"] + 1) + "-" + str(row["H"]) + "-" + str(tmp)
It turned out to take so long that I had to cancel it after 20 minutes.
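(For reference, a vectorized version of the same calculation would presumably look something like the sketch below; it uses the same df, M/D/H columns, and PeriodGranularity as above, but I have not timed it:)

# Vectorized sketch: operate on whole columns at once instead of row by row.
# Assumes the df, M/D/H columns, and PeriodGranularity from the loop above,
# with M holding non-negative minute values so floor division matches int().
tmp = (df['M'].astype(int) // PeriodGranularity) + 1
df['TimeSlot'] = (df['D'] + 1).astype(str) + "-" + df['H'].astype(str) + "-" + tmp.astype(str)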
In SQLite, I did the following:
Select strftime('%w', PlayedTimestamp) + 1 as D,
       strftime('%H', PlayedTimestamp) as H,
       strftime('%M', PlayedTimestamp) as M,
       cast(strftime('%M', PlayedTimestamp) / 15 + 1 as int) as TimeSlot
from tblMain
and found that it took about 4 seconds ("19150869 lines returned in 2445 ms").
Note: for the Pandas code, I ran this step beforehand to get the data from the db:
sqlStr = "Select strftime('%w',PlayedTimestamp)+1 as D,strftime('%H',PlayedTimestamp) as H,strftime('%M',PlayedTimestamp) as M from tblMain"
df = pd.read_sql_query(sqlStr, con)
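(Presumably I could also compute TimeSlot directly in that query, as the SQLite version above does, and skip the Pandas loop entirely; a sketch, assuming the same connection con, table tblMain, and the 15-minute slot size hard-coded in the SQL above:)

import pandas as pd

# Sketch: let SQLite compute TimeSlot so Pandas never has to iterate rows.
# Assumes the same `con`, `tblMain`, and 15-minute granularity as above.
sqlStr = ("Select strftime('%w',PlayedTimestamp)+1 as D,"
          "strftime('%H',PlayedTimestamp) as H,"
          "strftime('%M',PlayedTimestamp) as M,"
          "cast(strftime('%M',PlayedTimestamp) / 15 + 1 as int) as TimeSlot "
          "from tblMain")
df = pd.read_sql_query(sqlStr, con)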
Is my coding to blame here, or is it generally accepted that SQL is much faster for certain tasks?