I just timed loading data from CSV files, creating a database in Postgres, and writing tables to it, in both Python and R.
I was surprised that the times were very similar:
Python code first:
    import timeit
    tic = timeit.default_timer()
    tic4 = timeit.default_timer()
    import xlrd as xl
    import psycopg2 as pq
    import os
    import pandas as pd
    import numpy as np
    import csv
    from pprint import pprint as pp
    perf_dir = '/myhomedir'
    toc4 = timeit.default_timer()
And the R code (which is much more readable):

    tic = proc.time()
    library(RPostgreSQL)

    tic1 = proc.time()
    system('dropdb ptest1')
    system('createdb ptest1')
    drv = dbDriver("PostgreSQL")
    con = dbConnect(drv, dbname='ptest1')
    toc1 = proc.time()
    time1 = toc1 - tic1

    tic2 = proc.time()
    id.1 = read.csv('/myhomedir/di1.csv',stringsAsFactors=F,sep='\t')
    id.2 = read.csv('/myhomedir/di2.csv',stringsAsFactors=F)
    id.3 = read.csv('/myhomedir/di.c.csv',stringsAsFactors=F,sep='\t')
    id.3 = id.3[,-1]
    toc2 = proc.time()
    time2 = toc2 - tic2

    tic3 = proc.time()
    dbWriteTable(con,'id1',id.1)
    dbWriteTable(con,'id2',id.2)
    dbWriteTable(con,'id3',id.3)
    toc3 = proc.time()
    time3 = toc3 - tic3

    toc = proc.time()
    time = toc - tic
    tyme = rbind(time1,time2,time3,time)
    tyme = data.frame(Function=c('Create & Connect to DB',"Load CSV for save","Write Table to DB",'Overall Time'),tyme)
I was very surprised at how close the times are for the two. (I had read a lot about R being terribly slow and Python being very fast.)
For Python:

    >>> Overall time: 2.48381304741
    dB create & connect time: 1.96832108498
    Load id csvs time: 0.000378847122192
    Create tables and write to db time: 0.35303401947
    Time to import libraries: 0.162075042725
And for R:

                        Function user.self sys.self elapsed user.child sys.child
    time1 Create & Connect to DB     0.112    0.016   1.943       0.06     0.004
    time2      Load CSV for save     0.008    0.000   0.006       0.00     0.000
    time3      Write Table to DB     0.096    0.004   0.349       0.00     0.000
    time            Overall Time     0.376    0.028   2.463       0.06     0.004
I was wondering whether this is because, in the Python version of the table write, I INSERT one row at a time.
So the main question is: is there a Python equivalent of the dbWriteTable block in the R code, and would it speed things up?
A secondary question: is there anything obviously wrong with the code that might be slowing it down?
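For what it's worth, the closest candidate I can think of is a bulk load through PostgreSQL's COPY, which psycopg2 exposes as cursor.copy_expert. Here is a minimal sketch of that idea, assuming an open psycopg2 connection and an already-created table with a trusted name. The helper names rows_to_csv_buffer and bulk_write_table are made up for illustration, and I have not timed this against the row-at-a-time version.

```python
import csv
import io


def rows_to_csv_buffer(rows):
    """Serialize an iterable of row tuples into an in-memory CSV file."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    buf.seek(0)  # rewind so the database driver reads from the start
    return buf


def bulk_write_table(conn, table, rows):
    """Load all rows with a single COPY instead of one INSERT per row.

    Assumptions: `conn` is an open psycopg2 connection, and `table` is an
    existing table whose name is trusted (it is interpolated into the SQL).
    """
    buf = rows_to_csv_buffer(rows)
    with conn.cursor() as cur:
        # CSV format lets PostgreSQL handle quoted fields; unquoted empty
        # fields (e.g. Python None) are read as NULL by default.
        cur.copy_expert(f"COPY {table} FROM STDIN WITH (FORMAT csv)", buf)
    conn.commit()
```

Usage would be something like bulk_write_table(con, 'id1', rows), where rows is the list of tuples read from the CSV file.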
Happy to provide a csv sample if this helps.
I don't want to start an R vs. Python flame war; I just want to know how I can make my code faster.
Thanks.