Pandas loads CSV faster than SQL

It seems that loading data from a CSV file is faster than loading it from SQL (PostgreSQL) with Pandas. (I have an SSD.)

Here is my test code:

import time

import pandas as pd
from sqlalchemy import create_engine

# Time the CSV load plus a trivial transformation.
start = time.time()
df = pd.read_csv('foo.csv')
df *= 3
duration = time.time() - start
print('{0}s'.format(duration))

# Time the same work when the data comes from PostgreSQL.
engine = create_engine('postgresql://user:password@host:port/schema')
start = time.time()
df = pd.read_sql_query("select * from mytable", engine)
df *= 3
duration = time.time() - start
print('{0}s'.format(duration))

foo.csv and the database table hold the same data (the same number of rows and columns in both: 4 columns, 100,000 rows filled with random integers).

CSV takes 0.05 s

SQL takes 0.5 s

Is it plausible that CSV is 10 times faster than SQL here? I wonder if I missed something...

+4
2 answers

This is normal behavior: reading a CSV file is one of the fastest ways to simply load data.

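A CSV file is very naive and simple, so loading directly from it is very quick. For a massive database with a complex structure, CSV is not an option. SQL is very fast at selecting data from a table and returning it to you, but the ability to select, modify, and manipulate data naturally adds an overhead cost to each call.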

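Imagine that you have a time series in a CSV file covering 1920 to 2017, but you only want the data from 2010 to today.

The CSV approach would be to load the whole CSV file and then select the years 2010 to 2017.

The SQL approach would be to pre-select those years with a SQL SELECT query.

In that scenario, SQL would be much faster.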
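The two approaches described above can be sketched as follows. This is only an illustration under stated assumptions: the yearly series and the table name `series` are made up, and an in-memory SQLite database stands in for PostgreSQL so the snippet runs without a server. It shows where the filtering happens, not how fast either path is.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical yearly time series covering 1920 through 2017.
years = list(range(1920, 2018))
source = pd.DataFrame({"year": years, "value": [y % 7 for y in years]})

# CSV approach: parse every row first, then filter in pandas.
csv_buffer = io.StringIO(source.to_csv(index=False))
all_rows = pd.read_csv(csv_buffer)
recent_from_csv = all_rows[all_rows["year"] >= 2010].reset_index(drop=True)

# SQL approach: the database filters, so only matching rows reach pandas.
conn = sqlite3.connect(":memory:")  # assumption: SQLite stand-in for PostgreSQL
source.to_sql("series", conn, index=False)
recent_from_sql = pd.read_sql_query(
    "SELECT year, value FROM series WHERE year >= ? ORDER BY year",
    conn,
    params=(2010,),
)
conn.close()

print(len(all_rows), len(recent_from_csv), len(recent_from_sql))
```

With the CSV file, all rows are parsed before any are discarded; with the query, the WHERE clause limits what is transferred in the first place.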

+3

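It is perfectly normal that CSV is faster than SQL here, but the two are not meant for the same thing, even though you can use them for the same purpose:

  • CSV is for sequential access: you start at the beginning of the file and read every line one after another, processing each as needed.

  • SQL is for indexed access, i.e. you look at an index and then go to the row you are looking for. You can also perform a full table scan, which uses no index at all and makes the table essentially a bloated CSV.

Your query is a full table scan: it does not look at any index, because it asks for all the data, so yes, this is normal.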

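On the other hand, if you try a query like

select * from mytable where myindex = 'myvalue';

you will get a huge speedup compared to searching for the same rows in the CSV file. That is because of the indexes in SQL.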
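The difference between a full table scan and an indexed lookup can be made visible by asking the database for its query plan. The sketch below is an assumption-laden stand-in: it uses Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN` rather than PostgreSQL (which has its own `EXPLAIN`), and the column `payload`, the index name `idx_mytable_myindex`, and the `plan` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stand-in for PostgreSQL
conn.execute("CREATE TABLE mytable (myindex TEXT, payload INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    ((f"key{i}", i) for i in range(100_000)),
)
conn.execute("CREATE INDEX idx_mytable_myindex ON mytable (myindex)")


def plan(sql, params=()):
    """Return SQLite's query plan description for a statement (hypothetical helper)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


# Asking for every row forces a scan of the whole table.
scan_plan = plan("SELECT * FROM mytable")

# Filtering on the indexed column lets the engine jump straight to the match.
lookup_sql = "SELECT * FROM mytable WHERE myindex = ?"
search_plan = plan(lookup_sql, ("key42",))
matches = conn.execute(lookup_sql, ("key42",)).fetchall()

print(scan_plan)
print(search_plan)
print(matches)
```

The exact plan wording varies between SQLite versions, but the first plan reports a scan and the second reports a search using the index.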

0

Source: https://habr.com/ru/post/1676768/

