I downloaded a CSV file (if you need a specific file, this is the training CSV from http://www.kaggle.com/c/loan-default-prediction ). Loading the CSV with numpy takes significantly longer than with pandas:
timeit("genfromtxt('train_v2.csv', delimiter=',')", "from numpy import genfromtxt", number=1)
102.46608114242554
timeit("pandas.io.parsers.read_csv('train_v2.csv')", "import pandas", number=1)
13.833590984344482
I also noticed that numpy's memory usage fluctuates much more wildly, climbs higher, and remains significantly higher after loading (2.49 GB for numpy vs. ~600 MB for pandas). All dtypes in pandas are 8 bytes, so differing dtypes are not the reason. I was nowhere near maxing out memory, so the time difference cannot be attributed to paging.
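To make the comparison reproducible without the Kaggle file, here is a minimal sketch that builds a small synthetic all-float CSV and times both loaders on it. The file name, row count, and repeat count are made up for illustration; absolute numbers will differ by machine, but the gap between the two parsers should still show up.

```python
import os
import tempfile
import timeit

import numpy as np
import pandas as pd

# Build a synthetic all-float CSV (header row + 10,000 rows x 10 columns).
rng = np.random.default_rng(0)
data = rng.random((10_000, 10))
path = os.path.join(tempfile.mkdtemp(), "synthetic.csv")
header = ",".join(f"col{i}" for i in range(10))
np.savetxt(path, data, delimiter=",", header=header, comments="")

# Time each loader on the same file.
t_numpy = timeit.timeit(
    lambda: np.genfromtxt(path, delimiter=",", skip_header=1), number=3
)
t_pandas = timeit.timeit(lambda: pd.read_csv(path), number=3)
print(f"genfromtxt: {t_numpy:.3f}s  read_csv: {t_pandas:.3f}s")

# Sanity check: both loaders should parse identical values.
a = np.genfromtxt(path, delimiter=",", skip_header=1)
b = pd.read_csv(path).to_numpy()
assert np.allclose(a, b)
```

Note this only measures wall-clock time and the footprint of the final objects; the transient spike during parsing (which is where genfromtxt's list-of-tuples intermediate representation hurts) needs an external profiler such as `memory_profiler` to observe.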
Any reason for this difference? Is genfromtxt really that much less efficient? (And does it really consume that much more memory?)
EDIT:
numpy version 1.8.0
pandas version 0.13.0-111-ge29c8e8