I downloaded a CSV file (if you need a specific file, this is the training CSV from http://www.kaggle.com/c/loan-default-prediction ). Loading the CSV with numpy takes significantly longer than with pandas:
timeit("genfromtxt('train_v2.csv', delimiter=',')", "from numpy import genfromtxt", number=1)
102.46608114242554
timeit("pandas.io.parsers.read_csv('train_v2.csv')", "import pandas", number=1)
13.833590984344482
I also noticed that numpy's memory usage fluctuates much more wildly, climbs higher, and remains significantly higher after loading (2.49 GB for numpy vs. ~600 MB for pandas). All dtypes in pandas are 8 bytes, so differing dtypes are not the reason. I was nowhere near maxing out memory, so the time difference cannot be attributed to paging.
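To make the comparison reproducible without the Kaggle file, here is a minimal sketch that builds a small synthetic all-float CSV and times both loaders on it. The file name, row count, and repeat count are made up for illustration; absolute numbers will differ by machine, but the gap between the two parsers should still show up.

```python
import os
import tempfile
import timeit

import numpy as np
import pandas as pd

# Build a synthetic all-float CSV (header row + 10,000 rows x 10 columns).
rng = np.random.default_rng(0)
data = rng.random((10_000, 10))
path = os.path.join(tempfile.mkdtemp(), "synthetic.csv")
header = ",".join(f"col{i}" for i in range(10))
np.savetxt(path, data, delimiter=",", header=header, comments="")

# Time each loader on the same file.
t_numpy = timeit.timeit(
    lambda: np.genfromtxt(path, delimiter=",", skip_header=1), number=3
)
t_pandas = timeit.timeit(lambda: pd.read_csv(path), number=3)
print(f"genfromtxt: {t_numpy:.3f}s  read_csv: {t_pandas:.3f}s")

# Sanity check: both loaders should parse identical values.
a = np.genfromtxt(path, delimiter=",", skip_header=1)
b = pd.read_csv(path).to_numpy()
assert np.allclose(a, b)
```

Note this only measures wall-clock time and the footprint of the final objects; the transient spike during parsing (which is where genfromtxt's list-of-tuples intermediate representation hurts) needs an external profiler such as `memory_profiler` to observe.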
Any reason for this difference? Is genfromtxt really that much less efficient? (And does it really consume that much more memory?)
EDIT:
numpy version 1.8.0
pandas version 0.13.0-111-ge29c8e8