I am trying to find a good and flexible way to parse CSV files in Python, but none of the standard options seem to fit the bill. I am tempted to write my own, but I think that some combination of what exists in numpy / scipy and the csv module can do what I need, and therefore I do not want to reinvent the wheel.
I would like the standard features: being able to specify delimiters, indicate whether there is a header, how many lines to skip, the comment delimiter, which columns to ignore, etc. The central feature I am missing is the ability to parse CSV files in a way that gracefully handles both string data and numeric data. Many of my CSV files have columns containing strings (of varying length) as well as numeric data. I would like to be able to use a numpy array for the numeric data, but also be able to access the strings. For example, suppose my file looks like this (suppose the columns are separated by tabs):
name favorite_integer favorite_float1 favorite_float2 short_description
johnny 5 60.2 0.52 johnny likes fruitflies
bob 1 17.52 0.001 bob, bobby, robert
data = loadcsv('myfile.csv', delimiter='\t', parse_header=True, comment='#')
I would like to access data in two ways:
(1) As a matrix: I would like a numpy.array of the numeric columns, so that I can slice and transpose them. For example:
floats_and_ints = data.matrix
floats_and_ints[:, 0] # access the integers
floats_and_ints[:, 1:3] # access some of the floats
numpy.transpose(floats_and_ints) # etc.
(2) By header name, without having to know the order of the columns. For example:
data['favorite_float1'] # get all the values of the column with header "favorite_float1"
data['name'] # get all the names of the rows
That way, I do not need to know where favorite_float1 sits in the table, since the column order may change.
I would also like to iterate over the rows and access fields by name. For example:
for row in data:
    print("Name:", row["name"], row["favorite_integer"])
(1) suggests a numpy.array, but as far as I can tell that does not gracefully handle the string columns.
(2) suggests a list of dictionaries, for example as produced by the csv module, but that is awkward to convert into a numpy.array.
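To illustrate the list-of-dictionaries representation, here is a minimal sketch using csv.DictReader from the standard library. The helper name load_rows, the inlined SAMPLE string, and the hand-written numeric_columns list are assumptions made for this example only. It shows access by header name working, and the manual conversion needed to get a numeric numpy.array.

```python
import csv
import io

import numpy as np

# Sample data from the question, tab-separated and inlined so the sketch is self-contained.
SAMPLE = (
    "name\tfavorite_integer\tfavorite_float1\tfavorite_float2\tshort_description\n"
    "johnny\t5\t60.2\t0.52\tjohnny likes fruitflies\n"
    "bob\t1\t17.52\t0.001\tbob, bobby, robert\n"
)


def load_rows(fileobj, delimiter="\t", comment="#"):
    """Hypothetical helper: return a list of dicts, one per data row."""
    # Drop comment lines before handing the stream to csv.DictReader.
    lines = (line for line in fileobj if not line.startswith(comment))
    return list(csv.DictReader(lines, delimiter=delimiter))


rows = load_rows(io.StringIO(SAMPLE))

# Access by header name works, but every value is still a string.
names = [row["name"] for row in rows]

# Building a numeric matrix means naming the numeric columns and converting by hand.
numeric_columns = ["favorite_integer", "favorite_float1", "favorite_float2"]
matrix = np.array([[float(row[col]) for col in numeric_columns] for row in rows])
```

The conversion step is the part that does not scale well: the numeric columns have to be listed explicitly, and the integers end up as floats in the matrix.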
Is there a combination of csv/numpy/scipy features that gives both kinds of access?
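In summary, the main features I am looking for are:
- The standard options: delimiters, number of lines to skip, columns to ignore, etc.
- A numpy.array/matrix representation of the numeric data, so that it can be manipulated.
- Access to columns and rows by header name (as in the examples above)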
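As a possible starting point, numpy.genfromtxt with names=True and dtype=None seems to cover several of these features, since it returns a structured array whose fields can be accessed by header name. The sketch below is only an illustration under that assumption, not a general solution. SAMPLE, numeric_names, and matrix are names introduced here, and the numeric matrix is built separately with numpy.column_stack.

```python
import io

import numpy as np

# Sample data from the question, tab-separated and inlined so the sketch is self-contained.
SAMPLE = (
    "name\tfavorite_integer\tfavorite_float1\tfavorite_float2\tshort_description\n"
    "johnny\t5\t60.2\t0.52\tjohnny likes fruitflies\n"
    "bob\t1\t17.52\t0.001\tbob, bobby, robert\n"
)

# names=True takes field names from the header line;
# dtype=None lets numpy infer a type for each column individually.
data = np.genfromtxt(
    io.StringIO(SAMPLE),
    delimiter="\t",
    names=True,
    dtype=None,
    encoding="utf-8",
    comments="#",
)

# Access by header name, independent of column position.
floats = data["favorite_float1"]
names = data["name"]

# Iterate over rows and read fields by name.
for row in data:
    print("Name:", row["name"], row["favorite_integer"])

# Pick out the numeric fields and stack them into a plain 2-D float array.
numeric_names = [
    field for field in data.dtype.names
    if np.issubdtype(data.dtype[field], np.number)
]
matrix = np.column_stack([data[field] for field in numeric_names])
```

Here matrix[:, 0] holds the integers (upcast to float) and matrix[:, 1:3] the floats, while the string columns stay available through data["name"] and data["short_description"].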