Load csv into numpy 2D matrix for plotting

Question

Given this CSV file:

    "A","B","C","D","E","F","timestamp"
    611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
    611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
    611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12

I simply want to load it as a matrix/ndarray with 3 rows and 7 columns. However, for some reason, all I can get out of numpy is an ndarray with 3 rows (one per line) and no columns.

    r = np.genfromtxt(fname, delimiter=',', dtype=None, names=True)
    print(r)
    print(r.shape)

    [ (611.88243, 9089.5601000000006, 5133.0, 864.07514000000003, 1715.3747599999999, 765.22776999999996, 1291111964948.0)
     (611.88243, 9089.5601000000006, 5133.0, 864.07514000000003, 1715.3747599999999, 765.22776999999996, 1291113113366.0)
     (611.88243, 9089.5601000000006, 5133.0, 864.07514000000003, 1715.3747599999999, 765.22776999999996, 1291120650486.0)]
    (3,)

I can manually iterate and hack it into the shape I want, but that seems silly. I just want to load it as a proper matrix so that I can slice it along different dimensions and plot it, just like in Matlab.

+46

python arrays numpy csv reshape

dgorissen Nov 30 '10 at 15:40

3 answers

I think using dtype=None together with a header row of names confuses the routine: names=True makes genfromtxt return a 1-D structured array with one named field per column. To get a plain 2D array, skip the header row instead. Try

    >>> r = np.genfromtxt(fname, delimiter=',', skip_header=1)
    >>> r
    array([[ 6.11882430e+02, 9.08956010e+03, 5.13300000e+03, 8.64075140e+02, 1.71537476e+03, 7.65227770e+02, 1.29111196e+12],
           [ 6.11882430e+02, 9.08956010e+03, 5.13300000e+03, 8.64075140e+02, 1.71537476e+03, 7.65227770e+02, 1.29111311e+12],
           [ 6.11882430e+02, 9.08956010e+03, 5.13300000e+03, 8.64075140e+02, 1.71537476e+03, 7.65227770e+02, 1.29112065e+12]])
    >>> r[:,0]    # Slice 0'th column
    array([ 611.88243, 611.88243, 611.88243])
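To see how the header handling changes the result, here is a minimal sketch that reads the question's data from an in-memory string instead of a file (the io.StringIO wrapper and the variable names are assumptions made for illustration). It compares names=True with skip_header=1:

```python
import io
import numpy as np

# Sample data copied from the question (first two rows only).
csv_text = '''"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
'''

# names=True: the header row becomes field names, so the result is a
# 1-D structured array with one record per CSV row.
structured = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True)

# skip_header=1: the header row is discarded, so the result is a plain
# 2-D float array (rows x columns).
plain = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

print(structured.shape)  # one dimension only
print(plain.shape)       # two dimensions
print(plain[:, 0])       # first column, sliced like a Matlab matrix
```

Only the second form supports two-index slicing such as plain[:, 0]; the structured form is indexed by field name instead.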

+3

mtrw Nov 30 '10 at 16:07

You can read a CSV file with headers into a NumPy record array with np.recfromcsv. For example:

    import io
    import numpy as np

    csv_text = """\
    "A","B","C","D","E","F","timestamp"
    611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
    611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
    611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
    """

    # Make a file-like object (io.StringIO replaces the Python 2 StringIO module)
    csv_file = io.StringIO(csv_text)
    csv_file.seek(0)

    # Read the CSV file into a NumPy record array
    r = np.recfromcsv(csv_file, case_sensitive=True)
    print(repr(r))

which prints:

    rec.array([ ( 611.88243, 9089.5601, 5133., 864.07514, 1715.37476, 765.22777, 1.29111196e+12),
                ( 611.88243, 9089.5601, 5133., 864.07514, 1715.37476, 765.22777, 1.29111311e+12),
                ( 611.88243, 9089.5601, 5133., 864.07514, 1715.37476, 765.22777, 1.29112065e+12)],
              dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8'), ('D', '<f8'), ('E', '<f8'), ('F', '<f8'), ('timestamp', '<f8')])

You can access a column by its name, for example r['E']:

 array([ 1715.37476, 1715.37476, 1715.37476])
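A record array gives named columns but not the 3 x 7 matrix the question asks for. As a rough sketch, one way to get both is to convert the structured result with numpy.lib.recfunctions.structured_to_unstructured, which assumes NumPy 1.16 or newer. The in-memory csv_text and the variable names are illustrative only:

```python
import io
import numpy as np
from numpy.lib import recfunctions as rfn

# Sample data copied from the question (first two rows only).
csv_text = '''"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
'''

# Structured (named-field) array: 1-D, one record per CSV row.
rec = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True)

# Convert the named fields into ordinary columns of a 2-D float array.
# This works here because every field has the same dtype (float64).
matrix = rfn.structured_to_unstructured(rec)

print(matrix.shape)   # rows x columns
print(matrix[:, 4])   # fifth column, the one named "E" in the header
```

Keeping both rec and matrix around lets you look columns up by name and still slice numerically.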

+3

Mike T Feb 22 '16 at 3:04

Kaveh_kh · Accepted Answer · 2010-11-30 16:20

Pure numpy

 numpy.loadtxt(open("test.csv", "rb"), delimiter=",", skiprows=1)

Check out the loadtxt documentation.
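As a minimal sketch of the loadtxt approach applied to the question's data, the example below reads from an in-memory string rather than test.csv and then slices the result for plotting. Treating the timestamp column as milliseconds since the epoch is an assumption based on its magnitude, not something the question states:

```python
import io
import numpy as np

# Sample data copied from the question.
csv_text = '''"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
'''

# skiprows=1 drops the header line, which loadtxt cannot parse as floats.
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
print(data.shape)  # rows x columns

# Split into the measurement columns and the timestamp column.
values = data[:, :-1]
timestamps = data[:, -1]

# ASSUMPTION: timestamps are milliseconds since the epoch.
elapsed_s = (timestamps - timestamps[0]) / 1000.0
print(elapsed_s)
```

From here, something like matplotlib's plot(elapsed_s, values[:, 0]) would draw the first column against elapsed time.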

You can also use the Python csv module:

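    import csv
    import numpy

    # newline="" is what the csv module expects for files opened in text mode
    with open("test.csv", newline="") as f:
        reader = csv.reader(f, delimiter=",")
        next(reader)  # skip the header row, which cannot be converted to float
        x = list(reader)
    result = numpy.array(x).astype("float")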

You will have to convert it to your preferred numeric type. I think you can write the whole thing in one line:

    # [1:] drops the header row before the conversion to float
    result = numpy.array(list(csv.reader(open("test.csv", newline=""), delimiter=","))[1:]).astype("float")

Added hint:

You can also use pandas.io.parsers.read_csv (also exposed as pandas.read_csv) and take the underlying numpy array from the resulting DataFrame, which can be faster.
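A short sketch of the pandas route, assuming pandas 0.24 or newer is installed (DataFrame.to_numpy was added in that release). The in-memory csv_text stands in for test.csv and the variable names are illustrative:

```python
import io
import pandas as pd

# Sample data copied from the question.
csv_text = '''"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
'''

# read_csv uses the first line as column labels by default.
df = pd.read_csv(io.StringIO(csv_text))

# to_numpy() returns the values as a plain 2-D NumPy array.
matrix = df.to_numpy()

print(list(df.columns))  # column labels taken from the header
print(matrix.shape)      # rows x columns
```

The DataFrame keeps the header names, so df["E"] and matrix[:, 4] refer to the same column.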
