Delete dtype at the end of numpy array

Question

I am writing a method to create an array from a data file. The method looks like this:

    import numpy

    def readDataFile(fileName):
        try:
            with open(fileName, 'r') as inputs:
                data = None
                for line in inputs:
                    line = line.strip()
                    items = line.split('\t')
                    if data == None:
                        data = numpy.array(items[0:len(items)])
                    else:
                        data = numpy.vstack((data, items[0:len(items)]))
            return numpy.array(data)
        except IOError as ioerr:
            print 'IOError: ', ioerr
            return None

My data file contains strings of numbers, each of which is separated by a tab, for example:

    1 2 3
    4 5 6
    7 8 9

And I expect to get an array as follows:

 array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

However, the result has a dtype at the end:

 array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype='|S9')

Because of this, I cannot perform some operations on the result. For example, if I try to find the maximum value for each row using result.max(0), I get an error:

TypeError: cannot perform reduction using a flexible type.

So, can someone tell me what is wrong with my code and how to fix it? Many thanks.

+6

python arrays numpy

Long thai Apr 23 '12 at 21:43

4 answers

This is how you change data types in numpy:

    >>> x
    array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    >>> x.astype('|S9')
    array([['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']], dtype='|S9')
    >>> x.astype('float64')
    array([[ 1., 2., 3.], [ 4., 5., 6.], [ 7., 8., 9.]])
    >>> x.astype('int')
    array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
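For the case in the question the conversion goes the other way, from strings to numbers. A minimal sketch, assuming every element is text that parses as an integer (the variable names are illustrative, not from the original post):

```python
import numpy as np

# A string array like the one the question's loop produces.
strings = np.array([['1', '2', '3'],
                    ['4', '5', '6'],
                    ['7', '8', '9']])

# astype returns a new array; the original string array is left unchanged.
numbers = strings.astype(int)

print(numbers.dtype.kind)  # 'i' marks a signed integer dtype
print(numbers.max(0))      # reductions such as max now work
```

If any element is not a valid integer literal, astype raises a ValueError, so this is only suitable for clean input.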

+6

Akavall Apr 23 '12 at 21:55

Did you try converting them to numbers first?

 items = [int(x) for x in line.split('\t')]
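A small sketch of why this helps, using a made-up line of tab-separated text: once the items are Python ints, NumPy infers an integer dtype on its own.

```python
import numpy as np

line = "4\t5\t6\n"  # hypothetical line from the data file

# int() ignores surrounding whitespace, so the trailing newline is harmless.
items = [int(x) for x in line.split('\t')]

row = np.array(items)
print(items)           # [4, 5, 6]
print(row.dtype.kind)  # 'i' marks a signed integer dtype
```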

+3

Ignacio Vazquez-Abrams Apr 23 '12 at 21:48

A NumPy array has a method that returns its contents as a plain list, without the dtype:

    import numpy as np
    a = np.array(['A', 'B'])
    a           # Returns: array(['A', 'B'], dtype='|S1')
    a.tolist()  # Returns ['A', 'B']

http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.tolist.html#numpy.ndarray.tolist

0

Enrique Pérez Herrero Aug 26 '16 at 16:39

Bi rico · Accepted Answer · 2012-04-24T02:33:01+0000

The easiest fix is to use numpy.loadtxt:

 data = numpy.loadtxt(fileName, dtype='float')
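As a quick illustration, here is a sketch that feeds loadtxt an in-memory stand-in for the data file (the io.StringIO object and its contents are assumptions for the demo; with a real file you would pass the file name as above):

```python
import io
import numpy as np

# Stand-in for the data file: three rows of tab-separated numbers.
fake_file = io.StringIO("1\t2\t3\n4\t5\t6\n7\t8\t9\n")

# loadtxt splits on any whitespace by default, which includes tabs.
data = np.loadtxt(fake_file, dtype='float')

print(data.shape)   # (3, 3)
print(data.max(0))  # maximum along axis 0
```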

Just FYI, using numpy.vstack inside a loop is a bad idea. If you decide not to use loadtxt, you can replace your loop with the following, which fixes the dtype problem and removes the repeated numpy.vstack calls.

    data = [row.split('\t') for row in inputs]
    data = numpy.array(data, dtype='float')
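Put together, the whole reader might look like the sketch below. The name read_data_file and the blank-line handling are my own additions, and it is written for Python 3, so print is a function here.

```python
import numpy as np

def read_data_file(file_name):
    """Read tab-separated numbers into a 2-D float array, or None on IOError."""
    try:
        with open(file_name, 'r') as inputs:
            # Collect all rows in a plain list, skipping blank lines.
            rows = [line.strip().split('\t') for line in inputs if line.strip()]
    except IOError as ioerr:
        print('IOError:', ioerr)
        return None
    # One conversion at the end: the strings become floats here.
    return np.array(rows, dtype='float')
```

Calling read_data_file on the sample file from the question should give a 3x3 float array, on which result.max(0) works without the TypeError.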

Update

Each time vstack is called, it creates a new array and copies the contents of the old arrays into it. Each copy is roughly O(n), where n is the size of the array, so if your loop runs n times the total cost becomes O(n ** 2); in other words, slow. If you know the final size of the array ahead of time, it's best to create the array outside the loop and fill it in place. If you do not know the final size, you can append to a list inside the loop and call vstack once at the end. For instance:

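    import numpy as np

    # Known final size: preallocate, then fill each row in place.
    myArray = np.zeros((10, 3))
    for i in range(len(myArray)):
        myArray[i] = [i, i+1, i+2]

    # or, unknown final size: collect rows in a list and stack once at the end.
    myArray = []
    for i in range(10):
        myArray.append(np.array([i, i+1, i+2]))
    myArray = np.vstack(myArray)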
