Delete dtype at the end of numpy array

I am writing a method to create an array from a data file. The method looks like this:

    import numpy

    def readDataFile(fileName):
        try:
            with open(fileName, 'r') as inputs:
                data = None
                for line in inputs:
                    line = line.strip()
                    items = line.split('\t')
                    if data == None:
                        data = numpy.array(items[0:len(items)])
                    else:
                        data = numpy.vstack((data, items[0:len(items)]))
                return numpy.array(data)
        except IOError as ioerr:
            print 'IOError: ', ioerr
            return None

My data file contains strings of numbers, each of which is separated by a tab, for example:

    1 2 3
    4 5 6
    7 8 9

And I expect to get an array as follows:

 array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) 

However, the result has a dtype at the end:

 array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype='|S9') 

Because of this, I cannot perform some operations on the result. For example, if I try to find the maximum value for each row using result.max(0), I get an error:

TypeError: cannot perform reduction using a flexible type.

So, can someone tell me what is wrong with my code and how to fix it? Many thanks.

+6
4 answers

The easiest fix is to use numpy's loadtxt:

 data = numpy.loadtxt(fileName, dtype='float') 
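As a minimal sketch of the loadtxt approach, the snippet below writes the sample data from the question to a temporary file and reads it back. The temporary file and the names col_max and row_max are made up for illustration. Once the array has a float dtype, reductions such as max work. It is written for Python 3.

```python
import os
import tempfile

import numpy

# Hypothetical sample file: three tab-separated rows, as in the question.
with tempfile.NamedTemporaryFile('w', suffix='.tsv', delete=False) as tmp:
    tmp.write('1\t2\t3\n4\t5\t6\n7\t8\t9\n')
    fileName = tmp.name

# By default loadtxt splits on any whitespace, which includes tabs.
data = numpy.loadtxt(fileName, dtype='float')
os.remove(fileName)

col_max = data.max(0)  # reduction along axis 0: maximum of each column
row_max = data.max(1)  # reduction along axis 1: maximum of each row
```

Note that max(0) reduces along the first axis and so gives one value per column; use max(1) if you want one value per row.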

Just FYI, using numpy.vstack inside a loop is a bad idea. If you decide not to use loadtxt, you can replace your loop with the following, which fixes the dtype problem and removes the numpy.vstack calls.

    data = [row.split('\t') for row in inputs]
    data = numpy.array(data, dtype='float')
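Put together, the list-based approach might look like the sketch below. The function name read_data_file, the blank-line handling, and the temporary sample file are assumptions added for illustration and are not part of the original answer. It is written for Python 3.

```python
import os
import tempfile

import numpy as np


def read_data_file(file_name):
    """Read a tab-separated file of numbers into a 2-D float array."""
    with open(file_name, 'r') as inputs:
        # Collect the rows in a plain list first; blank lines are skipped.
        rows = [line.strip().split('\t') for line in inputs if line.strip()]
    # One conversion at the end replaces the repeated vstack calls
    # and parses the strings as floats.
    return np.array(rows, dtype='float')


# Usage with a throwaway sample file (hypothetical data from the question).
with tempfile.NamedTemporaryFile('w', suffix='.tsv', delete=False) as tmp:
    tmp.write('1\t2\t3\n4\t5\t6\n7\t8\t9\n')
result = read_data_file(tmp.name)
os.remove(tmp.name)
```

Unlike the original function, this sketch lets an IOError propagate to the caller instead of printing it and returning None.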

Update

Each time vstack is called, it creates a new array and copies the contents of the old arrays into it. Each copy costs roughly O(n), where n is the size of the array, and if your loop runs n times, the whole thing becomes O(n**2); in other words, slow. If you know the final size of the array ahead of time, it's best to create the array outside the loop and fill it in place. If you do not know the final size, you can append to a list inside the loop and call vstack once at the end. For instance:

    import numpy as np

    myArray = np.zeros((10, 3))
    for i in xrange(len(myArray)):
        myArray[i] = [i, i+1, i+2]

    # or:
    myArray = []
    for i in xrange(10):
        myArray.append(np.array([i, i+1, i+2]))
    myArray = np.vstack(myArray)
+8

This is how you change data types in numpy:

    >>> x
    array([[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]])
    >>> x.astype('|S9')
    array([['1', '2', '3'],
           ['4', '5', '6'],
           ['7', '8', '9']], dtype='|S9')
    >>> x.astype('float64')
    array([[ 1.,  2.,  3.],
           [ 4.,  5.,  6.],
           [ 7.,  8.,  9.]])
    >>> x.astype('int')
    array([[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]])
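Applied to the situation in the question, the same astype call can go in the other direction, from strings to numbers. The sketch below builds the string array by hand as a stand-in for what readDataFile returns; the variable names are illustrative only. It is written for Python 3.

```python
import numpy as np

# Stand-in for the array from the question: every element is a string.
strings = np.array([['1', '2', '3'],
                    ['4', '5', '6'],
                    ['7', '8', '9']])

# astype parses each string and returns a new integer array.
numbers = strings.astype(int)

col_max = numbers.max(0)  # reductions now work on the numeric array
```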
+6

Have you tried converting them to numbers first?

 items = [int(x) for x in line.split('\t')] 
+3

NumPy arrays have a method for this:

    import numpy as np

    a = np.array(['A', 'B'])
    a           # Returns: array(['A', 'B'], dtype='|S1')
    a.tolist()  # Returns ['A', 'B']

http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.tolist.html#numpy.ndarray.tolist

0

Source: https://habr.com/ru/post/913987/

