Forced conversion of numpy non-numeric arrays with NAN replacement

Question

Consider an array

x = np.array(['1', '2', 'a'])

Trying to convert it to a floating-point array raises an exception:

 x.astype(float)
 ValueError: could not convert string to float: a

Does numpy have an efficient way to force this to a numeric array, replacing non-numeric values with something like NaN?

Alternatively, is there an efficient numpy function equivalent to np.isnan, but which also flags non-numeric elements such as letters?

+6

python numpy type-conversion coercion nan

Chrisb Apr 25 '13 at 19:47

2 answers

If you use pandas, you can use the pd.to_numeric() function:

 In [1]: import numpy as np
 In [2]: import pandas as pd
 In [3]: x = np.array(['1', '2', 'a'])
 In [4]: pd.to_numeric(x, errors='coerce')
 Out[4]: array([ 1., 2., nan])

+4

Bill Sep 16 '16 at 18:23

unutbu · Accepted Answer · 2013-04-25T19:54:37+0000

You can convert an array of strings to a float array (with NaNs for entries that cannot be parsed) using np.genfromtxt:

 In [83]: np.set_printoptions(precision=3, suppress=True)
 In [84]: np.genfromtxt(np.array(['1','2','3.14','1e-3','b','nan','inf','-inf']))
 Out[84]: array([ 1. , 2. , 3.14 , 0.001, nan, nan, inf, -inf])

In Python 3, you need to convert the array to bytes first, for example with the array's astype() method:

 In [18]: np.genfromtxt(np.array(['1','2','3.14','1e-3','b','nan','inf','-inf']).astype('bytes'))
 Out[18]: array([ 1. , 2. , 3.14 , 0.001, nan, nan, inf, -inf])
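If np.genfromtxt is not an option, a plain-Python fallback gives the same kind of result without any bytes conversion. This is only a sketch and not part of the original answer: the helper name to_float_or_nan is hypothetical, and because it loops in Python it will be slower on large arrays.

```python
import numpy as np

def to_float_or_nan(values):
    """Hypothetical helper: parse each element with float(), using NaN on failure."""
    out = np.empty(len(values), dtype=float)
    for i, v in enumerate(values):
        try:
            out[i] = float(v)
        except (TypeError, ValueError):
            # Anything float() cannot parse (e.g. 'b') becomes NaN.
            out[i] = np.nan
    return out

x = np.array(['1', '2', '3.14', '1e-3', 'b', 'nan', 'inf', '-inf'])
result = to_float_or_nan(x)
```

Note that the literal string 'nan' and an unparseable string such as 'b' both end up as NaN, so np.isnan(result) alone cannot tell them apart.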

Here is a way to identify numeric strings:

 In [34]: x
 Out[34]: array(['1', '2', 'a'], dtype='|S1')
 In [35]: x.astype('unicode')
 Out[35]: array([u'1', u'2', u'a'], dtype='<U1')
 In [36]: np.char.isnumeric(x.astype('unicode'))
 Out[36]: array([ True, True, False], dtype=bool)

Note that "numeric" here means a Unicode string that contains only numeric characters, that is, characters that have the Unicode numeric property. The decimal point is not one of them, so u'1.3' is not considered "numeric".
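A small check makes this limitation concrete. The sample strings below are assumptions chosen for illustration; besides the decimal point, a leading minus sign is also not a numeric character, so negative numbers are rejected as well.

```python
import numpy as np

# Illustrative inputs: an integer string, a decimal, a negative number, a letter.
samples = np.array(['1', '1.3', '-1', 'a'])

# True only where every character in the string is a Unicode numeric character.
mask = np.char.isnumeric(samples)
print(mask)
```

Only '1' passes, so np.char.isnumeric is suitable for spotting strings of digits but not for deciding whether a string can be parsed as a float in general.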
