Forced conversion of non-numeric numpy arrays with NaN replacement

Consider an array

x = np.array(['1', '2', 'a'])

Trying to convert it to a floating-point array raises an exception:

    x.astype(np.float)
    ValueError: could not convert string to float: a

Does numpy have any efficient way to force this to a numeric array, replacing non-numeric values with something like NaN?

Alternatively, is there an efficient numpy function equivalent to np.isnan that also flags non-numeric elements such as letters?

2 answers

You can convert an array of strings to a float array (with NaNs) using np.genfromtxt:

    In [83]: np.set_printoptions(precision=3, suppress=True)

    In [84]: np.genfromtxt(np.array(['1','2','3.14','1e-3','b','nan','inf','-inf']))
    Out[84]: array([ 1. , 2. , 3.14 , 0.001, nan, nan, inf, -inf])

In Python 3, you may need to convert the array to bytes first (depending on the NumPy version), for example with the array's astype() method:

    In [18]: np.genfromtxt(np.array(['1','2','3.14','1e-3','b','nan','inf','-inf']).astype('bytes'))
    Out[18]: array([ 1. , 2. , 3.14 , 0.001, nan, nan, inf, -inf])
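
If the bytes/str behaviour of genfromtxt is a concern, the same coercion can be sketched in plain Python. This is only an illustration, not part of the original answer: the helper name to_float_or_nan is hypothetical, and because it loops at the Python level it will likely be slower than genfromtxt on large arrays.

```python
import numpy as np

def to_float_or_nan(values):
    """Convert an iterable of strings to a float array, using NaN on failure.

    Hypothetical helper for illustration; not a NumPy function.
    """
    def parse(item):
        try:
            return float(item)
        except (TypeError, ValueError):
            # Anything float() cannot parse becomes NaN.
            return np.nan

    return np.fromiter((parse(v) for v in values), dtype=float)

x = np.array(['1', '2', '3.14', '1e-3', 'b', 'nan', 'inf', '-inf'])
result = to_float_or_nan(x)

# np.isnan on the converted array flags unparseable entries.
# Note that it also flags the literal string 'nan'.
bad = np.isnan(result)
print(result)
print(bad)
```

Applying np.isnan to the converted array is one way to approach the second part of the question, with the caveat that a literal 'nan' string is indistinguishable from a failed conversion.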

Here is a way to identify numeric strings:

    In [34]: x
    Out[34]: array(['1', '2', 'a'], dtype='|S1')

    In [35]: x.astype('unicode')
    Out[35]: array([u'1', u'2', u'a'], dtype='<U1')

    In [36]: np.char.isnumeric(x.astype('unicode'))
    Out[36]: array([ True, True, False], dtype=bool)

Note that "numeric" here means a Unicode string that contains only numeric characters, that is, characters that have the Unicode numeric property. This does not include the decimal point. Therefore, u'1.3' is not considered "numeric".
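
To make that limitation concrete, here is a small sketch comparing np.char.isnumeric with a mask built from Python's float(). The helper is_float_string is a hypothetical name introduced for illustration, and the list comprehension is a Python-level loop rather than a vectorized NumPy operation.

```python
import numpy as np

x = np.array(['1', '2', 'a', '1.3', '-4', '1e-3'])

# True only for strings made up entirely of numeric characters, so the
# decimal point, the minus sign, and the exponent all yield False.
char_mask = np.char.isnumeric(x)

def is_float_string(item):
    """Hypothetical helper: True if float() can parse the string."""
    try:
        float(item)
        return True
    except ValueError:
        return False

# Accepts anything float() accepts, which also includes 'nan' and 'inf'.
float_mask = np.array([is_float_string(v) for v in x])

print(char_mask)
print(float_mask)
```

Which mask is appropriate depends on whether strings such as '1.3' or '-4' should count as numbers in your data.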


If you use pandas, you can use the pd.to_numeric() function:

    In [1]: import numpy as np

    In [2]: import pandas as pd

    In [3]: x = np.array(['1', '2', 'a'])

    In [4]: pd.to_numeric(x, errors='coerce')
    Out[4]: array([ 1., 2., nan])

Source: https://habr.com/ru/post/943654/

