numpy.polyfit does not handle NaN values

I have a problem with this Python code snippet:

    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, must be set before importing pylab
    import numpy as np
    import pylab as pl

    A1 = np.loadtxt('/tmp/A1.txt', delimiter=',')
    A1_extrema = [min(A1), max(A1)]
    A2 = np.loadtxt('/tmp/A2.txt', delimiter=',')

    pl.close()
    ab = np.polyfit(A1, A2, 1)  # linear fit: slope and intercept
    print(ab)
    fit = np.poly1d(ab)
    print(fit)
    r2 = np.corrcoef(A1, A2)[0, 1]
    print(r2)

    pl.plot(A1, A2, 'r.', label='TMP36 vs. DS18B20', alpha=0.7)
    pl.plot(A1_extrema, fit(A1_extrema), 'c-')
    pl.annotate('{0}'.format(r2), xy=(min(A1) + 0.5, fit(min(A1))), size=6, color='r')
    pl.title('Sensor correlations')
    pl.xlabel("T(x) [degC]")
    pl.ylabel("T(y) [degC]")
    pl.grid(True)
    pl.legend(loc='upper left', prop={'size': 8})
    pl.savefig('/tmp/C123.png')

A1 and A2 are arrays containing temperature readings from two different sensors. I want to find the correlation between them and show it graphically. However, sensor read errors occasionally occur, and in that case NaN is written to one of the files instead of the temperature value. np.polyfit then fails to fit the data and returns [nan, nan], after which everything else fails as well.

My question is: how can I make numpy.polyfit ignore NaN values? NB: The data sets are relatively small at the moment. I expect them to grow to somewhere between 200,000 and 600,000 elements after deployment.

1 answer

I know this is a little old, but if your arrays contain NaNs, you need to "clean" them first by keeping only the indices where both values are finite. One way to do it:

    idx = np.isfinite(x) & np.isfinite(y)
    ab = np.polyfit(x[idx], y[idx], 1)

This way you only pass the "good" points to polyfit.
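For the code in the question, the same mask probably needs to be applied to np.corrcoef as well, since it also propagates NaN, and the built-in min/max calls are unreliable when NaN is present. Here is a minimal sketch of how that might look. The helper name fit_ignoring_nan and the sample readings are made up for illustration and are not from the original post.

```python
import numpy as np

def fit_ignoring_nan(x, y, deg=1):
    """Hypothetical helper: fit using only pairs where both values are finite."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # True only where BOTH readings are usable (drops NaN and +/-inf)
    mask = np.isfinite(x) & np.isfinite(y)
    coeffs = np.polyfit(x[mask], y[mask], deg)
    # Correlation coefficient computed on the same cleaned pairs
    r = np.corrcoef(x[mask], y[mask])[0, 1]
    return coeffs, r, mask

# Made-up sample readings with one failed read in each series
a1 = np.array([20.0, 21.0, np.nan, 23.0, 24.0, 25.0])
a2 = np.array([20.5, 21.4, 22.6, np.nan, 24.5, 25.6])

coeffs, r, mask = fit_ignoring_nan(a1, a2)
fit = np.poly1d(coeffs)

# NaN-aware replacements for min(A1) / max(A1) when drawing the fit line
x_ends = [np.nanmin(a1), np.nanmax(a1)]
print(coeffs, r, x_ends)
```

Boolean masking is a vectorized operation, so it should remain cheap for arrays of a few hundred thousand elements.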


Source: https://habr.com/ru/post/982821/

