Python - Input contains NaN, infinity or a value too large for dtype('float64')

Question

I am new to Python. I am trying to use sklearn.cluster. Here is my code:

from sklearn.cluster import MiniBatchKMeans

kmeans=MiniBatchKMeans(n_clusters=2)
kmeans.fit(df)

But I get the following error:

     50             and not np.isfinite(X).all()):
     51         raise ValueError("Input contains NaN, infinity"
---> 52                          " or a value too large for %r." % X.dtype)

 ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

I checked that there are no NaN or infinite values, so only the third possibility remains: a value too large for float64. However, df.info() tells me that all variables are float64, so I don't understand where this problem comes from.

df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 362358 entries, 135 to 4747145
Data columns (total 8 columns):
User         362358 non-null float64
Hour         362352 non-null float64
Minute       362352 non-null float64
Day          362352 non-null float64
Month        362352 non-null float64
Year         362352 non-null float64
Latitude     362352 non-null float64
Longitude    362352 non-null float64
dtypes: float64(8)
memory usage: 24.9 MB

Many thanks,

python pandas scikit-learn machine-learning k-means

Mitch Dec 18 '15 at 15:08

2 answers

David Maust · Answer 1 · 2015-12-20T07:39:50+0000

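Looking at your df.info() output, the User column has 6 more non-null values than any other column. That means each of the other columns contains 6 null values, and those are the cause of the error.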

<class 'pandas.core.frame.DataFrame'>
Int64Index: 362358 entries, 135 to 4747145
Data columns (total 8 columns):
User         362358 non-null float64
Hour         362352 non-null float64
Minute       362352 non-null float64
Day          362352 non-null float64
Month        362352 non-null float64
Year         362352 non-null float64
Latitude     362352 non-null float64
Longitude    362352 non-null float64
dtypes: float64(8)
memory usage: 24.9 MB
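
To make the point above concrete, here is a minimal sketch of how the missing values could be counted and removed. The small frame is a made-up stand-in for the question's df (its values are assumptions, not the asker's data), and dropping rows is only one option; filling them in may suit some data better.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's df: one fully populated column
# and two columns that each have one missing entry.
df = pd.DataFrame({
    "User": [1.0, 2.0, 3.0, 4.0],
    "Hour": [10.0, np.nan, 12.0, 13.0],
    "Latitude": [48.85, 48.86, np.nan, 48.88],
})

# Count missing values per column; df.info() reports the complement of this.
null_counts = df.isnull().sum()
print(null_counts)

# Drop every row that has at least one missing value.
clean = df.dropna()
print(len(df), "->", len(clean))
```

The cleaned frame could then be passed to kmeans.fit(clean) in place of df.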

Fabio Lamanna · Answer 2 · 2015-12-18T15:32:50+0000

I think fit() expects an "array-like, shape = [n_samples, n_features]" rather than a pandas DataFrame. Try passing the values of the dataframe instead:

kmeans=MiniBatchKMeans(n_clusters=2)
kmeans.fit(df.values)
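
As a rough sketch, the array that fit() receives can be checked with the same np.isfinite(X).all() condition that appears in the traceback. The frame below is hypothetical; note that df.values keeps any missing entries as NaN, so the check is worth running on the array as well.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame with one missing entry.
df = pd.DataFrame({"Hour": [1.0, np.nan, 3.0], "Minute": [5.0, 6.0, 7.0]})

X = df.values  # plain NumPy array, shape (n_samples, n_features)

# Same condition as in the traceback: True only if no NaN or infinity is present.
all_finite = np.isfinite(X).all()
print(X.shape, all_finite)

# Keep only the rows in which every value is finite.
X_clean = X[np.isfinite(X).all(axis=1)]
print(X_clean.shape)
```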

