ValueError: Unknown label type: 'unknown'

I am trying to run the following code. By the way, I am new to both Python and sklearn.

import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression


# data import and preparation
trainData = pd.read_csv('train.csv')
train = trainData.values
testData = pd.read_csv('test.csv')
test = testData.values
X = np.c_[train[:, 0], train[:, 2], train[:, 6:7],  train[:, 9]]
X = np.nan_to_num(X)
y = train[:, 1]
Xtest = np.c_[test[:, 0:1], test[:, 5:6],  test[:, 8]]
Xtest = np.nan_to_num(Xtest)


# model
lr = LogisticRegression()
lr.fit(X, y)

where y is an np.ndarray of 0s and 1s.

I get the following:

File "C:\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py", line 1174, in fit
    check_classification_targets(y)

File "C:\Anaconda3\lib\site-packages\sklearn\utils\multiclass.py", line 172, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)

ValueError: Unknown label type: 'unknown'

From the sklearn documentation: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.fit

y: array-like, shape (n_samples,)

What am I doing wrong?

UPD:

y is array([0.0, 1.0, 1.0, ..., 0.0, 1.0, 0.0], dtype=object), and its shape is (891,)
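
The object dtype in this output is the detail that matters. Below is a minimal sketch of how to check what sklearn makes of such labels, assuming a small hand-made array in place of the real train.csv column (y_obj is a hypothetical name). type_of_target lives in the same sklearn/utils/multiclass.py module that appears in the traceback; that it is the exact helper behind the error is my assumption, not something the traceback states.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical stand-in for train[:, 1]: floats stored in an object array
y_obj = np.array([0.0, 1.0, 1.0, 0.0], dtype=object)

# Label type as sklearn sees it, before and after casting to a numeric dtype
kind_before = type_of_target(y_obj)
kind_after = type_of_target(y_obj.astype('int'))

print(kind_before, kind_after)
```

The object array is expected to be reported as 'unknown', which matches the error message, while the same labels cast to int are expected to be reported as 'binary'.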


Your y has dtype object, so sklearn cannot determine the label type. Add the line y=y.astype('int') right after the line y = train[:, 1].
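
A minimal sketch of the fix, assuming a tiny made-up array in place of the values read from train.csv (the toy data and the float cast of X are my assumptions, not part of the original code):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up stand-in for trainData.values: mixing numbers and strings
# in one array gives dtype=object, as with the real CSV
train = np.array([
    [1, 0.0, 3, 'a'],
    [2, 1.0, 1, 'b'],
    [3, 1.0, 2, 'c'],
    [4, 0.0, 3, 'd'],
], dtype=object)

# Numeric feature columns, cast explicitly to float (assumption for this sketch)
X = train[:, [0, 2]].astype(float)

y = train[:, 1]        # still dtype=object at this point
y = y.astype('int')    # the suggested fix: give the labels a numeric dtype

lr = LogisticRegression()
lr.fit(X, y)           # no "Unknown label type" error now
print(lr.classes_)
```

If the label column can contain missing values, the cast to int will fail, so those rows would have to be dropped or filled first.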


Source: https://habr.com/ru/post/1682389/
