Simple one-vector input arrays are considered incompatible by scikit

Question

I have two variables that come from the same pandas df. I extract one as TT and the other as t, and I use TT to predict t, which is binary. I can't figure out why scikit-learn treats the variables as having incompatible shapes. I tried modifying TT as a fix, but that didn't work.

>>> TT=adf.x1.values
>>> t=adf.y.values
>>> TT.shape
(2856L,)
>>> t.shape
(2856L,)
>>> TT
array([ 4.43081665,  5.99146461,  4.86753464, ...,  4.58496761,
        8.4553175 ,  7.37775898], dtype=float32)
>>> t
array([ 0.,  0.,  0., ...,  0.,  0.,  0.], dtype=float32)
>>> clf=LogisticRegression(C=1)   
>>> clf.fit(TT,t)
Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:...\sklearn\svm\base.py", line 686, in fit
        (X.shape[0], y.shape[0]))
ValueError: X and y have incompatible shapes.
X has 1 samples, but y has 2856.)
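A minimal NumPy sketch of why the error reports "X has 1 samples". It assumes, as the message suggests, that the scikit-learn version in use turned a 1D X into 2D by adding a leading axis, the way np.atleast_2d does. That conversion step is an assumption here, and TT is synthetic data standing in for adf.x1.values.

```python
import numpy as np

# Hypothetical stand-in for adf.x1.values: a 1D float32 array of 2856 values
TT = np.linspace(4.0, 8.5, 2856, dtype=np.float32)
print(TT.ndim, TT.shape)   # 1 (2856,)

# Assumed behaviour of the older version: promote 1D X by adding a leading axis.
# The 2856 values become ONE row, i.e. 1 sample with 2856 features.
as_row = np.atleast_2d(TT)
print(as_row.shape)        # (1, 2856)

# What fit() expects for a single feature: 2856 samples, 1 feature each.
as_column = TT.reshape(-1, 1)
print(as_column.shape)     # (2856, 1)
```

With X seen as one sample and y holding 2856 labels, the sample counts disagree, which is exactly what the ValueError says.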

python arrays numpy pandas scikit-learn

user3062149 Jan 24 '14 at 21:30

wflynny · Accepted Answer · 2014-01-24T21:51:44+0000

If you look at the documentation for sklearn.linear_model.LogisticRegression.fit, TT must have shape (n_samples, n_features) and t must have shape (n_samples,).

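So TT has to be a 2D array. As a 1D array it is treated as a single sample with 2856 features, which is why the error says X has 1 sample. Reshape it to (2856L, 1) with TT.reshape(-1, 1), and fit should then accept it.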
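A minimal, self-contained sketch of the fix, assuming a recent scikit-learn and pandas. The DataFrame adf and its values are synthetic stand-ins for the question's data, and the binary target is made up. The second option (selecting the column with a list, adf[["x1"]]) is an alternative that the answer itself does not mention.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for the question's DataFrame `adf`
rng = np.random.RandomState(0)
x1 = rng.uniform(4.0, 8.5, size=2856).astype(np.float32)
noise = rng.normal(0.0, 0.5, size=2856)
y = (x1 + noise > 6.5).astype(np.float32)   # made-up binary target (0.0 / 1.0)
adf = pd.DataFrame({"x1": x1, "y": y})

# Option 1: reshape the 1D column into an (n_samples, 1) matrix
TT = adf.x1.values.reshape(-1, 1)
t = adf.y.values
print(TT.shape, t.shape)   # (2856, 1) (2856,)

clf = LogisticRegression(C=1)
clf.fit(TT, t)

# Option 2: select with a list of column names; this yields a one-column
# DataFrame, so .values is already 2D
TT2 = adf[["x1"]].values
print(TT2.shape)           # (2856, 1)

# Predictions for the first five rows
preds = clf.predict(TT[:5])
print(preds)
```

Either way, X has one row per sample and one column per feature, while t stays 1D.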
