How to resolve "IndexError: too many indices for array"

My code below gives me the following error: IndexError: too many indices for array. I am completely new to machine learning, so I don't know how to solve this. Any help would be appreciated.

    import pandas
    from sklearn.cross_validation import cross_val_score
    from sklearn.linear_model import LogisticRegression

    train = pandas.read_csv("D:/...input/train.csv")

    xTrain = train.iloc[:,0:54]
    yTrain = train.iloc[:,54:]

    clf = LogisticRegression(multi_class='multinomial')
    scores = cross_val_score(clf, xTrain, yTrain, cv=10, scoring='accuracy')
    print('****Results****')
    print(scores.mean())
4 answers

The error you are getting basically says that you are indexing your array with more indices than it has dimensions. I can't see where your array is declared, but I assume it is one-dimensional and your program is treating it as two-dimensional.

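Check that your declarations are correct, and test the code by printing the values after you set them, to confirm they are what you intend.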

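There are already several questions on this topic, so I'll just link one that may help: "IndexError: too many indices. Numpy Array with 1 row and 2 columns".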
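A minimal sketch of what the error means, assuming NumPy (the array values are made up for illustration): a 1-D array has a single axis, so indexing it as `a[row, col]` raises the error. Printing `ndim` and `shape` before indexing, in the spirit of checking your declarations, shows the mismatch right away.

```python
import numpy as np

# A one-dimensional array: a single axis, so it accepts one index.
a = np.array([10, 20, 30])
print(a.ndim, a.shape)  # 1 (3,)

# Indexing it as if it were 2-D (row, column) raises the IndexError.
try:
    a[0, 1]
except IndexError as exc:
    print("IndexError:", exc)

# One index per dimension works.
print(a[1])  # 20

# If a 2-D layout is really what you want, reshape explicitly first.
b = a.reshape(1, -1)  # one row, three columns
print(b.ndim, b.shape)  # 2 (1, 3)
print(b[0, 1])  # 20
```

The exact wording of the message differs between NumPy versions, but it always starts with "too many indices for array".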


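I got this error while working with an ML model on a Pandas DataFrame. Here is what I did:

  1. Separated the predictor and target columns into X and y, respectively.

  2. Split the data into training (X_train, y_train) and test (X_test, y_test) sets.

  3. Computed the cross-validated AUC (area under the ROC curve). The "IndexError: too many indices" came from y_train: a 1-D array was expected, but y_train was 2-D. Replacing y_train with y_train['y'] fixed it.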


   # Importing Packages :

   import pandas as pd

   from sklearn.model_selection import cross_val_score

   from sklearn.model_selection import StratifiedShuffleSplit

   # Separating Predictor and Target Columns into X and y Respectively :
   # df       -> DataFrame read from the CSV file
   # classify -> the classifier being evaluated (its construction is not shown)

   data_X = df.drop(['y'], axis=1) 
   data_y = pd.DataFrame(df['y'])

   # Making a Stratified Shuffle Split of Train and Test Data (test_size=0.3 Denotes 30 % Test Data and Remaining 70% Train Data) :

   rs = StratifiedShuffleSplit(n_splits=2, test_size=0.3,random_state=2)       
   rs.get_n_splits(data_X,data_y)

   for train_index, test_index in rs.split(data_X,data_y):

       # Splitting Training and Testing Data based on Index Values :

       X_train,X_test = data_X.iloc[train_index], data_X.iloc[test_index]
       y_train,y_test = data_y.iloc[train_index], data_y.iloc[test_index]

       # Calculating 5-Fold Cross-Validated AUC (cv=5). The error occurs on this line because y_train is 2-D:

       classify_cross_val_score = cross_val_score(classify, X_train, y_train, cv=5, scoring='roc_auc').mean()

       print("Classify_Cross_Val_Score ",classify_cross_val_score) # Error at Previous Line.

       # It works after replacing y_train with y_train['y'] in the line above,
       # where 'y' is the ONLY column in the y_train DataFrame (i.e. the target
       # variable), so y_train['y'] is a 1-D Series:

       classify_cross_val_score = cross_val_score(classify, X_train, y_train['y'], cv=5, scoring='roc_auc').mean()

       print("Classify_Cross_Val_Score ",classify_cross_val_score)

       print(y_train.shape)

       print(y_train['y'].shape)

Output:

    Classify_Cross_Val_Score  0.7021433588790991
    (31647, 1) # 2-D
    (31647,)   # 1-D

Note: import cross_val_score from sklearn.model_selection, not from sklearn.cross_validation; the cross_validation module is deprecated (it was removed in scikit-learn 0.20).
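A sketch of the same 2-D vs 1-D difference using the question's slicing pattern, on a small synthetic DataFrame. The column names `f0`, `f1`, `y` and the data are hypothetical, and it assumes pandas, NumPy and a scikit-learn version that has `sklearn.model_selection`. A slice such as `iloc[:, 2:]` returns a DataFrame even when it selects only one column, while a single position `iloc[:, 2]` (or `df['y']`) returns a 1-D Series, which is the shape scikit-learn classifiers document for `y`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical data: two feature columns and one binary target column "y".
rng = np.random.RandomState(0)
df = pd.DataFrame({"f0": rng.normal(size=100), "f1": rng.normal(size=100)})
df["y"] = (df["f0"] + df["f1"] > 0).astype(int)

X = df.iloc[:, 0:2]  # features, like xTrain = train.iloc[:,0:54]

y_2d = df.iloc[:, 2:]  # slice "2:" keeps a DataFrame, shape (100, 1)
y_1d = df.iloc[:, 2]   # single position gives a Series, shape (100,)
print(y_2d.shape, y_1d.shape)  # (100, 1) (100,)

# df["y"] selects the same 1-D Series; an existing one-column DataFrame
# can also be flattened into a 1-D NumPy array.
y_flat = y_2d.to_numpy().ravel()
print(df["y"].shape, y_flat.shape)  # (100,) (100,)

# Pass the 1-D target to cross_val_score.
clf = LogisticRegression()
scores = cross_val_score(clf, X, y_1d, cv=5, scoring="accuracy")
print(scores.mean())
```

In the question's code, the analogous change would be `yTrain = train.iloc[:,54]`, assuming the target is the single column at position 54.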


You get this error because your "y" is 2-D when it should be 1-D.

For example, compare these two declarations:

1. y=numpy.zeros(shape=(len(list),1))
2. y=numpy.zeros(shape=(len(list))) 

If you declare y as in option 1, y is 2-D. You need a 1-D array, so declare it as in option 2.
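A small sketch of the two `numpy.zeros` declarations, assuming NumPy; the name `labels` is a hypothetical stand-in for the answer's `list` (which would shadow Python's built-in `list`). Note that `(len(list))` in option 2 is just an integer, not a tuple, and `numpy.zeros` accepts an integer as the shape of a 1-D array.

```python
import numpy as np

labels = ["a", "b", "c", "d"]  # hypothetical stand-in for the answer's `list`

y_col = np.zeros(shape=(len(labels), 1))  # option 1: 2-D column, shape (4, 1)
y_flat = np.zeros(shape=(len(labels),))   # option 2: 1-D, shape (4,)
print(y_col.shape, y_flat.shape)  # (4, 1) (4,)

# An array that is already (n, 1) can be flattened to 1-D;
# ravel() returns a view instead of a copy when it can.
y_fixed = y_col.ravel()
print(y_fixed.shape)  # (4,)
```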


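I hit this error with matplotlib when indexing my data as [5540,:], where 5540 is the length of the data. [5540,:] asks for two dimensions, but my array had shape [5540,], i.e. only one.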
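A sketch of the shape situation described above, assuming NumPy; the array contents are hypothetical, only the length 5540 comes from the answer, and plotting is left out. An array of shape (5540,) accepts only one index or slice, and if later code really needs a 2-D (5540, 1) array, the extra axis has to be added explicitly.

```python
import numpy as np

# Hypothetical data with the shape from the answer: 5540 values, one axis.
data = np.arange(5540, dtype=float)
print(data.shape)  # (5540,)

# Two indices on a 1-D array raise the IndexError.
try:
    first_row = data[0, :]
except IndexError as exc:
    print("IndexError:", exc)

# With one axis, use a single index or slice.
subset = data[:100]
print(subset.shape)  # (100,)

# If downstream code needs a 2-D (5540, 1) array, add the axis explicitly.
data_2d = data[:, np.newaxis]
print(data_2d.shape)  # (5540, 1)
print(data_2d[:, 0].shape)  # (5540,)
```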


Source: https://habr.com/ru/post/1659399/

