ValueError: Unknown label type: when implementing MLPClassifier

I have a dataframe with the columns year, month, day, hour, minute, second and Daily_KWH_System. I need to predict Daily_KWH_System using a neural network. Please let me know how to do this.

    Daily_KWH_System  year  month  day  hour  minute  second
 0       4136.900384  2016      9    7     0       0       0
 1       3061.657187  2016      9    8     0       0       0
 2       4099.614033  2016      9    9     0       0       0
 3       3922.490275  2016      9   10     0       0       0
 4       3957.128982  2016      9   11     0       0       0

I get a Value error when I fit the model.

Code:

 from sklearn.model_selection import train_test_split  # sklearn.cross_validation in old releases
 from sklearn.neural_network import MLPClassifier
 from sklearn.preprocessing import StandardScaler

 X = df[['year', 'month', 'day', 'hour', 'minute', 'second']]
 y = df['Daily_KWH_System']
 X_train, X_test, y_train, y_test = train_test_split(X, y)

 scaler = StandardScaler()
 # Fit only to the training data
 scaler.fit(X_train)
 X_train = scaler.transform(X_train)
 X_test = scaler.transform(X_test)

 mlp = MLPClassifier(hidden_layer_sizes=(30, 30, 30))
 # y_train = np.asarray(df['Daily_KWH_System'], dtype="|S6")
 mlp.fit(X_train, y_train)

Error:

 ValueError: Unknown label type: (array([ 2.27016856e+02, 3.02173014e+03, 4.29404190e+03, 2.41273427e+02, 1.76714247e+02, 4.23374425e+03, 
3 answers

First of all, this is a regression problem, not a classification problem, because the values in the Daily_KWH_System column do not form a set of labels. Instead, they appear to be real numbers (at least judging by the example above).
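
A quick way to see how scikit-learn interprets a target is the helper sklearn.utils.multiclass.type_of_target. The sketch below uses the five sample values from the question and assumes a recent scikit-learn; the single feature column (row index) is an assumption made only so that fit() can be called. It shows that the target is classified as continuous, which is exactly what MLPClassifier rejects:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.utils.multiclass import type_of_target

# The five Daily_KWH_System values shown in the question
y = np.array([4136.900384, 3061.657187, 4099.614033, 3922.490275, 3957.128982])
# Illustrative single feature (row index), assumed for this sketch only
X = np.arange(len(y)).reshape(-1, 1)

target_kind = type_of_target(y)
print(target_kind)  # continuous

# A classifier refuses a continuous target with "Unknown label type"
try:
    MLPClassifier(hidden_layer_sizes=(30, 30, 30)).fit(X, y)
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```
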

If you still want to treat it as a classification problem, then according to the sklearn documentation:

When doing classification in scikit-learn, y is a vector of integers or strings.

In your case, y is a vector of floating-point numbers, which is why you get the error. So instead of the line

 y = df['Daily_KWH_System'] 

write

 y = np.asarray(df['Daily_KWH_System'], dtype="|S6") 

and the error goes away. (You can read more about this approach here: Python RandomForest - Unknown label Error.)
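
It is worth seeing what dtype="|S6" actually produces: NumPy converts each float to a byte string of at most six characters, so the readings are truncated and every distinct value becomes its own class label. A small check with the question's sample values (a sketch, assuming default NumPy casting behaviour):

```python
import numpy as np

values = [4136.900384, 3061.657187, 4099.614033, 3922.490275, 3957.128982]

# Cast to fixed-width byte strings of length 6, as in the line suggested above
labels = np.asarray(values, dtype="|S6")
print(labels)

# A classifier would treat each distinct truncated string as a separate class
n_classes = len(np.unique(labels))
print(n_classes)
```

So the classifier fits, but it is predicting string labels rather than a quantity.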

However, since regression is more appropriate here, replace the lines

 from sklearn.neural_network import MLPClassifier
 mlp = MLPClassifier(hidden_layer_sizes=(30, 30, 30))

with

 from sklearn.neural_network import MLPRegressor
 mlp = MLPRegressor(hidden_layer_sizes=(30, 30, 30))

The code will then run without raising an error (although, of course, there is not enough data here to check whether the model works well).
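
Putting that change into the question's code, a minimal runnable version might look like the sketch below. It assumes a current scikit-learn, where train_test_split lives in sklearn.model_selection (the old sklearn.cross_validation module has been removed); max_iter and random_state are illustrative choices, not values from the original.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Sample rows from the question
df = pd.DataFrame({
    "Daily_KWH_System": [4136.900384, 3061.657187, 4099.614033, 3922.490275, 3957.128982],
    "year": [2016] * 5,
    "month": [9] * 5,
    "day": [7, 8, 9, 10, 11],
    "hour": [0] * 5,
    "minute": [0] * 5,
    "second": [0] * 5,
})

X = df[["year", "month", "day", "hour", "minute", "second"]]
y = df["Daily_KWH_System"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# A regressor accepts a continuous float target, so fit() does not raise
mlp = MLPRegressor(hidden_layer_sizes=(30, 30, 30), max_iter=500, random_state=0)
mlp.fit(X_train, y_train)
predictions = mlp.predict(X_test)
print(predictions)
```

With only five rows the predictions are meaningless and scikit-learn may emit a ConvergenceWarning; the point is only that fit() runs.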

That said, I don't think this is the right way to choose features for this problem.

In this problem we are dealing with a sequence of real numbers that forms a time series. One reasonable feature to choose is the number of seconds (or minutes/hours/days, etc.) that have passed since the start. Since this particular data contains only days, months, and years (the other values are always 0), we could use the number of days that have passed since the very beginning as the feature. Your data frame would then look like this:

    Daily_KWH_System  days_passed
 0       4136.900384            0
 1       3061.657187            1
 2       4099.614033            2
 3       3922.490275            3
 4       3957.128982            4
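
One way to build the days_passed column is to let pandas assemble timestamps from the existing date columns and subtract the first one. This is a sketch using the question's sample rows; for sub-daily data, .dt.total_seconds() on the same difference would give elapsed seconds instead of whole days.

```python
import pandas as pd

# Sample rows from the question
df = pd.DataFrame({
    "Daily_KWH_System": [4136.900384, 3061.657187, 4099.614033, 3922.490275, 3957.128982],
    "year": [2016] * 5,
    "month": [9] * 5,
    "day": [7, 8, 9, 10, 11],
    "hour": [0] * 5,
    "minute": [0] * 5,
    "second": [0] * 5,
})

# pandas can assemble timestamps from columns named year/month/day/hour/minute/second
timestamps = pd.to_datetime(df[["year", "month", "day", "hour", "minute", "second"]])

# Whole days elapsed since the first observation
df["days_passed"] = (timestamps - timestamps.min()).dt.days

print(df[["Daily_KWH_System", "days_passed"]])
```
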

You can use the values in the days_passed column as features and the values in Daily_KWH_System as targets. You can also add some indicator features. For example, if you think that the end of the year can affect the target, you can add an indicator feature that shows whether the month is December.
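
A December indicator is a single boolean comparison cast to 0/1. In this sketch the column name is_december is hypothetical, and since all the sample rows are from September the indicator is 0 everywhere; it only becomes informative on a longer history.

```python
import pandas as pd

# Sample rows from the question (time-of-day columns omitted, they are all 0)
df = pd.DataFrame({
    "Daily_KWH_System": [4136.900384, 3061.657187, 4099.614033, 3922.490275, 3957.128982],
    "year": [2016] * 5,
    "month": [9] * 5,
    "day": [7, 8, 9, 10, 11],
})

timestamps = pd.to_datetime(df[["year", "month", "day"]])
df["days_passed"] = (timestamps - timestamps.min()).dt.days

# Hypothetical indicator feature: 1 if the row falls in December, else 0
df["is_december"] = (df["month"] == 12).astype(int)

X = df[["days_passed", "is_december"]]  # features
y = df["Daily_KWH_System"]              # target
print(X)
```
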

If the data really is daily (at least in this example there is one data point per day) and you want to solve this problem with neural networks, another sensible approach is to treat it as a time series and try a suitable recurrent neural network. Here are some great blog posts that describe this approach:

http://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/

http://machinelearningmastery.com/time-series-forecasting-long-short-term-memory-network-python/
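
A common first step in that kind of approach is to reframe the series as a supervised problem: each sample is a window of previous values and the target is the next value. The sketch below shows only that framing step with a hypothetical helper make_windows, and then feeds the windows to MLPRegressor rather than to an LSTM, so it is a feed-forward model on lag features, a simpler stand-in for the recurrent network the posts build.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor


def make_windows(series, n_lags):
    """Hypothetical helper: turn a 1-D series into (lagged inputs, next value) pairs."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y


kwh = [4136.900384, 3061.657187, 4099.614033, 3922.490275, 3957.128982]
X, y = make_windows(kwh, n_lags=2)
print(X.shape, y.shape)  # (3, 2) (3,)

# Feed-forward regressor on lag features (not an LSTM)
model = MLPRegressor(hidden_layer_sizes=(30,), max_iter=500, random_state=0).fit(X, y)

# Predict the value after the last observed day from the two most recent readings
next_value = model.predict(np.asarray(kwh[-2:]).reshape(1, -1))
```
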


The fit() function expects y to be a 1D list. By slicing a pandas DataFrame, you always get a 2D object. This means that in your case you need to convert the 2D object you got from slicing the DataFrame into an actual 1D list, as fit() expects, using the appropriate function:

 y = list(df['Daily_KWH_System']) 
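
Whether a pandas selection is 1D or 2D depends on how it is indexed: a single column label returns a Series, while a list of labels returns a DataFrame. A quick way to check what you are passing as y (the small frame here is assumed for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Daily_KWH_System": [4136.900384, 3061.657187, 4099.614033]})

as_series = df["Daily_KWH_System"]       # single brackets -> 1-D Series
as_frame = df[["Daily_KWH_System"]]      # double brackets -> 2-D DataFrame
as_list = list(df["Daily_KWH_System"])   # plain Python list, as suggested above

print(as_series.ndim, as_frame.ndim)  # 1 2
```

Note that the conversion changes only the container; the values themselves are still floats.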

Instead of mlp.fit(X_train, y_train), use mlp.fit(X_train, y_train.values).
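
For reference, .values on a numeric Series returns a plain NumPy array; current pandas documentation recommends .to_numpy() for the same purpose. A small sketch (the Series is assumed for illustration) showing both and the resulting dtype:

```python
import numpy as np
import pandas as pd

y_train = pd.Series([4136.900384, 3061.657187, 4099.614033], name="Daily_KWH_System")

y_values = y_train.values      # as suggested above
y_numpy = y_train.to_numpy()   # equivalent, preferred in current pandas

print(type(y_values).__name__, y_values.dtype)  # ndarray float64
```
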


Source: https://habr.com/ru/post/1015644/

