Same code, wildly different accuracy on Windows / Ubuntu (Keras / TensorFlow)

import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.optimizers import Adam
from sklearn.preprocessing import MinMaxScaler

def create_dataset(dataset, datasetClass, look_back):
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        a = dataset[i:(i + look_back), 0]
        dataX.append(a)
        dataY.append(datasetClass[:, (i + look_back):(i + look_back + 1)])
    return np.array(dataX), np.array(dataY)

def one_hot_encode(dataset):
    data = np.zeros((11, len(dataset)), dtype='int')
    for i in range(len(dataset)):
        data[dataset[i] - 1, i] = 1
    return data

# Set a seed for repeatable results
np.random.seed(12)

dataframe = pd.read_csv('time-series.csv', usecols=[1], engine='python')
dataset = dataframe.values
dataset = dataset.astype('float32')

dataframeClass = pd.read_csv('time-series-as-class.csv', usecols=[1], engine='python')
datasetClass = dataframeClass.values
datasetClass = datasetClass.astype('int')
datasetClass = one_hot_encode(datasetClass)

# Normalize input values
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)

# Split into train/test
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size, :], dataset[train_size:len(dataset), :]
trainClass, testClass = datasetClass[:, 0:train_size], datasetClass[:, train_size:len(dataset)]

# Set up sliding windows
look_back = 150
trainX, trainY = create_dataset(train, trainClass, look_back)
testX, testY = create_dataset(test, testClass, look_back)

# Reshape for passing to the network
trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
trainY = np.squeeze(trainY, 2)
testY = np.squeeze(testY, 2)

# Create and fit the LSTM network
model = Sequential()
model.add(LSTM(15, input_shape=(1, look_back)))
model.add(Dense(22, activation='tanh'))
model.add(Dropout(0.2))
model.add(Dense(11, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=['categorical_accuracy'])
print(model.summary())
model.fit(trainX, trainY, epochs=90, batch_size=1, verbose=2)

# Make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
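The sliding-window helper can be demonstrated on toy data to see the shapes it produces; the 3-class toy series below is made up for illustration (the question's real data uses 11 classes and look_back=150):

```python
import numpy as np

def create_dataset(dataset, datasetClass, look_back):
    # Slide a window of `look_back` values over the series; the label is the
    # class one step past the end of each window.
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        dataX.append(dataset[i:(i + look_back), 0])
        dataY.append(datasetClass[:, (i + look_back):(i + look_back + 1)])
    return np.array(dataX), np.array(dataY)

# Toy series: 10 values, 3 classes one-hot encoded as a (3, 10) array
series = np.arange(10, dtype='float32').reshape(-1, 1)
classes = np.eye(3, dtype='int')[:, np.tile(np.arange(3), 4)[:10]]

X, y = create_dataset(series, classes, look_back=4)
print(X.shape)  # (5, 4)    -> 10 - 4 - 1 = 5 windows of length 4
print(y.shape)  # (5, 3, 1) -> squeezed to (5, 3) before training
```

This is why the training script squeezes axis 2 of trainY/testY and reshapes trainX/testX to (samples, 1, look_back) before feeding the LSTM.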

I ran this on Ubuntu and on Windows. Tested on Windows with Keras 2.0.4 and 2.0.8, and on Ubuntu with 2.0.5 (the latest version available through conda).

On Windows the accuracy is 17% and the categorical cross-entropy is ~2. It converges slowly, but it consistently starts there.

On Ubuntu the accuracy is 98% and the categorical cross-entropy is 0, and it never actually changes.

The only difference in the code is the path to the CSV files; the CSV files themselves are exactly the same. What could lead to such a radical difference?

If the difference were a percentage point or two, I could write it off as random initialization in Keras/TF, but a gap this large can hardly be pure coincidence.

Edit: the solution turned out to be fixing the class CSV files. Although they were UTF-8, apparently something else was needed to make files created on Windows play nicely with Linux. I'm not sure whether I'm allowed to mark my own answer as "accepted".
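One way to do that normalization without a spreadsheet program is a small script that rewrites the file with Unix line endings and no byte-order mark; this is a sketch, and the demo file contents are made up:

```python
import tempfile, os

def normalize_csv(path):
    """Convert CRLF/CR line endings to LF and strip a UTF-8 BOM, in place."""
    with open(path, 'rb') as f:
        raw = f.read()
    raw = raw.replace(b'\r\n', b'\n').replace(b'\r', b'\n')
    if raw.startswith(b'\xef\xbb\xbf'):  # UTF-8 BOM written by some Windows tools
        raw = raw[3:]
    with open(path, 'wb') as f:
        f.write(raw)

# Demo on a throwaway file with Windows-style endings and a BOM
path = os.path.join(tempfile.mkdtemp(), 'time-series-as-class.csv')
with open(path, 'wb') as f:
    f.write(b'\xef\xbb\xbf0,1\r\n1,2\r\n')
normalize_csv(path)
with open(path, 'rb') as f:
    cleaned = f.read()
print(cleaned)  # b'0,1\n1,2\n'
```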

+5
2 answers

The problem was in the CSV files, which were originally created on Windows. Although they were saved as UTF-8, I still had to open them in LibreOffice and re-save them as Linux-style CSV files.

In their original state they did not fail to load, but the encoding was not parsed correctly, which resulted in all of the one-hot encodings being 0. With all-zero target vectors the categorical cross-entropy is identically 0, which apparently shows up as very high "accuracy".
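A cheap sanity check would have caught this before training: every column of the one-hot label matrix should sum to exactly 1. This sketch assumes the (11, N) layout produced by the question's one_hot_encode; the arrays are made up:

```python
import numpy as np

def check_one_hot(datasetClass):
    # Each column is one sample's label; a valid one-hot column sums to 1.
    col_sums = datasetClass.sum(axis=0)
    assert np.all(col_sums == 1), (
        "found %d samples with no (or multiple) active class"
        % int(np.sum(col_sums != 1)))

good = np.zeros((11, 4), dtype='int')
good[[0, 3, 10, 5], [0, 1, 2, 3]] = 1
check_one_hot(good)          # passes silently

broken = np.zeros((11, 4), dtype='int')  # what the mis-read CSV produced
try:
    check_one_hot(broken)
except AssertionError as e:
    print(e)  # found 4 samples with no (or multiple) active class
```

Calling check_one_hot(datasetClass) right after one_hot_encode would have failed immediately on the mis-read files.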

+3

np.random.seed(12) must be set before importing Keras.

+1

Source: https://habr.com/ru/post/1272512/
