Keras (TensorFlow backend) is slower on the GPU than on the CPU when training certain networks

I'm having difficulty understanding why GPU and CPU speeds are similar for small networks (sometimes the CPU is even faster), while the GPU is faster on larger networks. The code at the bottom of the question runs in 103.7 s on an i7-6700K CPU, but in 29.5 s when using tensorflow-gpu.

However, when I train a network with 100 hidden neurons instead of 1000 as in the example below, I get ~20 s with the GPU and ~15 s with the CPU.

I read in another Stack Overflow answer that CPU→GPU transfers take a long time; I assume this refers to loading the sample data onto the GPU.

Can someone explain why this happens, and perhaps suggest some changes to the code I could make to maximize speed?

import numpy as np
import tensorflow as tf
import keras
from keras.models import Sequential
from keras.utils import np_utils
from keras.layers.core import Dense, Activation, Flatten, Dropout
from sklearn.preprocessing import normalize

## Importing the MNIST dataset using Keras
from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# reshape for vector input
N, x, y = X_train.shape
X_train = normalize(np.reshape(X_train, (N, x * y)))

N, x, y = X_test.shape
X_test = normalize(np.reshape(X_test, (N, x * y)))

# one-hot encoding
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)

model = Sequential()
model.add(Dense(output_dim=750, input_dim=784))
model.add(Activation('relu'))
model.add(Dropout(0.2))

model.add(Dense(150))
model.add(Activation('relu'))
model.add(Dropout(0.2))

model.add(Dense(50))
model.add(Activation('relu'))
model.add(Dropout(0.2))

model.add(Dense(50))
model.add(Activation('relu'))
model.add(Dropout(0.2))

model.add(Dense(10))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='Nadam', metrics=['accuracy'])

fit = model.fit(X_train, y_train, batch_size=128, nb_epoch=10, verbose=0)

## Printing the accuracy of our model, according to the loss function specified in model.compile above
score = model.evaluate(X_test, y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])
1 answer

In the case of tiny networks, minibatch loading may be the culprit here.

Keras loads each minibatch from RAM to the GPU at the start of each iteration, which creates a bottleneck in tiny networks (where the forward/backward computation is very fast). You can try using model.fit_generator instead of plain fit, so that the CPU thread that prepares the minibatches works in parallel, as in the sketch below.
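A minimal sketch of that fit_generator approach, assuming the X_train / y_train arrays and the model from the question (note that the argument names are version-dependent: Keras 1 uses samples_per_epoch / nb_epoch, matching the question's code, while Keras 2 uses steps_per_epoch / epochs):

import numpy as np

def batch_generator(X, y, batch_size=128):
    # Loop forever, yielding shuffled minibatches; Keras consumes the
    # generator on a background thread, so batch preparation and the
    # host->GPU copy can overlap with the GPU compute of the previous step.
    n = X.shape[0]
    while True:
        idx = np.random.permutation(n)
        for start in range(0, n - batch_size + 1, batch_size):
            sel = idx[start:start + batch_size]
            yield X[sel], y[sel]

fit = model.fit_generator(batch_generator(X_train, y_train, batch_size=128),
                          samples_per_epoch=X_train.shape[0],  # steps_per_epoch in Keras 2
                          nb_epoch=10,                         # epochs in Keras 2
                          verbose=0)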

Unfortunately, I don't know of a way to preload the whole dataset onto the GPU for Keras (see my issue).
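With the raw TensorFlow 1.x graph API (bypassing Keras entirely) it can be done in principle; a hypothetical sketch, where every name below is illustrative rather than part of any Keras API:

import tensorflow as tf

# Hypothetical TF 1.x sketch: pin the whole training set on the GPU once,
# then slice minibatches out of it on-device, so there is no per-step
# host->device copy -- only the integer step counter is fed each iteration.
batch_size = 128
step = tf.placeholder(tf.int32, shape=[])
with tf.device('/gpu:0'):
    X_all = tf.constant(X_train, dtype=tf.float32)
    y_all = tf.constant(y_train, dtype=tf.float32)
    offset = (step * batch_size) % (X_train.shape[0] - batch_size)
    x_batch = X_all[offset:offset + batch_size]
    y_batch = y_all[offset:offset + batch_size]
    # ... build the model and loss on x_batch / y_batch, then run the
    #     train op in a session, feeding only `step` ...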

If you use the TensorFlow backend, you can use the Google Timeline profiling tool to find out what causes the slowdown. See this issue for reference.
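A rough sketch of that Timeline recipe with the plain TF 1.x session API (with Keras you would additionally need the backend session, e.g. keras.backend.get_session(); the train_op / feeds names here are placeholders):

import tensorflow as tf
from tensorflow.python.client import timeline

run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

# Run one training step with tracing enabled (train_op / feeds stand in
# for whatever your graph actually uses):
# sess.run(train_op, feed_dict=feeds,
#          options=run_options, run_metadata=run_metadata)

# Dump a Chrome trace: open chrome://tracing and load timeline.json to
# see per-op timings on CPU and GPU, including MemcpyHtoD transfer ops.
trace = timeline.Timeline(step_stats=run_metadata.step_stats)
with open('timeline.json', 'w') as f:
    f.write(trace.generate_chrome_trace_format())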


Source: https://habr.com/ru/post/1014773/

