I’m having difficulty understanding why GPU and CPU training speeds are similar for small networks (sometimes the CPU is even faster), while the GPU is faster for larger networks. The code at the bottom of the question runs in 103.7 s on an i7-6700K, but takes only 29.5 s when using tensorflow-gpu.
However, when I train a network with 100 hidden neurons instead of 1000, as in the example below, I get ~20 seconds with the GPU and ~15 seconds with the CPU.
I read in another Stack Overflow answer that CPU→GPU transfers take a long time; I assume this refers to loading the data batches onto the GPU.
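To sanity-check that explanation for myself, I sketched a toy cost model: if every batch pays a fixed host-to-device transfer overhead, the GPU only wins once its per-unit compute savings outweigh that overhead. All the constants below are made-up illustrative numbers, not measurements from my machine:

```python
# Toy cost model: per-batch time = fixed transfer overhead + compute time.
# All constants are illustrative assumptions, not measured values.

def epoch_time(batches, transfer_s, compute_per_unit_s, hidden_units):
    """Total epoch time under a fixed-overhead-per-batch model."""
    return batches * (transfer_s + compute_per_unit_s * hidden_units)

batches = 600  # e.g. 60k samples at batch size 100

# Hypothetical costs: the GPU pays a transfer overhead per batch but
# processes each hidden unit much faster than the CPU does.
gpu_small = epoch_time(batches, transfer_s=3e-4, compute_per_unit_s=1e-7, hidden_units=100)
cpu_small = epoch_time(batches, transfer_s=0.0,  compute_per_unit_s=2e-6, hidden_units=100)

gpu_large = epoch_time(batches, transfer_s=3e-4, compute_per_unit_s=1e-7, hidden_units=1000)
cpu_large = epoch_time(batches, transfer_s=0.0,  compute_per_unit_s=2e-6, hidden_units=1000)

print(gpu_small > cpu_small)  # small net: transfer overhead dominates, CPU wins
print(gpu_large < cpu_large)  # large net: compute dominates, GPU wins
```

Under these assumed numbers the crossover matches what I observe (CPU faster at 100 hidden units, GPU faster at 1000), which is why the transfer-overhead explanation seems plausible to me.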
Can someone explain why this happens, and possibly point out changes I could make to the code to maximize speed?
    import numpy as np
    import tensorflow as tf
    import keras
    from keras.models import Sequential
    from keras.utils import np_utils
    from keras.layers.core import Dense, Activation, Flatten, Dropout
    from sklearn.preprocessing import normalize