I built the same network with Keras and with plain TensorFlow, but after many hours of testing with many different parameters, I still cannot understand why Keras outperforms my native TensorFlow implementation and gives better (slightly, but consistently better) results.
Does Keras use a different weight initialization method, or does it take a different weight decay approach than tf.train.inverse_time_decay?
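To check the first hypothesis myself, I reproduced in NumPy what I believe the two defaults are: Keras 1.x initializes Dense/Convolution2D weights with "glorot_uniform" (limit sqrt(6 / (fan_in + fan_out))), while tf.train.inverse_time_decay divides the base rate by (1 + decay_rate * step / decay_steps). This is just a sketch of my understanding of those formulas, not library code:

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=np.random):
    # Keras 1.x default weight init ("glorot_uniform"):
    # samples from U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def inverse_time_decay(lr0, global_step, decay_steps, decay_rate):
    # Mirrors the non-staircase formula of tf.train.inverse_time_decay:
    # lr = lr0 / (1 + decay_rate * global_step / decay_steps)
    return lr0 / (1.0 + decay_rate * float(global_step) / decay_steps)

# Example: for a 784 -> 128 layer the init range is about +/- 0.0811,
# and the learning rate halves once decay_rate * step / decay_steps == 1.
w = glorot_uniform(784, 128)
lr = inverse_time_decay(0.1, global_step=1000, decay_steps=1000, decay_rate=1.0)
```

If the native TF variables are created with a different initializer (e.g. truncated normal with a fixed stddev), that alone could plausibly explain a small accuracy gap.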
P.S. The difference in accuracy is always roughly:
Keras with TensorFlow backend: ~0.9850 - 0.9885, ~45 sec. avg. training time per epoch
Native TensorFlow: ~0.9780 - 0.9830, ~23 sec.
My environment:
Python 3.5.2 -Anaconda / Windows 10
CUDA: 8.0 with cuDNN 5.1
Keras 1.2.1
TensorFlow 0.12.1
Nvidia Geforce GTX 860M
and keras.json :
{
"image_dim_ordering": "tf",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "tensorflow"
}
You can also copy and run the following files to reproduce the issue:
https://github.com/emrahyigit/deep/blob/master/keras_cnn_mnist.py
https://github.com/emrahyigit/deep/blob/master/tf_cnn_mnist.py
https://github.com/emrahyigit/deep/blob/master/mnist.py