Why are my CNN results not reproducible?

I want to reproduce the results of a CNN that I train, so I set the seeds in my script:

    import tensorflow as tf
    tf.set_random_seed(0)  # make sure results are reproducible
    import numpy as np
    np.random.seed(0)  # make sure results are reproducible

The documentation for set_random_seed and np.random.seed does not mention any special behavior for seed 0.

When I run the same script twice on the same machine within a few minutes and without any updates in between, I expect to get the same results. However, I don't:

Run 1:

    0;0.001733;0.001313
    500;0.390164;0.388188

Run 2:

    0;0.006986;0.007000
    500;0.375288;0.374250

How can I get the network to reproduce the results?

System

    $ python -c "import tensorflow;print(tensorflow.__version__)"
    I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
    I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
    I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
    I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
    I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
    1.0.0
    $ python -c "import numpy;print(numpy.__version__)"
    1.12.0
+6
2 answers

While I haven't solved your exact problem, here are possible reasons why results are not always the same (roughly ordered from most likely / easiest to fix, to most unlikely / hardest to fix). After each problem I also try to give a solution.

  • Human error - you missed a digit / made a typo when you copied a result from the shell to paper. Solution: logging. Create a 2017-12-31-23-54-experiment-result.log file for each experiment you run. Not manually; the experiment should create it. Yes, a timestamp in the name makes it easier to find again. Everything that follows should be logged to this file for each individual experiment (a sketch of such a logger follows after this list).
  • Code changed: version control (e.g. git)
  • Configuration file changed: version control
  • Pseudo-random numbers changed: set the seed for random / tensorflow / numpy (yes, you may need to set multiple seeds; see the seeding sketch after this list)
  • Data loaded in a different way / in a different order: version control + seed (is the preprocessing really the same?)
  • Environment variables changed: Docker
  • Software (versions) changed: Docker
  • Driver (version) changed: logging
  • Hardware changed: logging
  • Hardware / software has inherent reproducibility issues. For example, floating-point multiplication is not associative, and different cores on a GPU can finish their computations at different times (I'm not sure about this one)
  • Hardware has bugs
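
A minimal sketch of the multi-seed point, assuming TensorFlow 1.x as in the question (the relevant APIs moved in later TF versions):

    import random

    import numpy as np
    import tensorflow as tf

    SEED = 0
    random.seed(SEED)         # Python's built-in PRNG (e.g. random.shuffle)
    np.random.seed(SEED)      # NumPy's PRNG (data shuffling, augmentation, ...)
    tf.set_random_seed(SEED)  # TensorFlow's graph-level seed (weight init, dropout)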

In any case, running the "same" thing several times can help you get a gut feeling for how much results vary.
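
For the logging point, a minimal sketch of a timestamped per-experiment log file, using only the Python standard library (the helper name start_experiment_log is made up for illustration):

    # core modules
    import datetime
    import logging


    def start_experiment_log(name="experiment-result"):
        """Create one timestamped log file per experiment run."""
        timestamp = datetime.datetime.now().strftime("%Y-%m-%d-%H-%M")
        filename = "{}-{}.log".format(timestamp, name)
        logging.basicConfig(filename=filename, level=logging.INFO)
        return filename


    logfile = start_experiment_log()
    logging.info("seed=0")  # log seeds, hyperparameters, results, ...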

Writing a paper

If you are writing a paper, I think the following are best practices for reproducibility:

  • Add a link to the repository (e.g. git) where all the code is
  • The code must be containerized (e.g. Docker)
  • If it is Python code with a requirements.txt, specify the exact software versions: not something like tensorflow>=1.0.0, but tensorflow==1.2.3 (see the example after this list)
  • Add the git hash of the version used for the experiments. This can be several different hashes if you changed something in between.
  • Always log driver information (such as the nVidia driver version) and hardware. Add this to the appendix of your paper. Then, in case of later questions, you can at least check whether a change happened that could explain the different numbers.
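
For the version pinning, a requirements.txt for the setup in the question could look like this (versions taken from the version output above; use tensorflow-gpu if that is the package you actually installed):

    numpy==1.12.0
    tensorflow==1.0.0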

To log these versions, you can use something like this:

    #!/usr/bin/env python

    # core modules
    import subprocess


    def get_logstring():
        """
        Get important environment information that might influence experiments.

        Returns
        -------
        logstring : str
        """
        logstring = []
        with open('/proc/cpuinfo') as f:
            cpuinfo = f.readlines()
        for line in cpuinfo:
            if "model name" in line:
                logstring.append("CPU: {}".format(line.strip()))
                break
        with open('/proc/driver/nvidia/version') as f:
            version = f.read().strip()
        logstring.append("GPU driver: {}".format(version))
        logstring.append("VGA: {}".format(find_vga()))
        return "\n".join(logstring)


    def find_vga():
        vga = subprocess.check_output(r"lspci | grep -i 'vga\|3d\|2d'",
                                      shell=True,
                                      executable='/bin/bash')
        return vga.decode('utf-8').strip()  # decode bytes for Python 3


    print(get_logstring())

which gives something like

    CPU: model name : Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
    GPU driver: NVRM version: NVIDIA UNIX x86_64 Kernel Module  384.90  Tue Sep 19 19:17:35 PDT 2017
    GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5)
    VGA: 00:02.0 VGA compatible controller: Intel Corporation Skylake Integrated Graphics (rev 06)
    02:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940MX] (rev a2)
+4

Perhaps this is a scoping problem. Remember to set the seed within the scope in which you use the graph, e.g.:

    with tf.Graph().as_default():
        tf.set_random_seed(0)

This also needs to be done after calling tf.reset_default_graph() .
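
A minimal sketch of that pattern, assuming TensorFlow 1.x (tf.random_normal is just a stand-in for your model):

    import tensorflow as tf

    for run in range(2):
        tf.reset_default_graph()          # discard any previous graph state
        with tf.Graph().as_default():
            tf.set_random_seed(0)         # graph-level seed, set inside the scope
            x = tf.random_normal([2, 2])  # stand-in op; build your model here
            with tf.Session() as sess:
                print(run, sess.run(x))   # should print identical values on both runs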

For a complete example, see How to get stable results with TensorFlow, setting random seed.

+1
