Non-deterministic gradient calculations

I noticed that my models come out different every time I train them, even though I keep the TensorFlow random seed fixed.
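
For context, the seed is fixed roughly like this (TF 1.x API; a sketch rather than my exact code):

    import numpy as np
    import tensorflow as tf

    np.random.seed(0)        # NumPy randomness (e.g. shuffling, batch generation)
    tf.set_random_seed(0)    # graph-level TensorFlow seed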

I checked that:

  • Initialization is deterministic; the weights are identical before the first update.
  • Inputs are deterministic. In fact, all forward computations, including the loss, are identical for the very first batch.
  • Gradients for the first batch differ. Specifically, I am comparing the outputs of tf.gradients(loss, train_variables). While loss and train_variables have the same values, the gradients sometimes differ for some of the variables. The differences are substantial (sometimes the sum of absolute differences over a single variable's gradient is greater than 1); see the sketch after this list.
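
The comparison itself looks roughly like this (a sketch, with a toy linear model standing in for my real network; TF 1.x):

    import numpy as np
    import tensorflow as tf

    def first_batch_grads(batch_x, batch_y):
        # Rebuild the graph from scratch so both runs start from identical state.
        tf.reset_default_graph()
        tf.set_random_seed(0)
        x = tf.placeholder(tf.float32, [None, 10])
        y = tf.placeholder(tf.float32, [None, 1])
        w = tf.get_variable("w", [10, 1])
        bias = tf.get_variable("bias", [1], initializer=tf.zeros_initializer())
        loss = tf.reduce_mean(tf.square(tf.matmul(x, w) + bias - y))
        train_variables = tf.trainable_variables()
        grads = tf.gradients(loss, train_variables)
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            return sess.run(grads, feed_dict={x: batch_x, y: batch_y})

    np.random.seed(0)
    batch_x = np.random.randn(32, 10).astype(np.float32)
    batch_y = np.random.randn(32, 1).astype(np.float32)

    g1 = first_batch_grads(batch_x, batch_y)
    g2 = first_batch_grads(batch_x, batch_y)
    for i, (a, b) in enumerate(zip(g1, g2)):
        print("variable %d: sum of abs differences = %f" % (i, np.abs(a - b).sum()))

(The toy model here is of course deterministic; the point is only the comparison procedure. In my real network the printed sums are sometimes greater than 1.)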

I conclude that it is the gradient computation that introduces the non-determinism. I looked into this, and the problem persists even when running on the CPU with intra_op_parallelism_threads=1 and inter_op_parallelism_threads=1.
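
For reference, the single-threaded CPU run is configured roughly like this (TF 1.x; sketch):

    import tensorflow as tf

    config = tf.ConfigProto(
        intra_op_parallelism_threads=1,
        inter_op_parallelism_threads=1,
        device_count={"GPU": 0},   # force CPU-only execution
    )
    sess = tf.Session(config=config)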

How can the backward pass be non-deterministic when the forward pass is not? How can I debug this further?

1 answer

This may seem a little obvious, but are you using any kind of non-deterministic regularization such as dropout? Since dropout randomly "drops" some connections during training, it can cause differences in the gradients.
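
If dropout is the culprit, one quick check (a sketch; keep_prob here is a hypothetical placeholder, adapt it to your own graph) is to feed keep_prob=1.0 so dropout becomes a no-op, or to pin the op-level seed so the mask is reproducible:

    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 128])
    keep_prob = tf.placeholder_with_default(1.0, shape=[])

    # With keep_prob = 1.0 dropout passes activations through unchanged; with
    # keep_prob < 1.0 the explicit op-level seed makes the mask reproducible.
    dropped = tf.nn.dropout(x, keep_prob=keep_prob, seed=42)

If the gradients become identical with dropout disabled, the randomness is coming from the dropout mask rather than from the gradient computation itself.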

Edit: Related questions:

Edit 2: This seems to be a problem with TensorFlow's implementation. See the following open issues on GitHub:


Source: https://habr.com/ru/post/1264615/

