Why does TensorFlow sometimes get slower and slower as training progresses?

I am training an RNN network. The first epoch took 7.5 hours, but as training goes on, TensorFlow gets slower and slower: the second epoch took 55 hours. I checked the code, and the APIs that become slower over time are mostly:

  • session.run([var1, var2, ...], feed_dict=feed),
  • tensor.eval(feed_dict=feed).

For example, a single line of code like session.run([var1, var2, ...], feed_dict=feed) takes about 0.1 seconds when the program starts, but as training proceeds, the time spent on this one line keeps growing; after 10 hours it takes up to 10 seconds.
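
A minimal diagnostic sketch (TF 1.x; fetches and feed are whatever the training loop already passes to session.run): time the call while counting the ops in the default graph. If the op count keeps climbing between steps, new nodes are being added to the graph inside the training loop, which would explain runs getting slower.

import time
import tensorflow as tf

def timed_run(session, fetches, feed):
    # Count ops before and after the call to detect graph growth per step.
    graph = tf.get_default_graph()
    ops_before = len(graph.get_operations())
    start = time.time()
    results = session.run(fetches, feed_dict=feed)
    elapsed = time.time() - start
    ops_after = len(graph.get_operations())
    print("session.run took %.3fs, graph ops: %d -> %d"
          % (elapsed, ops_before, ops_after))
    return results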

I have run into this several times. What causes it, and what can I do to avoid it?

Does this line of code add nodes to the TensorFlow graph:

self.shapes = [numpy.zeros(g[1].get_shape(), numpy.float32) for g in self.compute_gradients]

I suspect this may be the reason. This line is called many times, periodically, and self is not a tf.train.Optimizer object.
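
For reference, a hedged sketch of the suspected pattern (TF 1.x; the variable, loss and optimizer below are made-up stand-ins for the real model): numpy.zeros by itself does not touch the graph, but if self.compute_gradients re-runs optimizer.compute_gradients() (or tf.gradients()) on every access, each call appends a fresh set of gradient ops to the graph.

import numpy
import tensorflow as tf

x = tf.Variable(tf.ones([3, 3]))
loss = tf.reduce_sum(tf.square(x))
optimizer = tf.train.GradientDescentOptimizer(0.1)

graph = tf.get_default_graph()
for step in range(3):
    # Called inside the loop: new gradient ops are appended every iteration.
    grads_and_vars = optimizer.compute_gradients(loss)
    shapes = [numpy.zeros(g[1].get_shape().as_list(), numpy.float32)
              for g in grads_and_vars]
    print("step %d, graph ops: %d" % (step, len(graph.get_operations())))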

2 answers

Try finalizing your graph after creating it (graph.finalize()). This will prevent any operations from being added to the graph. I also believe that self.compute_gradients adds operations to the graph. Try defining the operation outside the loop and only running it inside the loop.
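
A minimal sketch of both suggestions (TF 1.x; the tiny linear model here is only a placeholder for the real RNN): build every op once, finalize the graph, and keep nothing but session.run calls inside the training loop.

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 10])
w = tf.Variable(tf.zeros([10, 1]))
loss = tf.reduce_sum(tf.square(tf.matmul(x, w)))

optimizer = tf.train.GradientDescentOptimizer(0.1)
grads_and_vars = optimizer.compute_gradients(loss)   # defined once, outside the loop
train_op = optimizer.apply_gradients(grads_and_vars)
init_op = tf.global_variables_initializer()

graph = tf.get_default_graph()
graph.finalize()  # any later attempt to add an op now raises an error

with tf.Session(graph=graph) as session:
    session.run(init_op)
    for step in range(10):
        batch = np.random.rand(32, 10).astype(np.float32)  # stand-in for real data
        session.run(train_op, feed_dict={x: batch})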


I had a similar problem. My solution was to call

tf.reset_default_graph()

after every epoch or sample. This resets the graph and frees all the resources that were used, much as closing the session would.
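
A hedged usage sketch of this answer (TF 1.x; rebuild_model and the checkpoint path are hypothetical): resetting the default graph invalidates the old session and variables, so the model has to be rebuilt after each reset and the weights carried across resets, e.g. through a checkpoint.

import tensorflow as tf

def rebuild_model():
    # Hypothetical helper: recreate placeholders, variables and the train op.
    x = tf.placeholder(tf.float32, [None, 10])
    w = tf.Variable(tf.zeros([10, 1]))
    loss = tf.reduce_sum(tf.square(tf.matmul(x, w)))
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
    return x, train_op

ckpt_path = "/tmp/model.ckpt"  # assumed checkpoint location
for epoch in range(5):
    tf.reset_default_graph()           # drop the old graph and its accumulated ops
    x, train_op = rebuild_model()
    saver = tf.train.Saver()
    with tf.Session() as session:
        session.run(tf.global_variables_initializer())
        if epoch > 0:
            saver.restore(session, ckpt_path)  # carry the weights across resets
        # ... run this epoch's training steps with session.run(train_op, feed_dict={x: ...})
        saver.save(session, ckpt_path)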


Source: https://habr.com/ru/post/1651978/
