It seems that the parameterization convention in PyTorch is the reverse of the one in TensorFlow, so 0.1 in PyTorch is equivalent to 0.9 in TensorFlow.
More precisely:
In TensorFlow:
running_mean = decay*running_mean + (1-decay)*new_value
In PyTorch:
running_mean = (1-decay)*running_mean + decay*new_value
This means that the value decay in PyTorch is equivalent to the value (1-decay) in TensorFlow.
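As a quick sanity check, the two update rules above can be written out in plain Python and compared. This is only a sketch of the formulas as stated here, not the actual library code; the function names `tf_style_update` and `torch_style_update` are made up for illustration.

```python
import math

def tf_style_update(running_mean, new_value, decay):
    # TensorFlow convention from above: decay weights the OLD running mean.
    return decay * running_mean + (1 - decay) * new_value

def torch_style_update(running_mean, new_value, decay):
    # PyTorch convention from above: decay weights the NEW value.
    return (1 - decay) * running_mean + decay * new_value

# Hypothetical numbers, chosen only to illustrate the equivalence.
running_mean, new_value = 1.0, 2.0
tf_result = tf_style_update(running_mean, new_value, decay=0.9)
torch_result = torch_style_update(running_mean, new_value, decay=0.1)

# The two results agree up to floating-point rounding.
print(math.isclose(tf_result, torch_result))
```

So when porting a model between the two frameworks, the value should be converted as 1 minus the original rather than copied over unchanged.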