The bias node/term is there only to ensure that the predicted output is unbiased. If your input has a dynamic range that goes from -1 to +1 and your output is simply the input translated by +3, a neural network with a bias term will just assign a non-zero weight to the bias neuron to absorb that offset, while the remaining weights stay trivial. Without a bias neuron in that situation, all the activation functions and weights would instead be optimized to mimic a simple addition as best they can, using sigmoids/tangents and multiplication.
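A minimal sketch of that idea (assumed setup, not from the original answer): a single linear neuron trained by gradient descent on inputs in [-1, +1] with target y = x + 3. With a bias term the model only needs weight ≈ 1 and bias ≈ 3; without one it simply cannot represent the constant offset.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(100, 1))   # inputs in [-1, +1]
y = x + 3.0                                  # target: input translated by +3

w, b = 0.0, 0.0                              # learned weight and bias
lr = 0.1
for _ in range(2000):
    pred = w * x + b
    err = pred - y
    w -= lr * np.mean(err * x)               # gradient step on the weight
    b -= lr * np.mean(err)                   # gradient step on the bias

print(f"weight = {w:.3f}, bias = {b:.3f}")   # expect weight close to 1, bias close to 3
```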
If both your inputs and outputs have the same range, say from -1 to +1, then the bias term will probably not be useful.
You could have a look at the weight of the bias node in the experiment you mention. Either it is very low, which probably means the inputs and outputs are already centered; or it is significant, and I would bet that the variance of the other weights is reduced, leading to a more stable (and less prone to overfitting) neural network.
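One rough way to do that check, sketched here with scikit-learn and made-up data (the dataset, network size, and target are assumptions for illustration): train a small MLP, then compare the learned biases (`intercepts_`) against the spread of the ordinary weights (`coefs_`).

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 3))    # centered inputs
y = X.sum(axis=1) + 3.0                      # target with a constant offset

model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
model.fit(X, y)

# Compare bias magnitudes with the spread of the other weights, layer by layer.
for layer, (W, b) in enumerate(zip(model.coefs_, model.intercepts_)):
    print(f"layer {layer}: weight std = {W.std():.3f}, mean |bias| = {np.abs(b).mean():.3f}")
```

If the biases come out near zero, the data is likely already centered; if they are large while the other weights have low variance, the bias is doing useful work.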