Is a bias node required in very large neural networks?

I understand the role of the bias node in neural networks and why it is important for shifting the activation function in small networks. My question is: is bias still important in very large networks (specifically, a convolutional neural network for image recognition using the ReLU activation function, with 3 convolutional layers, 2 hidden layers, and more than 100,000 connections), or does its effect simply get lost among the large number of activations?

The reason I am asking is that in the past I have built networks in which I forgot to implement a bias node, yet when I added one I saw only a slight difference in performance. Maybe that was a coincidence because that particular data set did not require a bias? Should the bias be initialized to a larger value in large networks? Any other recommendations would be greatly appreciated.

+6
3 answers

The bias node / bias term is there only to ensure that the predicted output is unbiased, i.e. able to absorb a constant offset. If your input has a dynamic range from -1 to +1 and your output is simply the input translated by +3, a neural network with a bias term will simply have the bias neuron with a non-zero weight while the rest are zero. If you do not have a bias neuron in that situation, all the activation functions and weights will be optimized so as to mimic, at best, a simple addition, using sigmoids/tanh and multiplication.

If both the inputs and outputs have the same range, say from -1 to +1, then the bias term will probably not be useful.
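As a minimal sketch of that point (my own illustration, not code from the original answer; NumPy and the specific numbers are assumptions for demonstration only), consider fitting the target "input + 3" with and without a bias:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(200, 1))   # inputs in [-1, +1]
y = x + 3.0                                  # output is just the input shifted by +3

# Model with no bias: y_hat = w * x  (least-squares fit for w)
w, *_ = np.linalg.lstsq(x, y, rcond=None)
err_no_bias = np.mean((x @ w - y) ** 2)

# Model with a bias: y_hat = w * x + b  (append a constant-1 column)
xb = np.hstack([x, np.ones_like(x)])
wb, *_ = np.linalg.lstsq(xb, y, rcond=None)
err_bias = np.mean((xb @ wb - y) ** 2)

print(err_no_bias)   # roughly 9: the constant +3 offset cannot be modeled
print(err_bias)      # essentially 0: the bias weight (~3) absorbs the shift
```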

You could look at the weight of the bias node in the experiment you mentioned. Either it is very low, which probably means the inputs and outputs are already centered; or it is significant, in which case I would bet that the variance of the other weights is reduced, leading to a more stable (and less prone to overfitting) neural network.

+5

Bias is equivalent to adding a constant 1 to the input of each layer. The weight applied to that constant is then your bias. It is very easy to add.
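Here is a small sketch of that trick (my own, assuming a plain fully connected layer in NumPy; the function name and shapes are illustrative): append a constant 1 to the layer input, and the corresponding weight row acts as the bias.

```python
import numpy as np

def dense_with_implicit_bias(x, weights):
    """Dense layer where the last weight row plays the role of the bias.

    x:       (batch, n_in) input
    weights: (n_in + 1, n_out); the extra row multiplies the constant 1
    """
    ones = np.ones((x.shape[0], 1))
    x_aug = np.hstack([x, ones])      # add the constant-1 input
    return x_aug @ weights            # last row of `weights` acts as the bias

# Equivalent explicit form: x @ W + b
x = np.random.randn(4, 3)
W = np.random.randn(3, 2)
b = np.random.randn(2)
out_explicit = x @ W + b
out_implicit = dense_with_implicit_bias(x, np.vstack([W, b[None, :]]))
assert np.allclose(out_explicit, out_implicit)
```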

In theory it is not necessary, since the network can "learn" to create its own bias node on each layer. One of the neurons can set its weight very high so that it always outputs 1, or set it to 0 so that it always outputs a constant 0.5 (for sigmoid units). However, this requires at least 2 layers.

+3

Why is bias required in a neural network:

A bias node in a neural network is a node that is always "on". That is, its value is set to 1 regardless of the data in a given pattern. It is analogous to the intercept in a regression model and serves the same function.

If a neural network does not have a bias node in a given layer, it will not be able to produce output in the next layer that differs from 0 when the feature values are 0.
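A tiny illustration of that point (my own sketch, assuming a simple fully connected ReLU layer in NumPy; the bias values are made up):

```python
import numpy as np

x = np.zeros((1, 5))                  # all input features are 0
W = np.random.randn(5, 3)             # arbitrary weights

no_bias = np.maximum(0, x @ W)        # ReLU(0 @ W) is 0 for any W
with_bias = np.maximum(0, x @ W + np.array([0.5, -1.0, 2.0]))

print(no_bias)     # [[0. 0. 0.]] regardless of the weights
print(with_bias)   # [[0.5 0.  2. ]] — the bias lets the layer respond to a zero input
```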

That is why bias values are needed in a neural network.

0

Source: https://habr.com/ru/post/975145/

