I am implementing a neural network and wanted to use ReLU as the activation function for its neurons. I train the network with SGD and backpropagation, and I test it on the paradigmatic XOR problem: it classifies new patterns correctly if I use the logistic function or the hyperbolic tangent as the activation.
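For context, the training set is just the XOR truth table (a minimal sketch of my setup; the variable names are only illustrative):

    import numpy as np

    # XOR truth table: four input patterns and their expected outputs
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([[0], [1], [1], [0]])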
I read about the benefits of using Leaky ReLU as an activation function and implemented it in Python as follows:
    import numpy as np

    def relu(data, epsilon=0.1):
        return np.maximum(epsilon * data, data)
where np is NumPy. The corresponding derivative is implemented as follows:
    def relu_prime(data, epsilon=0.1):
        if 1. * np.all(epsilon < data):
            return 1
        return epsilon
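For reference, the forward function itself behaves as I expect on a quick check (the sample values below are just illustrative, using the relu and import above):

    >>> relu(np.array([-2.0, 0.0, 3.0]))  # negative inputs scaled by epsilon, the rest unchanged
    array([-0.2,  0. ,  3. ])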
Using this function as an activation, I get incorrect results. For instance:
Input = [0, 0] → Output = [0.43951457]
Input = [0, 1] → Output = [0.46252925]
Input = [1, 0] → Output = [0.34939594]
Input = [1, 1] → Output = [0.37241062]
You can see that the outputs are very different from the expected XOR values. So my question is: is there any particular consideration to take into account when using ReLU as the activation function?
Please do not hesitate to ask me for more context or code. Thanks in advance.
EDIT: There is an error in the derivative, as it returns a single float rather than a NumPy array. The correct code should be:
    def relu_prime(data, epsilon=0.1):
        # Gradient is 1 where the input is positive, epsilon elsewhere
        gradients = 1. * (data > 0)
        gradients[gradients == 0] = epsilon
        return gradients
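A quick check (again with purely illustrative values) confirms that it now returns an element-wise gradient array instead of a single float:

    >>> relu_prime(np.array([-1.0, 0.5, 2.0]))
    array([0.1, 1. , 1. ])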