The problem is not that the loss is piecewise or non-smooth. The problem is that we need a loss whose gradient with respect to the network parameters (dloss/dparameter) is non-zero whenever there is an error between the output and the expected output; otherwise gradient descent has nothing to follow. The same requirement applies to almost every function used inside the model (loss function, activation function, attention function, and so on).
Take the perceptron's step activation H(x) (H(x) = 1 if x > 0, else 0). Its derivative is undefined at x = 0 and exactly 0 everywhere else, so no matter how large the error is, the gradient flowing back to the weights is zero and the parameters never move. That is precisely why it was replaced by the sigmoid, whose derivative is non-zero for every x.
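A minimal sketch of this contrast (function names are mine, for illustration): a finite-difference check shows the step function has zero slope away from the jump, while the sigmoid has a non-zero slope everywhere.

```python
import math

def heaviside(x):
    # Perceptron step activation: 1 for x > 0, else 0.
    return 1.0 if x > 0 else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def numeric_grad(f, x, eps=1e-6):
    # Central finite difference: approximates df/dx at x.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Zero slope away from x = 0: no error signal can flow back.
print(numeric_grad(heaviside, 2.0))  # 0.0
# Non-zero slope everywhere: learning can proceed.
print(numeric_grad(sigmoid, 2.0))    # ~0.105
```

The second value matches the analytic derivative sigmoid(x) * (1 - sigmoid(x)), which is strictly positive for all finite x.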
ReLU's derivative is 1 for x > 0 and 0 for x < 0. It is undefined at x = 0, but that does not matter in practice: the function still delivers a non-zero gradient over half of its domain (x > 0), which is enough for learning to proceed.
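A sketch of how this is handled in code (an assumption about convention, though it matches common autodiff practice): the single undefined point x = 0 is simply assigned a fixed subgradient, commonly 0.

```python
def relu(x):
    return x if x > 0 else 0.0

def relu_grad(x):
    # Derivative is 1 for x > 0 and 0 for x < 0; at the kink x = 0
    # any value in [0, 1] is a valid subgradient, and 0 is a
    # common choice.
    return 1.0 if x > 0 else 0.0

print(relu_grad(3.0))   # 1.0 -> gradient flows
print(relu_grad(-3.0))  # 0.0 -> unit is inactive
print(relu_grad(0.0))   # 0.0 -> the chosen subgradient
```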
The same reasoning applies to the choice of loss function. Metrics such as F1 cannot be used directly as losses: they are piecewise constant, so their derivative is 0 almost everywhere (and undefined at the jumps), and the gradient carries no information about how to improve. L2 and L1 losses, by contrast, produce a useful non-zero gradient wherever there is an error. (The one "bad" point of L1, x = 0, is handled by assigning it a subgradient, typically 0.)
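A rough illustration of the difference (function names and the 0.5 threshold are my assumptions, not from the text): the L1 gradient always points in a useful direction, while a thresholded metric like accuracy is flat almost everywhere.

```python
def l1_grad(pred, target):
    # d|pred - target| / dpred = sign(pred - target); at the kink
    # pred == target we use the subgradient 0.
    d = pred - target
    return (d > 0) - (d < 0)  # 1, -1, or 0

def accuracy(pred, target, thresh=0.5):
    # Piecewise constant: a small change in pred almost never
    # changes the value, so its derivative is 0 almost everywhere.
    return 1.0 if (pred > thresh) == (target > thresh) else 0.0

print(l1_grad(0.9, 0.2))  # 1  -> descent pushes the prediction down
print(l1_grad(0.2, 0.9))  # -1 -> descent pushes the prediction up
print(l1_grad(0.5, 0.5))  # 0  -> chosen subgradient at the kink
```

Nudging the prediction slightly leaves accuracy unchanged, which is exactly the zero-gradient problem described above.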
In short, strict smoothness is not the requirement: piecewise and non-smooth functions are fine, as long as they provide a non-zero gradient where it matters (as ReLU and L1 do).