The problem is not that the loss is piecewise or non-smooth. The problem is that we need a loss whose gradient with respect to the network parameters (dloss/dparameter) is non-zero whenever there is an error between the output and the expected output; otherwise gradient descent has nothing to follow. The same requirement applies to almost every function used inside the model (loss function, activation function, attention function, and so on).
Take the perceptron's step activation H(x) (H(x) = 1 if x > 0, else 0). Its derivative is undefined at x = 0 and exactly 0 everywhere else, so no matter how large the error is, the gradient flowing back to the weights is zero and the parameters never move. That is precisely why it was replaced by the sigmoid, whose derivative is non-zero for every x.
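A minimal sketch of this contrast (function names are mine, for illustration): a finite-difference check shows the step function has zero slope away from the jump, while the sigmoid has a non-zero slope everywhere.

```python
import math

def heaviside(x):
    # Perceptron step activation: 1 for x > 0, else 0.
    return 1.0 if x > 0 else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def numeric_grad(f, x, eps=1e-6):
    # Central finite difference: approximates df/dx at x.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Zero slope away from x = 0: no error signal can flow back.
print(numeric_grad(heaviside, 2.0))  # 0.0
# Non-zero slope everywhere: learning can proceed.
print(numeric_grad(sigmoid, 2.0))    # ~0.105
```

The second value matches the analytic derivative sigmoid(x) * (1 - sigmoid(x)), which is strictly positive for all finite x.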
ReLU's derivative is 1 for x > 0 and 0 for x < 0. It is undefined at x = 0, but that does not matter in practice: the function still delivers a non-zero gradient over half of its domain (x > 0), which is enough for learning to proceed.
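A sketch of how this is handled in code (an assumption about convention, though it matches common autodiff practice): the single undefined point x = 0 is simply assigned a fixed subgradient, commonly 0.

```python
def relu(x):
    return x if x > 0 else 0.0

def relu_grad(x):
    # Derivative is 1 for x > 0 and 0 for x < 0; at the kink x = 0
    # any value in [0, 1] is a valid subgradient, and 0 is a
    # common choice.
    return 1.0 if x > 0 else 0.0

print(relu_grad(3.0))   # 1.0 -> gradient flows
print(relu_grad(-3.0))  # 0.0 -> unit is inactive
print(relu_grad(0.0))   # 0.0 -> the chosen subgradient
```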
The same reasoning applies to the choice of loss function. Metrics such as F1 cannot be used directly as losses: they are piecewise constant, so their derivative is 0 almost everywhere (and undefined at the jumps), and the gradient carries no information about how to improve. L2 and L1 losses, by contrast, produce a useful non-zero gradient wherever there is an error. (The one "bad" point of L1, x = 0, is handled by assigning it a subgradient, typically 0.)
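A rough illustration of the difference (function names and the 0.5 threshold are my assumptions, not from the text): the L1 gradient always points in a useful direction, while a thresholded metric like accuracy is flat almost everywhere.

```python
def l1_grad(pred, target):
    # d|pred - target| / dpred = sign(pred - target); at the kink
    # pred == target we use the subgradient 0.
    d = pred - target
    return (d > 0) - (d < 0)  # 1, -1, or 0

def accuracy(pred, target, thresh=0.5):
    # Piecewise constant: a small change in pred almost never
    # changes the value, so its derivative is 0 almost everywhere.
    return 1.0 if (pred > thresh) == (target > thresh) else 0.0

print(l1_grad(0.9, 0.2))  # 1  -> descent pushes the prediction down
print(l1_grad(0.2, 0.9))  # -1 -> descent pushes the prediction up
print(l1_grad(0.5, 0.5))  # 0  -> chosen subgradient at the kink
```

Nudging the prediction slightly leaves accuracy unchanged, which is exactly the zero-gradient problem described above.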
In short, strict smoothness is not the requirement: piecewise and non-smooth functions are fine, as long as they provide a non-zero gradient where it matters (as ReLU and L1 do).