Is this scenario considered supervised learning?
It is supervised learning when you have labels to optimize your model against. So for most neural networks, it is supervised.
However, you can also look at the whole task. I assume you do not have any ground truth for pairs of images and the "desired" similarity value that your model should output?
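To make the distinction concrete, here is a minimal sketch of what ground truth for image pairs would look like if you had it. Everything in it is an assumption for illustration: the two-dimensional embeddings stand in for whatever features your model produces, and the target values, `cosine_similarity` and `pair_loss` are hypothetical names, not something from your setup.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_loss(pairs):
    """Mean squared error between predicted and desired similarity."""
    errors = [(cosine_similarity(u, v) - target) ** 2 for u, v, target in pairs]
    return sum(errors) / len(errors)


# Hypothetical labeled pairs: (embedding_a, embedding_b, desired_similarity).
# This third element is the ground truth that is usually missing.
labeled_pairs = [
    ([1.0, 0.0], [1.0, 0.0], 1.0),  # same image content, should be similar
    ([1.0, 0.0], [0.0, 1.0], 0.0),  # unrelated content, should be dissimilar
]

print(pair_loss(labeled_pairs))
```

Without the third element of each tuple there is no target to compute a loss like this against, which is the sense in which the similarity task as a whole may not be supervised, even if the network producing the embeddings was trained with labels.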

FaceNet and "content-based image retrieval" (CBIR):