People usually just resize every image to a square during CNN training (for example, ResNet takes a 224x224 square image). That looks ugly to me, especially when the aspect ratio is far from 1.
(In fact, this can even change the ground truth: for example, the label an expert would assign to the distorted image may differ from the label of the original one.)
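To make the distortion concrete, here is a minimal sketch of the "usual" squash-to-square resize. It assumes a grayscale image stored as a list of pixel rows and uses nearest-neighbour sampling for simplicity; real pipelines typically use bilinear interpolation from a library such as PIL or torchvision. The helper names `resize_nearest` and `squash_to_square` are hypothetical. The point is that the two axes get scaled by different factors whenever the input is not square.

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2D list-of-rows image (hypothetical helper)."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def squash_to_square(img, size=224):
    """'Usual' approach: stretch any image to size x size, ignoring its aspect ratio.

    Returns the resized image and the (horizontal, vertical) scale factors,
    which differ whenever the input is not square.
    """
    in_h, in_w = len(img), len(img[0])
    out = resize_nearest(img, size, size)
    return out, (size / in_w, size / in_h)


# A wide 100x200 (h x w) image: the vertical axis is stretched twice as much.
wide = [[(x + y) % 256 for x in range(200)] for y in range(100)]
squashed, (sx, sy) = squash_to_square(wide)
print(len(squashed), len(squashed[0]), sx, sy)
```

Here the horizontal factor is 1.12 and the vertical one is 2.24, so every shape in the image is stretched vertically by a factor of 2 relative to its width.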
So instead I resize the image to, say, 224x160, keeping the original aspect ratio, and then pad it with zeros (i.e., paste it at a random position inside a completely black 224x224 image).
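The procedure described above (often called letterboxing) can be sketched as follows, under the same assumptions: a grayscale list-of-rows image, nearest-neighbour sampling, and a black (0) fill value. The names `resize_nearest` and `letterbox_random` are hypothetical, and the `rng` parameter is only there so the random placement can be made reproducible.

```python
import random


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2D list-of-rows image (hypothetical helper)."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def letterbox_random(img, size=224, fill=0, rng=None):
    """Resize keeping the aspect ratio, then paste at a random spot on a square canvas.

    Returns the canvas and (top, left, new_h, new_w) describing where the image landed.
    """
    rng = rng or random.Random()
    in_h, in_w = len(img), len(img[0])
    # Scale so the longer side becomes exactly `size`; the shorter side shrinks proportionally.
    scale = size / max(in_h, in_w)
    new_h = min(size, max(1, round(in_h * scale)))
    new_w = min(size, max(1, round(in_w * scale)))
    resized = resize_nearest(img, new_h, new_w)
    # Random offset inside the leftover space (zero along the longer side).
    top = rng.randint(0, size - new_h)
    left = rng.randint(0, size - new_w)
    canvas = [[fill] * size for _ in range(size)]
    for y in range(new_h):
        canvas[top + y][left:left + new_w] = resized[y]
    return canvas, (top, left, new_h, new_w)


# A 160x224 (h x w) image of ones lands somewhere in the vertical band of the canvas.
img = [[1] * 224 for _ in range(160)]
canvas, (top, left, h, w) = letterbox_random(img, rng=random.Random(0))
print(top, left, h, w)
```

Because the offset is random, this also acts as a translation augmentation. A different fill value (for example, the dataset mean) is an equally plausible choice; zero is used here only to match the "completely black" canvas.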
My approach doesn't seem novel to me, and yet I cannot find any information comparing it with the "usual" approach. Funny!
So which approach is better, and why? (If the answer depends on the data, please share your thoughts on when one is preferable to the other.)