What you are trying to do is a very difficult task, at least if you want it to work with arbitrary shapes.
The reason is simple: a computer does not work like a human brain. For example, take a look at the picture in the upper right. What do you see? A box? Or a flat rectangle with two parallelograms attached to its top and left sides?
Our brain sees a three-dimensional shape because we live in a three-dimensional world, most of the things we see are three-dimensional, and evolution has given us neural structures that are tuned to perceiving such shapes.
But there is an even more fundamental problem: image segmentation. You need to split the image into contiguous regions. In our brain, shape recognition, reconstruction, and segmentation are interconnected and happen in an iterative process. You have probably experienced this yourself: you saw a figure, but at first you could not make out what it was. Your mind raced through a large number of objects and shapes that it might be. Then, after a while, you clearly saw the shape. That was not because you finally worked it out from the picture alone, but because your brain supplemented the sensory input with its existing knowledge of the world.
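To make "segmentation" a bit more concrete, here is a minimal sketch of the easiest possible case: grouping pixels of identical value into 4-connected regions with a flood fill. The function name `label_regions` and the tiny hand-made image are assumptions for illustration only. Real photographs have noise, shading, and texture, so this toy approach would not get you far on them, which is exactly the difficulty described above.

```python
from collections import deque

def label_regions(image):
    """Label 4-connected regions of equal pixel value.

    `image` is assumed to be a non-empty list of equal-length rows.
    Returns (labels, count), where labels[r][c] is a region id >= 1.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]  # 0 means "not yet labeled"
    count = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                continue
            # Start a new region and flood-fill it breadth-first.
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not labels[ny][nx]
                            and image[ny][nx] == image[y][x]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count

# Hypothetical 3x4 "image" with three flat-colored areas.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 2, 1],
]
labels, count = label_regions(image)
print(count)
for row in labels:
    print(row)
```

Even here the result is only a set of flat regions. Deciding that three such regions together form a box is the recognition and reconstruction step, and this sketch does nothing toward that.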
The task you are asking about involves not only computer vision, but also machine learning and pattern recognition.