I want to start a project that uses a very basic form of optical music recognition (OMR).
For those who read music: unlike other OMR projects, the only information I need to extract is the order and pitch (vertical position) of each note on the staff. Quarter notes, half notes and whole notes need to be distinguished; shorter notes can be treated as quarter notes. Dots on notes can be ignored, and dynamics (volume) markings are not important.
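To make "pitch" concrete: it is determined by how far a note head sits above or below the staff lines, one diatonic step per half line spacing. The three durations differ only in whether the head is filled and whether it has a stem. A minimal sketch of both rules, assuming a treble clef and already-measured staff-line coordinates (the function names and inputs are hypothetical):

```python
# Hypothetical helpers: the treble clef, the staff-line coordinates and the
# note-head measurements are assumptions, not part of any library.
LETTERS = "CDEFGAB"


def pitch_from_y(y, staff_lines):
    """Map a note-head centre y (pixels, growing downward) to a pitch name.

    staff_lines: y-coordinates of the five staff lines, top to bottom.
    Assumes a treble clef, so the bottom line is E4.
    """
    bottom = staff_lines[-1]
    # Five lines span four gaps, i.e. eight half-gaps (line -> space -> line).
    half_gap = (bottom - staff_lines[0]) / 8
    steps = round((bottom - y) / half_gap)  # diatonic steps above the bottom line
    # Count from C4: the bottom line E4 is two steps above C4.
    index = steps + 2
    letter = LETTERS[index % 7]
    octave = 4 + index // 7  # floor division also handles notes below C4
    return f"{letter}{octave}"


def duration_from_head(filled, has_stem):
    """Classify the three durations the task distinguishes."""
    if filled:
        return "quarter"  # eighths and shorter are also filled; treat as quarters
    return "half" if has_stem else "whole"
```

For staff lines at y = 100, 110, 120, 130, 140, a head centred at y = 115 sits in the third space from the bottom, which `pitch_from_y` reports as C5.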
For everyone: strictly speaking, I need to find the locations of each of the following ...

... in an image like this ...
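One basic way to find every occurrence of a symbol is template matching: slide a small reference image of the symbol (for example a filled note head) across the score and record the positions where the two agree. A minimal pure-Python sketch on a binarized image, assuming 1 marks ink and that the tolerance `max_mismatch` is tuned by hand (both assumptions):

```python
def match_template(image, template, max_mismatch):
    """Return (row, col) of every window differing from template in <= max_mismatch pixels.

    image, template: lists of rows of 0/1 ints (1 = ink), assumed already binarized.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    hits = []
    # Try every placement where the template fits entirely inside the image.
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            # Count pixels where the window and the template disagree.
            mismatch = sum(
                image[r + i][c + j] != template[i][j]
                for i in range(th)
                for j in range(tw)
            )
            if mismatch <= max_mismatch:
                hits.append((r, c))
    return hits
```

Real scores need a tolerance above zero because staff lines run through note heads and scans are noisy; OpenCV's `cv2.matchTemplate` performs the same sliding comparison much faster on grayscale images.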
I have no experience with image processing, so a basic conceptual explanation of which method (or combination of methods) would achieve this would be very helpful.
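A common first step in OMR pipelines is locating the staff lines, since they set both the vertical scale for pitch and the regions in which to look for notes. A minimal sketch using a horizontal projection profile (count the ink pixels in each row and keep rows that are mostly ink), assuming a binarized, deskewed image and a hand-picked `min_fill` threshold:

```python
def find_staff_lines(image, min_fill=0.5):
    """Return the centre row of each horizontal line found via a row projection profile.

    image: list of rows of 0/1 ints (1 = ink), assumed binarized and deskewed.
    min_fill: fraction of the row width that must be ink for the row to count.
    """
    width = len(image[0])
    # Horizontal projection: number of ink pixels in every row.
    profile = [sum(row) for row in image]
    line_rows = [y for y, count in enumerate(profile) if count >= min_fill * width]
    # Staff lines are often several pixels thick; merge runs of adjacent rows.
    centres = []
    run = []
    for y in line_rows:
        if run and y != run[-1] + 1:
            centres.append(sum(run) / len(run))
            run = []
        run.append(y)
    if run:
        centres.append(sum(run) / len(run))
    return centres
```

Ink from note heads or short beams rarely fills half a row, so it is filtered out; grouping the resulting lines in fives then gives the individual staves.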