Probably the best solution for the coin problem is to treat it as a regression task. Annotate about 5k images with the number of objects in the scene and train your model on them. The model then outputs the count directly, and hopefully the correct one.
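To make the regression idea concrete, here is a minimal sketch. It assumes synthetic images (bright discs on a noisy background) in place of real annotated photos, and a single hand-picked feature (total brightness) in place of a learned model. The names `make_image`, `features`, `fit_count_regressor` and `predict_count` are made up for this illustration; in practice you would most likely train a CNN with one output on your annotated images.

```python
import numpy as np

def make_image(n_coins, rng, size=64, cell=16, radius=5):
    """Synthetic stand-in for an annotated photo: n_coins bright discs on noise."""
    img = rng.normal(0.0, 0.05, size=(size, size))
    yy, xx = np.mgrid[0:size, 0:size]
    per_row = size // cell
    # Pick distinct grid cells so the discs never overlap.
    for c in rng.choice(per_row * per_row, size=n_coins, replace=False):
        cy = (c // per_row) * cell + cell // 2
        cx = (c % per_row) * cell + cell // 2
        img[(yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2] += 1.0
    return img

def features(img):
    """One hand-picked feature (total brightness) plus a bias term."""
    return np.array([img.sum(), 1.0])

def fit_count_regressor(images, counts):
    """Least-squares fit from image features to the annotated count."""
    X = np.stack([features(im) for im in images])
    w, *_ = np.linalg.lstsq(X, np.asarray(counts, dtype=float), rcond=None)
    return w

def predict_count(w, img):
    """Regress a real-valued count and round it to the nearest integer."""
    return int(round(float(features(img) @ w)))

rng = np.random.default_rng(0)
train_counts = rng.integers(0, 11, size=200)   # the "annotations"
train_images = [make_image(int(n), rng) for n in train_counts]
w = fit_count_regressor(train_images, train_counts)

print(predict_count(w, make_image(7, rng)))
```

The point is only the setup: the label is a number, the loss is a regression loss, and the prediction is rounded at the end. On real photos a single brightness feature would not be enough.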
Another way is to train a classifier that decides whether an image patch shows a coin, and to apply it with a sliding-window approach similar to this: https://arxiv.org/pdf/1312.6229.pdf. You classify each window and then count the regions that were found. This is easier to annotate and train, and it extends better. But you have the problem of choosing good windows and of merging the results of overlapping windows into one detection per coin.
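A rough sketch of the sliding-window variant might look like this. `coin_score` is a hypothetical stand-in for your trained coin/no-coin classifier (here it is just the mean brightness of the window), and the window size, stride and thresholds are assumed values you would have to tune. Overlapping hits are merged with a simple greedy non-maximum suppression, which is one possible way to get a single detection per coin and is not taken from the linked paper.

```python
import numpy as np

def coin_score(window):
    """Hypothetical stand-in for a trained classifier: mean brightness in [0, 1]."""
    return float(window.mean())

def sliding_windows(img, win, stride):
    """Yield (x, y, patch) for every window position in the image."""
    h, w = img.shape
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            yield x, y, img[y:y + win, x:x + win]

def iou(a, b, win):
    """Intersection over union of two win x win boxes given by their top-left corners."""
    ix = max(0, win - abs(a[0] - b[0]))
    iy = max(0, win - abs(a[1] - b[1]))
    inter = ix * iy
    return inter / (2 * win * win - inter)

def count_coins(img, win=12, stride=4, threshold=0.5, max_iou=0.1):
    """Classify every window, then keep only the best window per overlapping group."""
    hits = [(coin_score(p), x, y) for x, y, p in sliding_windows(img, win, stride)]
    hits = sorted((h for h in hits if h[0] >= threshold), reverse=True)
    kept = []
    for score, x, y in hits:
        # Greedy non-maximum suppression: skip windows that overlap a better one.
        if all(iou((x, y), k, win) <= max_iou for k in kept):
            kept.append((x, y))
    return len(kept)

# Toy image: two bright 12x12 "coins" on a dark background.
img = np.zeros((48, 48))
img[4:16, 4:16] = 1.0
img[24:36, 28:40] = 1.0
print(count_coins(img))
```

Without the suppression step each coin would be counted several times, once for every window that covers enough of it, which is the merging problem mentioned above.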