I believe what you need is called a Projective Transformation. Below is a link to exactly that:
Demonstration of calculating a projective transformation, with proper mathematical formatting, on Math SE.
Although you could solve it by hand and hard-code the result into your program, I strongly recommend using a matrix math library (or even writing your own matrix functions) before resorting to deriving the equations manually, since you would have to solve them symbolically and then translate that into code, which is both expensive and prone to miscalculation.
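For reference, here is a minimal, self-contained sketch of the numeric approach (function name and array layout are my own choices): the 4 point pairs give 8 linear equations in the 8 unknown coefficients of the transformation (the ninth is normalized to 1), which plain Gaussian elimination can solve.

```cpp
#include <array>
#include <cmath>
#include <utility>

// Build and solve the 8x8 linear system that the 4 point correspondences
// define. Each pair (u,v) -> (x,y) contributes two equations:
//   h0*u + h1*v + h2 - h6*u*x - h7*v*x = x
//   h3*u + h4*v + h5 - h6*u*y - h7*v*y = y
// with the ninth coefficient of the transformation normalized to 1.
// Degenerate inputs (3 collinear points) are not handled.
std::array<double, 8> solveHomography(const double src[4][2],   // pixel points
                                      const double dst[4][2]) { // world points
    double A[8][9] = {};  // augmented matrix [A | b]
    for (int i = 0; i < 4; ++i) {
        const double u = src[i][0], v = src[i][1];
        const double x = dst[i][0], y = dst[i][1];
        const double rx[9] = { u, v, 1, 0, 0, 0, -u * x, -v * x, x };
        const double ry[9] = { 0, 0, 0, u, v, 1, -u * y, -v * y, y };
        for (int j = 0; j < 9; ++j) { A[2 * i][j] = rx[j]; A[2 * i + 1][j] = ry[j]; }
    }
    // Gaussian elimination with partial pivoting.
    for (int col = 0; col < 8; ++col) {
        int pivot = col;
        for (int r = col + 1; r < 8; ++r)
            if (std::fabs(A[r][col]) > std::fabs(A[pivot][col])) pivot = r;
        for (int j = 0; j < 9; ++j) std::swap(A[col][j], A[pivot][j]);
        for (int r = col + 1; r < 8; ++r) {
            const double f = A[r][col] / A[col][col];
            for (int j = col; j < 9; ++j) A[r][j] -= f * A[col][j];
        }
    }
    // Back substitution.
    std::array<double, 8> h{};
    for (int r = 7; r >= 0; --r) {
        double s = A[r][8];
        for (int j = r + 1; j < 8; ++j) s -= A[r][j] * h[j];
        h[r] = s / A[r][r];
    }
    return h;
}
```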
Here are some tips to clarify how to apply it to your problem:
- Your matrix A (source) is built from the 4 points in your camera image (in pixel coordinates).
- Your matrix B (destination) is built from your real-world measurements.
- For quick recalibration, I suggest marking points on the ground so you can quickly place the cube in the same 4 locations (and read off the new pixel locations in the camera) without redoing the real-world measurements.
- You only have to perform steps 1-5 once, during calibration; after that, whenever you want the position of something, just take its coordinates in the image and run them through steps 6 and 7 (see the sketch after this list).
- You want your calibration points as far apart as possible (within reason: at extreme distances and oblique angles you quickly lose pixel density, and therefore accuracy, in the source image). Make sure no 3 of the points are collinear (simply put, make your 4 points roughly a square covering most of your camera's FOV in the real world).
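Steps 6 and 7 from the linked answer then amount to applying those coefficients to a pixel and dividing by the projective term. A sketch, using the solveHomography() coefficients from above:

```cpp
#include <array>

// Map a camera pixel (u, v) to real-world floor coordinates (x, y) using
// the coefficients returned by solveHomography() above.
void pixelToWorld(const std::array<double, 8>& h,
                  double u, double v, double& x, double& y) {
    const double w = h[6] * u + h[7] * v + 1.0;  // projective divisor
    x = (h[0] * u + h[1] * v + h[2]) / w;
    y = (h[3] * u + h[4] * v + h[5]) / w;
}
```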
P.S. I apologize for not writing the math out here, but Math SE has fantastic equation editing and it looks much cleaner there!
Final steps for applying this method to your situation:
To perform the calibration you will need to set a global home position (most likely an arbitrary point on the floor; measure the camera's position relative to it). From that origin, you measure your object's offset in x and y along the floor. A more tightly packed calibration layout will give you more error, but the easiest approach is to use a sheet of known dimensions (a piece of printer paper, a large board, or similar). The reason this is easier is that it has built-in axes (i.e., its sides are orthogonal), so you simply use the object's four corners, at their known spacing, as your calibration points. EX: for a US-letter sheet of paper your points would be (0, 0), (0, 8.5), (11, 8.5), (11, 0).
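As a sketch of what that calibration call looks like with the paper corners (the pixel values here are placeholders; substitute what you actually read off your image):

```cpp
// World corners of a US-letter sheet, in inches, matching the example above.
const double world[4][2] = { {0, 0}, {0, 8.5}, {11, 8.5}, {11, 0} };
// Placeholder pixel coordinates of those same corners as seen by the camera.
const double pixels[4][2] = { {412, 604}, {380, 290}, {845, 301}, {880, 612} };
const std::array<double, 8> h = solveHomography(pixels, world);
```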
Using those points and the pixels you get for them produces your transformation matrix, but it still only gives you a global (x, y) position on axes that can be awkward to measure against (they may be skewed depending on how you measured/calibrated). You will therefore need to compute your camera's offset:
object's position in the real world (from the steps above): (x1, y1); camera's position: (Xc, Yc)

dist = sqrt(pow(x1 - Xc, 2) + pow(y1 - Yc, 2))
If it is too cumbersome to hand-measure the camera's position from the global origin, you can instead measure the distances to 2 different known points and use those values with the equation above to solve for your camera's offset, which you then save and reuse any time you want the final distance. A sketch of that calculation follows.
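Here is a sketch of that two-point calculation. It is standard two-circle intersection (the function name and signature are my own); note there are generally two mirror-image solutions, so you must pick the one on your camera's side of the line between the points:

```cpp
#include <algorithm>
#include <cmath>

// Recover the camera's floor position (Xc, Yc) from measured distances r1, r2
// to two known world points (x1, y1) and (x2, y2). Flip otherSide to select
// the other of the two mirror-image solutions. Returns false if the circles
// don't intersect (inconsistent measurements).
bool locateCamera(double x1, double y1, double r1,
                  double x2, double y2, double r2,
                  double& Xc, double& Yc, bool otherSide = false) {
    const double dx = x2 - x1, dy = y2 - y1;
    const double d = std::sqrt(dx * dx + dy * dy);
    if (d <= 0 || d > r1 + r2 || d < std::fabs(r1 - r2)) return false;
    const double a = (r1 * r1 - r2 * r2 + d * d) / (2 * d);      // along the line
    const double h = std::sqrt(std::max(0.0, r1 * r1 - a * a));  // perpendicular
    const double mx = x1 + a * dx / d, my = y1 + a * dy / d;     // foot point
    const double s = otherSide ? -1.0 : 1.0;
    Xc = mx + s * h * (-dy) / d;
    Yc = my + s * h * dx / d;
    return true;
}
```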