I will try another way to explain it here. :)
Short answer: the unit of your Cartesian positions does not matter as long as you keep it consistent, i.e. as long as you apply the same unit to both your scene and your camera.
For the longer answer, let's go back to the formula you used:

b_x = (d_x * s_x) / (d_z * r_x) * r_z

with
d: relative Cartesian coordinates
s: size of your display surface
r: size of your sensor / recording surface (i.e. r_x and r_y are the sensor's dimensions and r_z is its focal length)
b: position on your display surface
... and do a pseudo dimensional analysis. We have:
[PIXEL] = (([LENGTH] x [PIXEL]) / ([LENGTH] * [LENGTH])) * [LENGTH]
Whatever unit you use for LENGTH, it cancels out, i.e. only the ratio is preserved.
Example:
[PIXEL] = (([Millimeter] x [PIXEL]) / ([Millimeter] * [Millimeter])) * [Millimeter]
        = (([Meter/1000] x [PIXEL]) / ([Meter/1000] * [Meter/1000])) * [Meter/1000]
        = (1000 * 1000 / 1000 / 1000) * (([Meter] x [PIXEL]) / ([Meter] * [Meter])) * [Meter]
        = (([Meter] x [PIXEL]) / ([Meter] * [Meter])) * [Meter]
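The same cancellation can be checked numerically. Here is a minimal Python sketch of it. The function name `project_x` and all the numbers (a 36 mm wide sensor, a 50 mm focal length, a 1920 px wide display) are illustrative assumptions of mine, not values from your setup.

```python
import math

def project_x(d_x, d_z, s_x, r_x, r_z):
    """b_x = (d_x * s_x) / (d_z * r_x) * r_z, result in pixels.

    d_x, d_z, r_x and r_z are lengths and must all share the same unit;
    s_x is the display width in pixels.
    """
    return (d_x * s_x) / (d_z * r_x) * r_z

s_x = 1920  # display width in px (assumed value)

# Everything expressed in millimeters (assumed values).
d_x_mm, d_z_mm = 250.0, 2000.0
r_x_mm, r_z_mm = 36.0, 50.0

# Same scene and same camera, once in millimeters and once in meters.
b_mm = project_x(d_x_mm, d_z_mm, s_x, r_x_mm, r_z_mm)
b_m = project_x(d_x_mm / 1000, d_z_mm / 1000, s_x, r_x_mm / 1000, r_z_mm / 1000)

print(b_mm, b_m, math.isclose(b_mm, b_m))
```

Both calls give the same pixel position (up to floating-point rounding), because every length appears once in the numerator and once in the denominator.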
Going back to my explanation in another thread:

If we use these notations to express b_x:
b_x = (d_x * s_x) / (d_z * r_x) * r_z
    = (d_x * w) / (d_z * 2 * f * tan(α)) * f
    = (d_x * w) / (d_z * 2 * tan(α))    // with w in px
Whether you use (d_x, d_y, d_z) = (X,Y,Z) or (d_x, d_y, d_z) = (1000*X,1000*Y,1000*Z), the ratio d_x / d_z does not change.
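As a rough check of the simplification above, the sketch below compares the sensor form with the field-of-view form, then scales the coordinates by 1000. I am assuming that α is the half-angle of the horizontal field of view, which is what r_x = 2 * f * tan(α) implies. The function names and numeric values are made up for illustration.

```python
import math

def b_x_sensor(d_x, d_z, s_x, r_x, r_z):
    # Sensor form: b_x = (d_x * s_x) / (d_z * r_x) * r_z
    return (d_x * s_x) / (d_z * r_x) * r_z

def b_x_fov(d_x, d_z, w, alpha):
    # Field-of-view form: b_x = (d_x * w) / (d_z * 2 * tan(alpha))
    return (d_x * w) / (d_z * 2 * math.tan(alpha))

w = 1280                       # display width in px (assumed value)
alpha = math.radians(35)       # half field of view (assumed value)
f = 0.05                       # focal length, arbitrary: it cancels out
r_x = 2 * f * math.tan(alpha)  # sensor width implied by f and alpha

X, Z = 0.4, 2.5                # relative coordinates d_x and d_z (assumed values)

via_sensor = b_x_sensor(X, Z, w, r_x, f)
via_fov = b_x_fov(X, Z, w, alpha)
via_fov_scaled = b_x_fov(1000 * X, 1000 * Z, w, alpha)

print(via_sensor, via_fov, via_fov_scaled)
```

The three values agree up to rounding: f drops out of the sensor form, and the factor 1000 drops out of d_x / d_z.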
Now, as for the underlying cause of your problem, you should check whether you are applying the right unit to your camera's position / distance to the scene. Also check the unit of your α or of your focal length, depending on which one you are using.
I think the latter is the most likely: it is easy to forget to also apply the right unit to your camera's characteristics.
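To illustrate the kind of mistake I mean, here is a hypothetical Python sketch in which the scene is converted to millimeters but the camera position is left in meters. The camera model (no rotation, looking along +z), the function name and all values are assumptions chosen for the example. Your actual bug may sit elsewhere, for instance in the focal length.

```python
import math

def project_x(point, camera, w, alpha):
    """Project a point's x coordinate for an unrotated camera looking along +z."""
    d_x = point[0] - camera[0]  # relative coordinates: both operands
    d_z = point[2] - camera[2]  # must be in the same unit
    return (d_x * w) / (d_z * 2 * math.tan(alpha))

w = 800                   # display width in px (assumed value)
alpha = math.radians(30)  # half field of view (assumed value)

point_m = (0.5, 0.0, 3.0)   # scene point in meters
camera_m = (0.0, 0.0, 1.0)  # camera position in meters

point_mm = tuple(1000 * c for c in point_m)
camera_mm = tuple(1000 * c for c in camera_m)

consistent_m = project_x(point_m, camera_m, w, alpha)
consistent_mm = project_x(point_mm, camera_mm, w, alpha)
mixed = project_x(point_mm, camera_m, w, alpha)  # scene in mm, camera still in m

print(consistent_m, consistent_mm, mixed)
```

The two consistent calls agree, while the mixed one lands on a different pixel because d_z is no longer scaled by the same factor as d_x. With the camera at the origin the mix-up would go unnoticed, which is one reason this kind of error can be hard to spot.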