If I understand correctly, you have two pictures taken with your smartphone's camera, for which you know (at least approximately) the intrinsic camera matrix and the relative 3D rotation between the poses at which the two images were taken. You also say that there is a small translation between the two images, which is good, since you could not estimate depth otherwise.
Unfortunately, you do not have enough information to estimate depth directly. In principle, estimating depth from two images requires:
1. Find corresponding points between the two images
Depending on what you want to do, this can be done either for all image points (i.e. in a dense way) or only for a few of them (i.e. sparsely). Of course, the latter is less computationally expensive, which makes it more suitable for a smartphone.
For dense matching, you need to rectify the images to make the computation tractable, although this will probably be quite slow on a smartphone. Rectification can be achieved either with a calibrated method (which requires knowing the rotation + translation between the two image poses, the intrinsic camera matrix, and the camera's distortion coefficients) or with an uncalibrated method (which requires knowing sparse point correspondences between the two images and the fundamental matrix, which can itself be estimated from those matches).
Sparse matching requires matching salient features (e.g. SURF or SIFT, or more efficient alternatives) between the two images. The advantage is that this is much more efficient than dense matching, and usually more accurate as well.
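To give a concrete feel for what "matching" means, here is a toy brute-force search in NumPy based on normalized cross-correlation (NCC). This is only an illustration of correspondence search, not what you would ship: a real pipeline would use descriptor-based matchers (SIFT/SURF/ORB in OpenCV), which are far more efficient and robust.

```python
import numpy as np

def match_patch_ncc(img1, img2, pt, half=3):
    """Find the best match in img2 for the patch centered at `pt` (row, col)
    in img1, by exhaustive normalized cross-correlation. Toy example only."""
    y, x = pt
    tpl = img1[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    tpl = tpl - tpl.mean()
    best_score, best_pt = -np.inf, None
    h, w = img2.shape
    for v in range(half, h - half):
        for u in range(half, w - half):
            win = img2[v - half:v + half + 1, u - half:u + half + 1].astype(float)
            win = win - win.mean()
            denom = np.sqrt((tpl ** 2).sum() * (win ** 2).sum())
            if denom < 1e-9:
                continue
            score = (tpl * win).sum() / denom
            if score > best_score:
                best_score, best_pt = score, (v, u)
    return best_pt, best_score
```

The quadratic search cost here is exactly why descriptor matching (or rectification, which reduces the search to one scanline) is preferred in practice.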
2. Triangulate the corresponding points to estimate the depth
Triangulation requires knowing the intrinsic parameters (camera matrix and distortion coefficients) and the extrinsic parameters (relative rotation and translation between the poses at which the images were taken).
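For reference, the standard linear (DLT) triangulation can be sketched in a few lines of NumPy. It assumes you already have the 3x4 projection matrices P = K [R | t] of both views and a pair of corresponding pixel coordinates (in OpenCV you would use `triangulatePoints` instead):

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.
    P1, P2: 3x4 projection matrices K [R | t]; x1, x2: pixel coords (u, v).
    Solves A X = 0 in the least-squares sense via SVD."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                # null vector of A, homogeneous 3D point
    return X[:3] / X[3]       # back to inhomogeneous coordinates
```

Note that if the translation between the two views is only known up to scale (as when it comes from the essential matrix), the triangulated depths are also only defined up to that global scale.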
In your case, even if your relative rotation matrix and intrinsic matrix are accurate enough (which I doubt), you still lack the translation vector and the distortion coefficients.
However, you can still use the classical stereo triangulation approach, which requires an accurate calibration of your camera and an estimation of the full relative pose (i.e. rotation + translation).
Calibrating your camera will allow you to estimate an accurate intrinsic matrix and the associated distortion coefficients. Doing this is recommended, because your camera will not be exactly identical to the cameras of other phones (even for the same phone model). See this tutorial, which explains the methodology, although the code samples are in C++ (an equivalent must exist for Android).
Once you have accurately estimated the intrinsic parameters, one way to estimate the full relative pose (i.e. rotation and translation) is to compute the fundamental matrix (using feature matches between the two images), then derive the essential matrix using the intrinsic matrix, and finally decompose the essential matrix into the relative rotation and translation. See this link for a formula to derive the essential matrix from the fundamental matrix, and this link for how to compute the rotation and translation from the essential matrix.
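The two steps above can be sketched directly in NumPy, assuming the same intrinsic matrix K for both views: the essential matrix is E = K^T F K, and the SVD-based decomposition yields four (R, t) candidates, of which the correct one is the pair that puts triangulated points in front of both cameras (OpenCV's `recoverPose` performs this disambiguation for you):

```python
import numpy as np

def essential_from_fundamental(F, K):
    """E = K^T F K (assuming the same intrinsics K for both views)."""
    return K.T @ F @ K

def decompose_essential(E):
    """Return the four (R, t) candidates from an essential matrix.
    t is recovered only up to scale (unit norm) and sign."""
    U, _, Vt = np.linalg.svd(E)
    # force proper rotations (det = +1)
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]  # left null vector of E
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]
```

Keep in mind that the translation recovered this way has no absolute scale; without extra information (a known object size, an IMU, etc.) your reconstruction is defined only up to a global scale factor.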
Also, to answer your other question about `warpPerspective`: you will need to use `K.R.inv(K)` or `K.inv(R).inv(K)`, depending on which image you are warping. This is because `R` is a 3D rotation, which has nothing to do with pixel coordinates, so it cannot be passed to `warpPerspective` directly.
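In other words, the homography you feed to `warpPerspective` first back-projects pixels to rays with inv(K), rotates the rays, then re-projects them with K. A minimal NumPy sketch:

```python
import numpy as np

def rotation_homography(K, R):
    """Pixel-space homography induced by a pure 3D rotation R:
    H = K R inv(K). Pass H (or its inverse, for the other image)
    to warpPerspective."""
    return K @ R @ np.linalg.inv(K)
```

With R being the identity, H reduces to the identity, and warping with the inverse rotation gives exactly the inverse homography, which is a quick sanity check for the direction of your warp.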