Say I have a set of 5 markers. I am trying to find the relative distances between the markers using an augmented reality framework such as ARToolkit. In my camera feed, the first 20 frames show only the 1st and 2nd markers, so I can work out the transformation between those two. The next 20 frames show only the 2nd and 3rd markers, and so on. The last 20 frames show the 5th and 1st markers. I want to build a three-dimensional map of the positions of all 5 markers.
My question is: knowing that the distances will be inaccurate because of the poor quality of the video feed, how do I minimize those inaccuracies given all the information I have collected?
My naive approach would be to use the first marker as the base point, take the average of the transformations from the first 20 frames to place the 2nd marker, and do the same for the 3rd and 4th. The 5th marker would be placed between the 4th and the 1st, at the midpoint of the two positions implied by the average transformation from 4th to 5th and the average transformation from 5th to 1st. I feel this approach is biased towards the placement of the first marker, and it does not take into account that the camera may see more than two markers per frame.
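To make the naive approach concrete, here is a minimal sketch of what I have in mind. It is simplified on purpose: it assumes 2D translation-only offsets `(dx, dy)` instead of the full 3D transformations that ARToolkit reports, and the names `mean_offset`, `naive_map` and the `readings` layout are just made up for illustration.

```python
from statistics import fmean

def mean_offset(samples):
    """Average a list of (dx, dy) readings component-wise."""
    xs, ys = zip(*samples)
    return (fmean(xs), fmean(ys))

def naive_map(readings, n):
    """Place n markers observed in a loop (1,2), (2,3), ..., (n,1).

    readings[(i, j)] is a list of (dx, dy) offsets from marker i to marker j
    (hypothetical layout, translation only).
    """
    pos = {1: (0.0, 0.0)}  # marker 1 is the base point
    # Chain markers 2..n-1 from marker 1 using the averaged offsets.
    for j in range(2, n):
        dx, dy = mean_offset(readings[(j - 1, j)])
        px, py = pos[j - 1]
        pos[j] = (px + dx, py + dy)
    # The last marker gets two estimates: forward from n-1, backward from 1.
    fx, fy = mean_offset(readings[(n - 1, n)])
    bx, by = mean_offset(readings[(n, 1)])
    forward = (pos[n - 1][0] + fx, pos[n - 1][1] + fy)
    backward = (pos[1][0] - bx, pos[1][1] - by)
    # Split the difference between the two estimates.
    pos[n] = ((forward[0] + backward[0]) / 2, (forward[1] + backward[1]) / 2)
    return pos
```

Note that only the last marker benefits from the loop closure here; markers 2 to n-1 are fixed purely by the chain starting at marker 1, which is the bias I am worried about.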
Ultimately, I want my system to be able to map x markers. Up to x markers may appear in any given frame, and there will be non-systematic errors due to the image quality.
Any help regarding the correct approach to this problem is appreciated.
Edit: Additional information about the problem:
Let's say the real-world map looks like this:

Let's say I get 100 readings for each of the transformations between the points, represented by the arrows in the image. The actual values are written above the arrows.
The values I obtain have some error (assumed to follow a Gaussian distribution around the actual value). For example, one of the readings obtained for marker 1 to 2 could be x: 9.8 y: 0.09. Given that I have all these readings, how do I estimate the map? Ideally, the result should be as close as possible to the real values.
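For testing any candidate method, the readings described here can be simulated. This is only a sketch under my own assumptions: the noise is independent per axis, the standard deviation `sigma=0.2` is a made-up value, and `simulate_readings` is a hypothetical helper, not part of ARToolkit.

```python
import random

def simulate_readings(true_offset, count=100, sigma=0.2, seed=0):
    """Draw `count` noisy (dx, dy) readings around a true offset.

    Noise is zero-mean Gaussian, independent per axis; sigma is an
    assumed value, not something measured from a real video feed.
    """
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    tx, ty = true_offset
    return [(rng.gauss(tx, sigma), rng.gauss(ty, sigma)) for _ in range(count)]

# 100 synthetic readings for marker 1 -> 2, assuming a true offset of (10, 0).
readings_1_2 = simulate_readings((10.0, 0.0))
```

Feeding such synthetic readings into a mapping method and comparing the result against the known true offsets gives a way to check how close the estimate gets.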
My naive approach has the following problem. If the average transformation from 1 to 2 is slightly off, the placement of 3 will be off as well, even though the readings from 2 to 3 are very accurate. This issue is shown below:

The green points are the actual values and the black points are the calculated values. The average transformation from 1 to 2 is x: 10 y: 2.
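The same effect can be reproduced numerically. In this small sketch the true offsets are hypothetical values I picked for illustration (they are not read off the figure), the 2 to 3 offset is assumed to be measured exactly, and `chain` is just a made-up helper name.

```python
def chain(offsets, start=(0.0, 0.0)):
    """Place markers by summing consecutive (dx, dy) offsets from `start`."""
    positions = [start]
    for dx, dy in offsets:
        x, y = positions[-1]
        positions.append((x + dx, y + dy))
    return positions

# Hypothetical true offsets for 1->2 and 2->3.
true_offsets = [(10.0, 0.0), (10.0, 0.0)]
# The averaged 1->2 offset is off by 2 in y; 2->3 is assumed exact.
measured_offsets = [(10.0, 2.0), (10.0, 0.0)]

actual = chain(true_offsets)
estimated = chain(measured_offsets)

# Per-marker position error (estimated minus actual) for markers 1, 2, 3.
errors = [(ex - ax, ey - ay) for (ax, ay), (ex, ey) in zip(actual, estimated)]
print(errors)  # [(0.0, 0.0), (0.0, 2.0), (0.0, 2.0)]
```

Marker 3 inherits the full y error of marker 2 even though its own offset was measured perfectly, which is exactly the behaviour I would like to avoid.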