Accurate measurement of the relative distance between a set of fiducials (Augmented Reality application)

Say I have a set of 5 markers. I am trying to find the relative distances between each pair of markers using an augmented reality framework such as ARToolKit. In my camera feed, the first 20 frames show me only the first 2 markers, so I can develop a transformation between the two. The second 20 frames show me only the 2nd and 3rd markers, and so on; the last 20 frames show me the 5th and 1st markers. I want to build a three-dimensional map of the positions of all 5 markers.

My question is: knowing that the measured distances will be inaccurate due to the poor quality of the video stream, how do I minimize those inaccuracies given all the information collected?

My naive approach would be to use the first marker as a base point, take the average of the transformations from the first 20 frames, and place the 2nd marker accordingly; likewise for the 3rd and 4th. For the 5th marker, I would place it between the 4th and 1st, at the midpoint of the averaged transformations from the 4th and from the 1st. I feel this approach is biased toward the markers placed first, and it does not exploit cases where the camera sees more than two markers per frame.
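The naive chaining approach above can be sketched as follows (a minimal sketch with hypothetical readings; the data and function names are illustrative, not part of any ARToolKit API):

```python
# Naive approach: average each edge's noisy 2D translation readings,
# then place each marker by composing the averaged edges from marker 1.

def average(readings):
    """Component-wise mean of a list of (x, y) translation readings."""
    n = len(readings)
    return (sum(r[0] for r in readings) / n, sum(r[1] for r in readings) / n)

def chain_positions(edge_readings):
    """Place markers by accumulating averaged edge translations from marker 1."""
    positions = [(0.0, 0.0)]          # marker 1 is the base point
    for readings in edge_readings:
        dx, dy = average(readings)
        x, y = positions[-1]
        positions.append((x + dx, y + dy))
    return positions

# Two edges, 1->2 and 2->3, each with a few noisy readings around (10, 0).
edges = [
    [(9.8, 0.09), (10.1, -0.05), (10.0, 0.02)],
    [(10.2, 0.01), (9.9, -0.03)],
]
print(chain_positions(edges))
```

Note how any error in the averaged 1-to-2 edge propagates into every later marker, which is exactly the bias described above.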

Ultimately, I want my system to be able to map x number of markers. Up to x markers may appear in any given frame, and there are non-systematic errors due to image quality.

Any help regarding the correct approach to this problem is appreciated.

Edit: Additional information about the problem:

Let's say the real-world map looks like this:

[image: map of the 5 markers, with the actual transformation values written above the arrows]

Let's say I get 100 readings for each of the transformations between the points represented by the arrows in the image. The actual values are written above the arrows.

The values I get have some error (assumed to be Gaussian-distributed around the actual value). For example, one of the readings obtained for the transformation from marker 1 to marker 2 could be x: 9.8, y: 0.09. Given all these readings, how can I estimate the map? Ideally, the result should be as close as possible to the real values.

My naive approach has the following problem: if the average transformation from 1 to 2 is slightly off, the placement of 3 will be off as well, even though the readings from 2 to 3 are very accurate. This issue is shown below:

[image: green points are the actual marker positions, black points are the computed positions]

Green points are the actual values, black points are the computed values. The average transformation from 1 to 2 is x: 10 y: 2.

1 answer

You can use the least squares method to find the transformation that works best for all of your data. If all you need is the distance between the markers, this is just the average of the measured distances.
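For the distance-only case, the least-squares estimate of a single distance from repeated noisy measurements is simply their arithmetic mean, since the mean minimizes the sum of squared residuals. A minimal sketch using the hypothetical readings above:

```python
import math

def distance(reading):
    """Euclidean length of one (x, y) translation reading."""
    return math.hypot(reading[0], reading[1])

def mean_distance(readings):
    """Least-squares estimate of the marker-to-marker distance:
    the mean of the individually measured distances."""
    return sum(distance(r) for r in readings) / len(readings)

readings = [(9.8, 0.09), (10.1, -0.05), (10.0, 0.02)]
print(mean_distance(readings))
```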

Assuming your marker positions are fixed (for example, mounted on a rigid body) and you only want their relative positions, you can simply record the positions and average them. If it is possible to confuse one marker with another, you can track them from frame to frame and use the continuity of each marker's location between frames to confirm its identity.

If you expect your rigid body to move (or if the body is not rigid, etc.), then your problem becomes much harder. Two markers at a time is not enough to fix the pose of a rigid body (that requires three). However, note that at each transition you have the locations of the old marker, the new marker, and the continuing marker at almost the same time. If you already have the expected body-frame location of each of your markers, this should provide a good estimate of the rigid-body pose every 20 frames.

In general, if your body is moving, then to achieve the best performance you will need some kind of model of its dynamics, which you would use to track its pose over time. Given a dynamics model, you can use a Kalman filter to do the tracking; Kalman filters are well suited to integrating the kinds of data you describe.

By including the locations of your markers as part of the Kalman state vector, you may be able to derive their relative locations from purely sensory data (which appears to be your goal), instead of requiring this information a priori. If you want to handle an arbitrary number of markers efficiently, you may need some clever variation of the conventional methods; your problem seems specially designed to resist being solved by traditional decomposition techniques such as sequential Kalman filtering.
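To make the Kalman idea concrete, here is a minimal one-dimensional sketch (a hypothetical toy, not ARToolKit-specific): the state is a single static quantity, say the x-offset between two markers, the dynamics are "it does not move", and each frame contributes one noisy measurement. For a static state the filter reduces to a recursive weighted average, which is why it generalizes the averaging approaches above:

```python
def kalman_static(measurements, meas_var, x0=0.0, p0=1e6):
    """Scalar Kalman filter with static dynamics.

    measurements: sequence of noisy scalar readings
    meas_var:     measurement noise variance
    x0, p0:       initial estimate and its (large = uninformative) variance
    Returns the final estimate and its variance.
    """
    x, p = x0, p0
    for z in measurements:
        # Predict step: static dynamics, so x and p are unchanged.
        # Update step: blend prediction and measurement by their variances.
        k = p / (p + meas_var)     # Kalman gain
        x = x + k * (z - x)
        p = (1.0 - k) * p
    return x, p

x, p = kalman_static([9.8, 10.1, 10.0, 10.2, 9.9], meas_var=0.04)
print(x, p)
```

A real implementation for this problem would use a vector state (camera pose plus all marker positions) and matrix covariances, but the predict/update structure is the same.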


Edit, per the comments below:

If your markers give you a full 3D pose (instead of just a 3D position), the extra data makes it easier to maintain an accurate estimate of the object you are tracking. However, the recommendations above still apply:

  • If the marked body is fixed, do a least-squares fit over all the relevant frame data.
  • If the marked body moves, model its dynamics and use a Kalman filter.

New points that come to mind:

  • Trying to manage a chain of relative transformations may not be the best way to approach the problem; as you noticed, it is subject to accumulated error. However, it is not necessarily a bad approach if you can implement the necessary math within that structure.
  • In particular, a least-squares fit should work fine over a chain or ring of relative poses.
  • In either case, whether fitting with least squares or tracking with a Kalman filter, a good estimate of the uncertainty of your measurements will improve performance.
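For the specific ring 1 → 2 → 3 → 4 → 5 → 1 in the question, the least-squares fit over relative poses has a simple closed form when every edge has equal weight: composing all edges should return to marker 1, and the fit distributes the loop-closure error evenly over the edges instead of dumping it on whichever marker was placed last. A sketch with hypothetical averaged edge translations:

```python
def ring_least_squares(edge_means):
    """Least-squares marker positions for a single closed ring.

    edge_means: averaged (dx, dy) translations for edges
                1->2, 2->3, ..., n->1, in order.
    With equal weights, the optimal edge estimates are each t_k
    minus loop_error / n, where loop_error is the sum of all t_k.
    """
    n = len(edge_means)
    ex = sum(t[0] for t in edge_means) / n    # per-edge x correction
    ey = sum(t[1] for t in edge_means) / n    # per-edge y correction
    positions = [(0.0, 0.0)]                  # marker 1 anchors the map
    for dx, dy in edge_means[:-1]:            # last edge just closes the loop
        x, y = positions[-1]
        positions.append((x + dx - ex, y + dy - ey))
    return positions

# Five edges whose raw composition misses closure by (0.5, 0.0);
# the fit spreads that error across all five edges.
edges = [(10.1, 0.0), (0.1, 10.0), (-10.0, 0.2), (0.3, -10.1), (0.0, -0.1)]
print(ring_least_squares(edges))
```

This assumes pure translations and equal measurement weights; with per-edge uncertainty estimates you would weight each residual accordingly, and with rotations involved you would solve the corresponding pose-graph problem.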

Source: https://habr.com/ru/post/903069/
