Future Prospects for Improving Depth Data on Project Tango Tablets

I am interested in using the Project Tango tablet for 3D reconstruction using arbitrary point features. In the current version of the SDK, we seem to have access to the following data:

  • A 1280 x 720 RGB image.
  • A point cloud with 0 to ~10,000 points, depending on the environment. In most scenes this is typically 3,000 to 6,000 points.

What I really want is the ability to determine a 3D point for key points in the image. Therefore, it makes sense to project the depth data into the image plane. I did this, and got something like this:

[Image: sparse depth points projected onto the RGB image]
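For reference, here is a minimal sketch of that projection step in Python, assuming the point cloud is already expressed in the RGB camera frame and using a plain pinhole model with no lens distortion (the function name and the example intrinsics are illustrative; the real values should come from the SDK's RGB camera intrinsics):

```python
import numpy as np

def project_to_image(points_xyz, fx, fy, cx, cy, width=1280, height=720):
    """Project 3D points (meters, RGB camera frame) onto the image plane
    with a pinhole model; returns pixel coordinates and their depths."""
    X, Y, Z = points_xyz[:, 0], points_xyz[:, 1], points_xyz[:, 2]
    in_front = Z > 0                          # only points in front of the camera
    u = fx * X[in_front] / Z[in_front] + cx
    v = fy * Y[in_front] / Z[in_front] + cy
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    return u[inside], v[inside], Z[in_front][inside]

# Example call with made-up intrinsics; use the values reported for the RGB camera.
# u, v, z = project_to_image(cloud, fx=1042.0, fy=1042.0, cx=640.0, cy=360.0)
```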

The problem with this process is that the depth points are sparse compared to the RGB pixels. So I took it one step further and interpolated between the depth points. First I performed a Delaunay triangulation, and once I had a good triangulation I interpolated between the three points of each facet and obtained a decent, fairly uniform depth image. Here are the regions where the interpolated depth is valid, overlaid on the RGB image:

[Image: regions with valid interpolated depth overlaid on the RGB image]
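One way to do the triangulate-and-interpolate step in a single call is SciPy's LinearNDInterpolator, which builds a Delaunay triangulation of the samples and interpolates linearly within each triangle. This is only a sketch of the idea; the helper name and default image size are assumptions:

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def densify_depth(u, v, z, width=1280, height=720):
    """Triangulate the sparse (u, v, z) samples (Delaunay) and interpolate
    linearly inside each triangle to get a dense depth image; pixels outside
    the triangulation are left as NaN."""
    interp = LinearNDInterpolator(np.column_stack([u, v]), z)
    uu, vv = np.meshgrid(np.arange(width), np.arange(height))
    return interp(uu, vv)                     # shape (height, width)
```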

Now, given the camera model, it is possible to project the depth back to Cartesian coordinates at any point of the depth image (since the depth image was constructed so that each pixel corresponds to a point in the original RGB image, and we have the camera parameters of the RGB camera). However, if you look at the triangulation image and compare it with the original RGB image, you will see that depth is valid mostly for the uninteresting parts of the image: mostly empty, featureless planes. This is not just true for this single set of images; it is a trend I see with the sensor. If a person stands in front of the sensor, for example, there are very few depth points within their silhouette.
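A sketch of that back-projection under the same pinhole assumptions as above (the helper and its parameters are illustrative, not something provided by the SDK):

```python
import numpy as np

def backproject(depth_image, fx, fy, cx, cy):
    """Lift every valid pixel of the dense depth image back to 3D camera
    coordinates with the inverse pinhole model (distortion ignored)."""
    h, w = depth_image.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = np.isfinite(depth_image) & (depth_image > 0)
    z = depth_image[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.column_stack([x, y, z])         # (N, 3) points in meters
```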

As a result of this characteristic of the sensor, if I run feature detection on the image, most of the areas with corners or interesting textures fall in regions with no associated depth information. Just one example: I detected 1,000 SIFT keypoints in an RGB image from an Xtion sensor, and 960 of them had valid depth values. If I do the same thing with this system, I get around 80 keypoints with valid depth. At the moment, this level of performance is not acceptable for my purposes.
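For concreteness, a count like that could be produced roughly as follows with OpenCV, assuming a build where cv2.SIFT_create is available; the helper itself is hypothetical:

```python
import cv2
import numpy as np

def keypoints_with_valid_depth(rgb_image, depth_image):
    """Detect SIFT keypoints and count how many fall on a pixel that has a
    valid (finite, positive) value in the dense depth image."""
    gray = cv2.cvtColor(rgb_image, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()
    keypoints = sift.detect(gray, None)
    h, w = depth_image.shape
    valid = 0
    for kp in keypoints:
        u = min(int(round(kp.pt[0])), w - 1)  # clamp to image bounds
        v = min(int(round(kp.pt[1])), h - 1)
        z = depth_image[v, u]
        if np.isfinite(z) and z > 0:
            valid += 1
    return len(keypoints), valid
```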

I can guess at the underlying reasons for this: it seems that some sort of plane extraction algorithm is being used to obtain the depth points, whereas PrimeSense/DepthSense sensors use something more sophisticated.

So, anyway, my main question here is: can we expect any improvement in the depth data down the road, through improved RGB-IR image processing algorithms? Or is this an inherent limitation of the current sensor?

+5
2 answers

I am from the Project Tango team at Google. I am sorry you are having problems with depth on the device. Just so we are sure your device is in good working condition, please test the depth against a flat wall. Instructions are given here: https://developers.google.com/project-tango/hardware/depth-test

Even with a device in good working condition, the depth library is known to return sparse depth points in scenes with objects of low IR reflectance, small objects, high-dynamic-range scenes, surfaces at certain angles, and objects at distances greater than ~4 m. While some of these are inherent limitations of the depth solution, we are working with the depth solution provider to bring improvements wherever possible.

Attached is a typical scene of a conference room and the corresponding point cloud. As you can see, no depth points are returned from: the laptop screen (low reflectance), small objects on the table such as post-its and the pencil holder (small object size), large parts of the table (surface at an angle), and the corner of the room at the far right (distance > 4 m).

But as you move the device around, you will start to receive depth points in those areas. Accumulating depth points is required to obtain denser point clouds.
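One simple way to accumulate points across frames, assuming you have a camera-to-world pose for each depth frame from motion tracking, is to transform every frame's cloud into a common world frame and concatenate; the pose format and helper below are only a sketch:

```python
import numpy as np

def accumulate_clouds(frames):
    """Merge per-frame point clouds into one world-frame cloud. `frames` is a
    list of (pose, points) pairs, where `pose` is a 4x4 camera-to-world
    transform from motion tracking and `points` is an (N, 3) array in the
    depth camera frame."""
    merged = []
    for pose, points in frames:
        homogeneous = np.hstack([points, np.ones((len(points), 1))])
        merged.append((pose @ homogeneous.T).T[:, :3])  # rotate + translate
    return np.vstack(merged)
```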

Please also keep us posted on your findings at project-tango-hardware-support@google.com.

[Image: conference room scene and the corresponding point cloud]

+7

In my own very basic initial experiments, you are right about the depth information returned from the field of view; however, the return of surface points is not constant. I find that as I move the device around I get large shifts in where depth information comes back, i.e. there is a lot of transient "opacity" in the image with respect to depth data, probably due to surface characteristics. So, since a single returned frame is not sufficient, the real question seems to be building a larger model (a queryable point cloud, possibly voxel spaces as one scales up) by registering sequential scans into an overall model. This resembles synthetic aperture algorithms in spirit, although the letters in the equations come from an entirely different set of physical laws. In short, I think the more interesting approach is to synthesize a more complete model by sequentially accumulating point cloud data; for this to work, the device team must have its dead reckoning solid at whatever scale is being used. In addition, this addresses a problem that no sensor improvement can: even a perfect depth sensor does nothing to help you capture the sides of an object, at least not from directly in front of it.
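A sketch of the voxel idea mentioned above: quantize the accumulated world-frame points onto a grid and keep one centroid per occupied voxel, so the merged model stays bounded as more scans arrive (the helper and the voxel size are arbitrary choices):

```python
import numpy as np

def voxel_downsample(points, voxel_size=0.02):
    """Collapse an accumulated cloud onto a voxel grid, keeping one centroid
    per occupied voxel so the model stays bounded as more scans are merged."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    counts = np.bincount(inverse)
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points)          # sum points per voxel
    return sums / counts[:, None]             # centroid of each occupied voxel
```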

+1

Source: https://habr.com/ru/post/1209903/

