How to compare 2D coordinates from the store image to the actual store shelves?

We need to build a model of the store in which we can relate pixel coordinates (x, y) from the camera image to real objects in the three-dimensional space of the store. The camera images that serve as sources for building such a model suffer from fisheye distortion: straight lines appear as curves, and the walls do not seem to meet each other exactly at right angles.

We subdivide the image into polygons. Each polygon refers to a specific region, such as a shelf, display area, checkout counter, etc. We want to label the pixels that fall inside each polygon as belonging to the shelf corresponding to that region.

Any ideas how to do this?

Below is a sample image of a store with several polygons marked:

[store image with marked polygons]

EDIT: We are not looking to find the 3D coordinates; we just need to know which shelf any polygon belongs to. So if the user clicks on a polygon, we can tell which shelf they clicked on.

This works for large polygons, as shown in the image, but shelves in the camera image can be as small as a few pixels, so we need some kind of probabilistic result: if the user clicked at (x, y), what is the probability that they were trying to click on Shelf-A, what is the probability for Shelf-B, and so on.

Basically, what we are looking for is a probability function that, when a small polygon (or pixel) in the two-dimensional image is clicked, returns the click probabilities of the nearby objects.

EDIT2: One thing that is not obvious from the sample image is that a polygon can be very small (just a few pixels), and the polygons can be very close to each other.

In addition, the use case is that a customer in the store picks a product from one of the shelves. The user of the application then clicks on the point in the image from which, in their opinion, the product was taken. Since the polygons are so small and so close together, the user can only guess the exact pickup point, so at best we know it could be any of the 3-4 polygons near the click. So, the question is how to calculate the probabilities for these 3-4 polygons based on the click.

As noted here, the distance from the click to the center of a polygon, and the polygon's area, could be parameters in calculating this probability; I would be interested to know if there is an algorithm for this.

3 answers

We are not looking to find the 3D coordinates; we just need to know which shelf any polygon belongs to. So if the user clicks on a polygon, we can tell which shelf they clicked on.

I assume you have a mapping from polygon to shelf, for example as a list of (polygon, shelf) pairs. You can build this manually if the cameras are fixed and do not move. Then your problem reduces to finding which polygon the point belongs to.

If you use OpenCV, you can use the pointPolygonTest function. Otherwise, you can write a similar function yourself; see, for example, the ray casting algorithm. Then loop through the list until you find the polygon containing the point.

To further optimize the program, you can pre-calculate the extents (bounding boxes) of the polygons. Extents let you quickly reject polygons the point is definitely not inside and test only the remaining ones. But with as few polygons as you have in the image, I would not worry about it.
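A minimal sketch of this lookup in pure Python (no OpenCV dependency), with the bounding-box pre-check described above; the shelf names and coordinates are invented for illustration:

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: cast a horizontal ray from `point` and count
    how many polygon edges it crosses; an odd count means inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def bounding_box(polygon):
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return min(xs), min(ys), max(xs), max(ys)

def find_shelf(point, shelves):
    """shelves: list of (polygon, shelf_name) pairs."""
    x, y = point
    for polygon, name in shelves:
        xmin, ymin, xmax, ymax = bounding_box(polygon)
        # cheap reject using the polygon's extent
        if not (xmin <= x <= xmax and ymin <= y <= ymax):
            continue
        if point_in_polygon(point, polygon):
            return name
    return None

shelves = [
    ([(0, 0), (10, 0), (10, 5), (0, 5)], "Shelf-A"),
    ([(12, 0), (20, 0), (20, 5), (12, 5)], "Shelf-B"),
]
print(find_shelf((3, 2), shelves))   # Shelf-A
print(find_shelf((11, 2), shelves))  # None (between shelves)
```

In a real setup the extents would be computed once when the (polygon, shelf) list is loaded, not on every click.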

Basically, what we are looking for is a probability function that will return the click probabilities of nearby objects when a small polygon (or pixel) on a two-dimensional image is clicked.

Just run an experiment: ask operators to click on one selected pixel and collect statistics about where they actually click. Once you have this, it is easy to predict how many clicks will miss the target and how far off they can be.
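A small sketch of summarizing such an experiment: given recorded offsets (dx, dy) between the intended pixel and where the operator actually clicked, compute the root-mean-square radial error as a single spread figure. The offset data below is invented for illustration:

```python
import math

# (dx, dy) offsets between the target pixel and the actual clicks,
# collected during the calibration experiment (made-up sample data)
offsets = [(1, 0), (-2, 1), (0, -1), (3, 2), (-1, -2), (0, 0)]

def rms_radial_error(offsets):
    """Root-mean-square distance of the clicks from the target."""
    n = len(offsets)
    return math.sqrt(sum(dx * dx + dy * dy for dx, dy in offsets) / n)

sigma = rms_radial_error(offsets)
```

The resulting `sigma` (in pixels) can then serve as the bandwidth of whatever probability model you build on top of the click position.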

Without such an experiment with the same people, the same conditions of use, and the same pointing device you are going to deploy, you cannot say exactly how accurate the clicks will be. I believe many people click with sniper precision if the mouse is good and they see the image well. If they are forced to use a touch interface or some other pointing device, accuracy may be lower.


Some comments

EDIT
After the update to the question, assuming that the set of polygons already exists and you want to compensate for user error (or improve accuracy), you could:

  • try to guess the intended polygon by calculating the distance from the click to the centroids of the polygons near the click

  • use visual cues (highlight the selected polygon and require a second click to confirm)

  • collect error statistics and require confirmation only for the polygons that are often mis-clicked
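One way to realize the first bullet is to score each candidate polygon by the squared distance from the click to its centroid and normalize the scores with a softmax. This is a sketch under assumptions: the Gaussian form and the bandwidth `sigma` (in pixels) are tuning choices, not part of the question:

```python
import math

def centroid(polygon):
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return sum(xs) / len(xs), sum(ys) / len(ys)

def click_probabilities(click, polygons, sigma=5.0):
    """polygons: dict name -> list of (x, y) vertices.
    Returns dict name -> probability that the click meant that polygon."""
    cx, cy = click
    scores = {}
    for name, poly in polygons.items():
        px, py = centroid(poly)
        d2 = (px - cx) ** 2 + (py - cy) ** 2
        # Gaussian falloff with distance from the centroid
        scores[name] = math.exp(-d2 / (2 * sigma ** 2))
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

polygons = {
    "Shelf-A": [(0, 0), (4, 0), (4, 4), (0, 4)],    # centroid (2, 2)
    "Shelf-B": [(6, 0), (10, 0), (10, 4), (6, 4)],  # centroid (8, 2)
}
probs = click_probabilities((3, 2), polygons)
```

A natural choice for `sigma` is the click-error spread measured in the experiment suggested in the previous answer.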


What you need is a space-filling curve such as the Z-order curve or the Hilbert curve. A space-filling curve divides the plane into smaller tiles and reduces two-dimensional coordinates to a one-dimensional index, so that each tile gets an order number. What may be of interest for your problem is that the Hilbert curve does not traverse the plane in binary order but follows a Gray code, so that each tile differs by one bit from its neighbors. This makes it easy to decide which object the user clicked on.
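As a minimal illustration of the Z-order idea: interleave the bits of the x and y pixel coordinates so that nearby pixels tend to get nearby one-dimensional indices. The 16-bit coordinate width is an assumption for the sketch:

```python
def morton_index(x, y, bits=16):
    """Z-order (Morton) index: interleave the bits of x and y,
    x in the even bit positions, y in the odd ones."""
    index = 0
    for i in range(bits):
        index |= ((x >> i) & 1) << (2 * i)
        index |= ((y >> i) & 1) << (2 * i + 1)
    return index
```

The four pixels of the unit square map to indices 0..3, tracing the "Z" shape that gives the curve its name.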


Source: https://habr.com/ru/post/1344097/

