Assuming the camera is moving, you can try to estimate the ground plane (the road).
You can get an estimate of the ground plane's homography by extracting features (SURF rather than SIFT, for speed), matching them across pairs of frames, and solving for a homography using RANSAC, since a plane in 3D moves according to a homography between two camera frames.
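In practice this is one call to OpenCV's `cv2.findHomography(..., cv2.RANSAC)` on matched keypoints. As a minimal sketch of what happens inside, here is the DLT + RANSAC core in plain NumPy, run on synthetic point correspondences (which stand in for real SURF matches; the point data and thresholds are illustrative assumptions):

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate H (dst ~ H @ src, homogeneous) from >= 4 point pairs via the DLT."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)          # null vector = flattened homography
    return H / H[2, 2]

def apply_h(H, pts):
    """Apply homography H to an (N, 2) array of points."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, iters=500, thresh=2.0, seed=0):
    """RANSAC: repeatedly fit H to 4 random pairs, keep the H with most inliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)
        H = dlt_homography(src[idx], dst[idx])
        err = np.linalg.norm(apply_h(H, src) - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # refit on all inliers of the best model
    return dlt_homography(src[best_inliers], dst[best_inliers]), best_inliers
```

The outliers that RANSAC rejects here are exactly the matches on moving cars, so the inlier mask is already a first hint of where the cars are.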
Once you have a ground plane, you can identify cars by looking at clusters of pixels that do not move according to the estimated homography.
A more sophisticated approach would be to run Structure from Motion on the ground. It only assumes that the ground is rigid, not that it is flat.
Update
I was wondering, could you elaborate on how you would look for clusters of pixels that do not move according to the homography?
Sure. Say I and K are two video frames and H is the homography mapping features of I to features of K. First you warp I towards K according to H, i.e. you compute the warped image Iw as Iw( [x y]' ) = I( inv(H) [x y]' ) (roughly Matlab notation). Then you look at the squared or absolute difference image Diff = (Iw - K).*(Iw - K). Image content that moves according to the homography H should give small differences (assuming constant illumination and exposure between the frames). Image content that violates H, such as moving cars, should stand out.
For clustering pixels with large errors in Diff I would start with a simple threshold ("every pixel in Diff greater than X is interesting", possibly with an adaptive threshold). The thresholded image can be cleaned up with morphological operations (dilation, erosion) and grouped into connected components. This may be too simplistic, but it is easy to implement for a first attempt and should be fast. For something fancier, have a look at Clustering on Wikipedia. A 2D Gaussian mixture model might also be interesting; when you initialize it with the detection result from the previous frame, it should be quite fast.
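In OpenCV this is `cv2.threshold` / `cv2.morphologyEx` / `cv2.connectedComponentsWithStats`. As a self-contained sketch of the threshold-and-group step (skipping the morphological cleanup, and using 4-connectivity; both are assumptions for brevity):

```python
import numpy as np
from collections import deque

def detect_blobs(diff, thresh):
    """Threshold a difference image and group hot pixels into
    4-connected components; return one bounding box per component
    as (min_x, min_y, max_x, max_y)."""
    mask = diff > thresh
    seen = np.zeros_like(mask, dtype=bool)
    boxes = []
    h, w = mask.shape
    for y in range(h):
        for x in range(w):
            if mask[y, x] and not seen[y, x]:
                # breadth-first flood fill of one component
                q = deque([(y, x)])
                seen[y, x] = True
                ys, xs = [], []
                while q:
                    cy, cx = q.popleft()
                    ys.append(cy); xs.append(cx)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```

Each returned box is a candidate car region; small boxes can then be discarded as noise, which does part of the job of the morphological cleanup.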
I experimented a bit with the two frames you provided, and I have to say I am a little surprised how well this works. :-) Left image: the difference (color-coded) between the two frames you posted. Right image: the difference after aligning the frames with the estimated homography. The remaining differences are clearly the moving cars, and they are strong enough for simple thresholding.

Thinking about the approach you are currently using, it might be interesting to combine it with my suggestion:
- You could try to learn and classify the cars in the difference image D instead of the original image. That would mean learning what a moving car looks like rather than what a car looks like, which may be more reliable.
- You could get rid of the expensive sliding-window search and run the classifier only on regions of D with sufficiently high values.
Some additional notes:
- In theory, cars should stand out even when they are not moving, since they are not flat, but given the distance to the scene and the resolution of the camera this effect may be too subtle.
- You could replace the feature extraction/matching part of my proposal with Optical Flow, if you like. This boils down to identifying flow vectors that deviate from the homography-consistent motion of the ground between the frames. Be aware, though, that optical flow can be prone to outliers. You could also try to estimate the homography from the flow vectors.
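For dense flow you would typically use something like `cv2.calcOpticalFlowFarneback`. Just to make the flow-vector idea tangible, here is a crude block-matching flow estimator in NumPy (SSD matching over an integer search window; patch and search sizes are arbitrary illustrative choices):

```python
import numpy as np

def block_flow(f0, f1, y, x, patch=5, search=6):
    """Crude block-matching optical flow: find the integer displacement
    (dy, dx) of the patch centred at (y, x) in f0 that best matches f1,
    by minimizing the sum of squared differences (SSD)."""
    r = patch // 2
    tpl = f0[y - r:y + r + 1, x - r:x + r + 1]
    best_ssd, best_dv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cand = f1[y + dy - r:y + dy + r + 1, x + dx - r:x + dx + r + 1]
            ssd = ((cand - tpl) ** 2).sum()
            if ssd < best_ssd:
                best_ssd, best_dv = ssd, (dy, dx)
    return best_dv
```

Flow vectors on the road should agree with the homography-induced motion; vectors on a car will deviate, which is the cue to look for.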
- This is important: whatever method you use, once you have found the cars in one frame, you should use this information to boost the probability of finding them near the same location in subsequent frames (Kalman filter, etc.). That's what tracking is all about!
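OpenCV ships `cv2.KalmanFilter` for exactly this. As a hedged sketch of the idea, here is a minimal constant-velocity Kalman filter for one car in NumPy; `predict()` returns the position where you should search next, and `update()` folds in the detection you actually found (noise parameters `q` and `r` are illustrative guesses, to be tuned):

```python
import numpy as np

class ConstantVelocityKalman:
    """Minimal constant-velocity Kalman filter for one tracked car.

    State is [x, y, vx, vy]'; only the position (x, y) is observed."""

    def __init__(self, x0, y0, q=1e-2, r=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])   # initial state, zero velocity
        self.P = np.eye(4) * 100.0              # large initial uncertainty
        self.F = np.eye(4)                      # constant-velocity transition
        self.F[0, 2] = self.F[1, 3] = 1.0
        self.H = np.zeros((2, 4))               # we measure position only
        self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = np.eye(4) * q                  # process noise
        self.R = np.eye(2) * r                  # measurement noise

    def predict(self):
        """Propagate the state one frame; returns the predicted position,
        i.e. where to search for the car in the next frame."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, zx, zy):
        """Fold in a detection (zx, zy); returns the corrected position."""
        z = np.array([zx, zy])
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

A typical loop is: `predict()`, search the difference image only near the predicted position, then `update()` with whatever detection you found.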