I wrote an object tracker that tries to detect and track a moving object in a recorded video. To maximize the detection rate, my algorithm uses several detection and tracking algorithms (cascade, foreground and particle tracker). Each tracking algorithm returns me a point of interest that may be part of the object I'm trying to track. Assume (for the simplicity of this example) that my object is a rectangle and that the three tracking algorithms return points 1, 2 and 3:

Based on the ratio/distance of these three points, the center of gravity (the blue X in the image above) of the tracked object can be calculated. So for each frame I can come up with a good estimate of the center of gravity. However, the object can move from one frame to the next:

In this example I simply rotated the original object. My algorithm now gives me three new points of interest: 1', 2' and 3'. I could again calculate the center of gravity based on these three new points, but that would throw away the important information I acquired from the previous frame: based on points 1, 2 and 3 I already know something about the relationships between these points, so by combining the information from 1, 2 and 3 with 1', 2' and 3' I should get a more accurate estimate of the center of gravity.
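To make the idea concrete, here is a rough sketch of what I imagine (the coordinates are made up, and I'm assuming the correspondence 1→1', 2→2', 3→3' is already known): estimate a rigid transform between the two point sets, propagate the previous center-of-gravity estimate through it, and fuse it with the centroid measured in the new frame:

```python
import numpy as np
import cv2

prev_pts = np.array([[120, 80], [200, 90], [160, 170]], dtype=np.float32)   # 1, 2, 3
curr_pts = np.array([[90, 110], [160, 60], [200, 140]], dtype=np.float32)   # 1', 2', 3'

# Fit a 4-DOF transform (rotation, uniform scale, translation) between frames.
M, inliers = cv2.estimateAffinePartial2D(prev_pts, curr_pts)

# Per-frame estimate: centroid of the returned points.
prev_cog = prev_pts.mean(axis=0)
curr_cog = curr_pts.mean(axis=0)

# Propagate the previous estimate through the transform and fuse it
# (naive equal weighting) with the current frame's measurement.
predicted_cog = M[:, :2] @ prev_cog + M[:, 2]
fused_cog = 0.5 * predicted_cog + 0.5 * curr_cog
```

But this assumes I already know which new point corresponds to which old point, which is exactly the part I'm missing.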
In addition, a subsequent frame might yield a fourth data point:

This is what I would like to do (but I don't know how):
based on the individual points (and their relationships to each other) that are returned by the different tracking algorithms, I want to build up a localization map of the tracked object. Intuitively I feel that I need to come up with A) an identification function that identifies individual points across frames and B) some kind of cost function that determines how similar the tracked points (and the relations/distances between them) are from frame to frame, but I can't figure out how to implement this. Alternatively, some kind of point-based map building might work, but again I don't know how to approach this. Any advice (and sample code) is much appreciated!
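Here is a rough sketch of what I mean by A) and B), in case it clarifies the question (the weights and the structure of the cost are just guesses on my part): match old points to new points with the Hungarian algorithm, where the cost of a match combines the point-to-point distance with how much the point's distances to its peers have changed:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_points(old_pts, new_pts, w_pos=1.0, w_struct=1.0):
    """Tentatively match each old point to a new point.

    Cost of matching old point i to new point j:
      w_pos    * Euclidean distance between the two points
    + w_struct * change in the point's sorted distances to its peers
                 (a crude 'structural' signature of the constellation).
    """
    old_pts = np.asarray(old_pts, dtype=float)
    new_pts = np.asarray(new_pts, dtype=float)

    def signature(pts):
        d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
        return np.sort(d, axis=1)  # row i: sorted distances from point i

    sig_old, sig_new = signature(old_pts), signature(new_pts)
    k = min(len(old_pts), len(new_pts))  # points may appear/disappear

    cost = np.zeros((len(old_pts), len(new_pts)))
    for i in range(len(old_pts)):
        for j in range(len(new_pts)):
            pos_cost = np.linalg.norm(old_pts[i] - new_pts[j])
            struct_cost = np.linalg.norm(sig_old[i, :k] - sig_new[j, :k])
            cost[i, j] = w_pos * pos_cost + w_struct * struct_cost

    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
    return list(zip(rows, cols))
```

I don't know whether this is a sensible cost function, though, or how the two terms should be weighted.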
EDIT1: A simple particle filter might also work, but again I don't know how to define the cost function. A particle filter for tracking a certain color is easy to program: for each pixel you calculate the difference between the target color and the pixel color. But how would I do the same to assess the relationships between the tracked points?
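My best guess for a geometry-based version (the pose template and noise scale are made up) would be to let each particle hypothesize a pose of the object, predict where the tracked points should be under that pose, and weight the particle by the prediction error:

```python
import numpy as np

def particle_weight(particle, measured_pts, template_offsets, sigma=10.0):
    """particle = (x, y, theta): hypothesized pose of the object.

    template_offsets: tracked-point positions relative to the center
    of gravity, learned from earlier frames (made-up concept).
    """
    x, y, theta = particle
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s],
                  [s,  c]])

    # Where the tracked points *should* be under this pose hypothesis.
    predicted = template_offsets @ R.T + np.array([x, y])

    # Gaussian likelihood of the actual tracker outputs given the
    # prediction (assumes the point correspondences are known).
    err = np.sum((predicted - np.asarray(measured_pts)) ** 2)
    return np.exp(-err / (2 * sigma ** 2))
```

Is that the right way to think about the cost, or is there a standard formulation?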
EDIT2: Intuitively I feel that Kalman filters could also help with the prediction step. See slides 24 to 32 of this pdf. Or am I misled?
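For reference, this is the kind of predict/correct loop I have in mind for the fused center of gravity (a constant-velocity model; the noise covariances are placeholder values I'd have to tune):

```python
import numpy as np
import cv2

# State: [x, y, vx, vy]; measurement: the fused center of gravity [x, y].
kf = cv2.KalmanFilter(4, 2)
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], dtype=np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], dtype=np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2      # placeholder, tune
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1  # placeholder, tune

for cog in [(160.0, 113.3), (158.1, 118.9), (155.2, 124.4)]:  # per-frame estimates
    prediction = kf.predict()  # predicted [x, y, vx, vy] before the measurement
    kf.correct(np.array([[cog[0]], [cog[1]]], dtype=np.float32))
```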