This means that we are not dealing with individual pixels in the image but with a moving object, which makes the task easier. Your data is indeed a time series, so algorithms designed for time series are preferable. The classic examples are Markov models (in particular, Markov chains and the somewhat more complex hidden Markov models).
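To illustrate the Markov-chain idea, here is a minimal sketch in Python. It assumes the object's position has already been discretized into a few named regions (the region names and the sample track are hypothetical). It estimates the transition probabilities P(next region | current region) by counting, then predicts the most likely next region. A hidden Markov model would additionally treat the true region as hidden behind noisy observations; this sketch shows only the plain chain.

```python
from collections import Counter, defaultdict


def estimate_transitions(states):
    """Maximum-likelihood estimate of P(next state | current state) from one observed sequence."""
    counts = defaultdict(Counter)
    # Count every observed transition current -> next.
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    # Normalize each row of counts into probabilities.
    return {
        current: {s: n / sum(row.values()) for s, n in row.items()}
        for current, row in counts.items()
    }


def most_likely_next(transitions, current):
    """Predict the next state as the most probable transition out of `current`."""
    row = transitions[current]
    return max(row, key=row.get)


# Hypothetical discretization of the object's horizontal position into regions.
track = ["left", "center", "center", "center", "right"]
transitions = estimate_transitions(track)
print(transitions["center"])
print(most_likely_next(transitions, "center"))  # center
```

Note that a state never seen with an outgoing transition (here, "right") has no row at all; a real tracker would need more data or smoothing of the counts.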
However, your input is noisy because the camera is unstable. So an even better solution would be a Kalman filter: a model similar to an HMM, but one that explicitly models the noise. It is widely used in robotics, navigation, and similar fields to estimate a vehicle's current location and predict its future location from inaccurate sensor data and its past trajectory. Doesn't that look like what you need?
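As a rough illustration (not MATLAB's implementation), here is a minimal Kalman filter sketch in Python for one coordinate of the object's position. It assumes a constant-velocity motion model with state [position, velocity], one measurement per frame, and noise levels `q` (random acceleration) and `r` (camera jitter) that are picked arbitrarily for the example and would need tuning on real footage. For 2D tracking you could run one filter per axis.

```python
from dataclasses import dataclass, field


@dataclass
class ConstantVelocityKalman1D:
    """Kalman filter for one coordinate with state [position, velocity]."""
    dt: float = 1.0   # time between frames (assumed constant)
    q: float = 0.01   # assumed variance of random acceleration (process noise)
    r: float = 4.0    # assumed variance of measurement noise (camera jitter)
    x: list = field(default_factory=lambda: [0.0, 0.0])
    P: list = field(default_factory=lambda: [[1000.0, 0.0], [0.0, 1000.0]])

    def predict(self):
        """Propagate state and covariance: x = F x, P = F P F^T + Q."""
        dt, q = self.dt, self.q
        p, v = self.x
        self.x = [p + dt * v, v]
        (a, b), (c, d) = self.P
        self.P = [
            [a + dt * (b + c) + dt * dt * d + q * dt**4 / 4, b + dt * d + q * dt**3 / 2],
            [c + dt * d + q * dt**3 / 2, d + q * dt**2],
        ]
        return self.x[0]

    def update(self, z):
        """Correct the prediction with a noisy position measurement z (H = [1, 0])."""
        (a, b), (c, d) = self.P
        s = a + self.r                 # innovation variance
        k0, k1 = a / s, c / s          # Kalman gain
        y = z - self.x[0]              # innovation (measurement residual)
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]
        return self.x[0]


# Usage: object moving 2 px/frame, with the camera adding +/-3 px of alternating jitter.
kf = ConstantVelocityKalman1D()
for t in range(100):
    kf.predict()
    measured = 2.0 * t + (3.0 if t % 2 else -3.0)
    estimate = kf.update(measured)
print(round(estimate, 2), round(kf.x[1], 2))  # smoothed position and velocity
```

The large initial covariance `P` tells the filter it knows almost nothing at first, so early measurements are trusted heavily; after that, the ratio of `q` to `r` controls how strongly the jitter is smoothed versus how quickly real changes in motion are followed.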
I'm not a big fan of MATLAB, but it seems to have a `kalman` function that implements this filter.