R is the covariance matrix of the measurement noise, which is assumed to be Gaussian. In the context of tracking objects in video, this is your detection error. Say you use a face detector to detect faces, and then you want to track them with a Kalman filter. You run the detector, you get a bounding box for each face, and then you use the Kalman filter to track the centroid of each box. The R matrix should describe how uncertain you are about the location of the centroid. So in this case, the diagonal entries of R corresponding to the x, y coordinates should reflect an error of a few pixels (as variances, so in pixels squared). If your state includes velocity, then you need to guess the uncertainty of the velocity measurement and take the units into account. If your position is measured in pixels and your velocity in pixels per frame, then the diagonal entries of R should reflect that.
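As a rough sketch of the point about units, a diagonal R can be built from one standard deviation per measured component. The helper name `diagonal_covariance` and the 3-pixel figure are assumptions chosen for illustration, not values from any particular detector.

```python
def diagonal_covariance(std_devs):
    """Build a diagonal covariance matrix (list of lists) from
    per-component standard deviations. The diagonal holds variances,
    so each entry is the standard deviation squared."""
    n = len(std_devs)
    return [
        [std_devs[i] ** 2 if i == j else 0.0 for j in range(n)]
        for i in range(n)
    ]

# Assumption: the detector's centroid is off by roughly 3 pixels in x and y.
R = diagonal_covariance([3.0, 3.0])
print(R)
```

If velocity were measured as well, its standard deviation (in pixels per frame) would be appended to the list in the same way.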
Q is the covariance of the process noise. Simply put, Q specifies how much the actual motion of the object deviates from your assumed motion model. If you are tracking cars on a road, then a constant velocity model should be reasonably good, and the entries of Q should be small. If you are tracking people's faces, they are unlikely to move at a constant velocity, so you need to increase Q. Again, you need to be aware of the units in which your state variables are expressed.
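To see what a small or large Q does, here is a minimal sketch using a scalar filter with a random-walk motion model (a deliberate simplification of the constant velocity model discussed above). The function name `steady_state_gain` and the numeric values are illustrative assumptions. A gain near 0 means the filter mostly trusts its motion model, while a gain near 1 means it mostly follows the measurements.

```python
def steady_state_gain(q, r, iterations=200):
    """Iterate the scalar Kalman variance recursion and return the
    gain it settles to, for process variance q and measurement variance r."""
    p = 1.0  # initial estimate variance (arbitrary starting value)
    k = 0.0
    for _ in range(iterations):
        p = p + q          # predict: uncertainty grows by the process noise
        k = p / (p + r)    # Kalman gain: weight given to the measurement
        p = (1.0 - k) * p  # update: uncertainty shrinks after measuring
    return k

r = 9.0  # assumed measurement variance (3 pixels standard deviation)
gain_small_q = steady_state_gain(0.01, r)   # model trusted, smooth track
gain_large_q = steady_state_gain(100.0, r)  # model distrusted, follows detections
print(gain_small_q, gain_large_q)
```

What matters here is the ratio of Q to R: scaling both by the same factor leaves the steady-state gain unchanged.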
So that is the intuition. In practice, you start with a reasonable initial guess for R and Q, and then tune them experimentally. So setting R and Q is a bit of an art. Also, in most cases using diagonal matrices for R and Q is sufficient.
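One possible way to do the experimental tuning is a small sweep over candidate Q values, scored against a reference track. The sketch below assumes a scalar random-walk filter and synthetic data generated with a fixed seed. With real video you would need hand-labeled positions instead, and all names here (`run_filter`, `mean_squared_error`, the candidate list) are hypothetical.

```python
import random

def run_filter(measurements, q, r):
    """Scalar random-walk Kalman filter; returns the filtered estimates."""
    x, p = measurements[0], r  # initialize from the first measurement
    estimates = []
    for z in measurements:
        p = p + q              # predict
        k = p / (p + r)        # gain
        x = x + k * (z - x)    # correct the estimate with the measurement
        p = (1.0 - k) * p      # update the estimate variance
        estimates.append(x)
    return estimates

def mean_squared_error(estimates, truth):
    return sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth)

# Synthetic reference track and noisy detections (illustrative data only).
rng = random.Random(0)
truth, position = [], 100.0
for _ in range(200):
    position += rng.gauss(0.0, 1.0)   # true motion deviates from "stand still"
    truth.append(position)
measurements = [t + rng.gauss(0.0, 3.0) for t in truth]

r = 3.0 ** 2  # assumed measurement variance
candidates = [0.01, 0.1, 1.0, 10.0, 100.0]
errors = {q: mean_squared_error(run_filter(measurements, q, r), truth)
          for q in candidates}
best_q = min(errors, key=errors.get)
print(best_q, errors[best_q])
```

A coarse logarithmic grid like this is usually enough for a first pass; the result depends on the data, so it is a starting point rather than a final setting.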
Here is an example that uses vision.KalmanFilter in MATLAB to track multiple people.