Object Tracking Techniques
- Updated 2026-02-18
NI Vision implements two object tracking algorithms: mean shift and EM-based mean shift.
To track an object, the target object must first be characterized over a feature space. The color histogram is chosen as the feature space because it is a robust representation of object appearance. Each moving object is characterized by its histogram. The histogram-based target representation is regularized by spatial masking with an isotropic kernel, so that pixels near the center of the object contribute more than pixels near its edge.
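As an illustration of a histogram regularized by an isotropic kernel, the following is a minimal sketch in plain Python. It assumes a single-channel image stored as a list of rows and an Epanechnikov kernel profile. The function name, the `radius` and `bins` parameters, and the choice of kernel are assumptions made for this example and are not part of the NI Vision API.

```python
def kernel_weighted_histogram(image, center, radius, bins=8, levels=256):
    """Normalized histogram of a circular patch, weighted by an isotropic kernel.

    image  : list of rows of integer pixel values in [0, levels)
    center : (row, col) of the patch center
    radius : kernel radius in pixels (hypothetical parameter)
    """
    rows, cols = len(image), len(image[0])
    cr, cc = center
    hist = [0.0] * bins
    for r in range(max(0, cr - radius), min(rows, cr + radius + 1)):
        for c in range(max(0, cc - radius), min(cols, cc + radius + 1)):
            # Squared distance from the center, normalized by the radius.
            d2 = ((r - cr) ** 2 + (c - cc) ** 2) / float(radius ** 2)
            if d2 >= 1.0:
                continue  # pixel lies outside the kernel support
            # Epanechnikov profile: the weight falls off with distance,
            # so central pixels dominate the histogram.
            hist[image[r][c] * bins // levels] += 1.0 - d2
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist
```

Because the weight depends only on the distance from the center, the kernel is isotropic. Background pixels that leak in at the edge of the patch therefore have less influence on the target representation than they would in an unweighted histogram.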
Understanding Mean Shift
The mean shift algorithm is a simple method for finding the position of a local mode (local maximum) of a kernel-based estimate of a probability density function. Object tracking for an image frame is performed by a combination of histogram extraction, weight computation, and derivation of the new location.
There are three stages to the mean shift algorithm: histogram extraction, weight computation, and derivation of the new location.
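The stages of histogram extraction, weight computation, and derivation of the new location can be sketched as follows. This is a minimal illustration under stated assumptions, not the NI Vision implementation: the image is a single-channel list of rows, the kernel is an Epanechnikov profile, the per-pixel weight is the square root of the ratio of the target histogram to the candidate histogram, and all function and parameter names are hypothetical.

```python
import math


def bin_index(value, bins=8, levels=256):
    """Map a pixel value to its histogram bin."""
    return value * bins // levels


def window_pixels(image, center, radius):
    """Yield (row, col, kernel_weight) for pixels inside the circular kernel."""
    cr, cc = center
    for r in range(max(0, cr - radius), min(len(image), cr + radius + 1)):
        for c in range(max(0, cc - radius), min(len(image[0]), cc + radius + 1)):
            d2 = ((r - cr) ** 2 + (c - cc) ** 2) / float(radius ** 2)
            if d2 < 1.0:
                yield r, c, 1.0 - d2


def extract_histogram(image, center, radius, bins=8):
    """Stage 1: kernel-weighted, normalized histogram of the window."""
    hist = [0.0] * bins
    for r, c, k in window_pixels(image, center, radius):
        hist[bin_index(image[r][c], bins)] += k
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def mean_shift(image, target_hist, start, radius, bins=8, max_iter=20):
    """Move the window from `start` toward the nearest local mode."""
    center = start
    for _ in range(max_iter):
        candidate = extract_histogram(image, center, radius, bins)
        sum_w = sum_r = sum_c = 0.0
        for r, c, _k in window_pixels(image, center, radius):
            # Stage 2: pixels whose color is under-represented in the
            # candidate relative to the target receive a larger weight.
            u = bin_index(image[r][c], bins)
            w = math.sqrt(target_hist[u] / candidate[u]) if candidate[u] > 0 else 0.0
            sum_w += w
            sum_r += w * r
            sum_c += w * c
        if sum_w == 0.0:
            break  # nothing in the window resembles the target
        # Stage 3: the new location is the weighted mean of pixel coordinates,
        # rounded to the nearest pixel.
        new_center = (int(sum_r / sum_w + 0.5), int(sum_c / sum_w + 0.5))
        if new_center == center:
            break  # converged
        center = new_center
    return center
```

In a tracking loop, the target histogram would be extracted once from the frame in which the object is selected, and `mean_shift` would then be called on each new frame, starting from the location found in the previous frame.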
Understanding EM-Based Mean Shift
The mean shift algorithm is not scale or geometric-shift invariant. To track an object that may appear to change in size or shape, the EM-based mean shift algorithm is required.
The EM-based mean shift algorithm, also called the shape-adapted mean shift algorithm, is an extension of the standard algorithm already described. It simultaneously estimates the position of the local mode and the covariance matrix that describes the approximate shape of the local mode. The covariance matrix defines the shape and scale of the region that represents the object, and it is updated every frame to adapt to the shape and scale of the object in that frame.
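To show how a covariance matrix can describe the shape and scale of a region, the following simplified sketch computes the weighted mean and 2x2 covariance of a set of pixel coordinates. It assumes that per-pixel weights are already available, for example from the weight computation stage. It illustrates the idea only and is not the full EM-based update or the NI Vision implementation; the function name is hypothetical.

```python
def weighted_mean_and_covariance(points, weights):
    """Return the weighted mean (row, col) and the 2x2 covariance matrix.

    points  : list of (row, col) pixel coordinates
    weights : list of non-negative weights, one per point
    """
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Position estimate: weighted centroid of the pixel coordinates.
    mean_r = sum(w * r for (r, _c), w in zip(points, weights)) / total
    mean_c = sum(w * c for (_r, c), w in zip(points, weights)) / total

    # Shape estimate: weighted second moments about the centroid.
    var_rr = var_cc = var_rc = 0.0
    for (r, c), w in zip(points, weights):
        dr, dc = r - mean_r, c - mean_c
        var_rr += w * dr * dr
        var_cc += w * dc * dc
        var_rc += w * dr * dc
    covariance = [
        [var_rr / total, var_rc / total],
        [var_rc / total, var_cc / total],
    ]
    return (mean_r, mean_c), covariance
```

For a region that is wider than it is tall, the column variance is larger than the row variance. A nonzero off-diagonal term indicates that the region is tilted relative to the image axes. Re-estimating both quantities every frame lets the tracked region follow changes in size and orientation.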
There are three stages to the mean shift algorithm:
Kalman Prediction
EM-based mean shift also features a Kalman filter implementation. A Kalman filter uses the history of measurements of the target to build a model of the state of the system. This model is then used to predict the location of the target in the next frame.
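The predict and update cycle of a Kalman filter can be sketched as follows for a single coordinate. This example assumes a constant-velocity motion model, a simplified diagonal process noise, and hypothetical class and parameter names. The model and noise settings that NI Vision uses are not described here.

```python
class ConstantVelocityKalman1D:
    """Kalman filter for one coordinate with state [position, velocity]."""

    def __init__(self, position, process_var=1e-3, measurement_var=1.0):
        self.pos = float(position)
        self.vel = 0.0
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # state covariance
        self.q = process_var               # assumed process noise variance
        self.r = measurement_var           # assumed measurement noise variance

    def predict(self, dt=1.0):
        """Project the state forward by dt and return the predicted position."""
        self.pos += dt * self.vel
        (p00, p01), (p10, p11) = self.P
        # P = F P F^T + Q, with F = [[1, dt], [0, 1]] and Q = q * I.
        self.P = [
            [p00 + dt * (p10 + p01) + dt * dt * p11 + self.q, p01 + dt * p11],
            [p10 + dt * p11, p11 + self.q],
        ]
        return self.pos

    def update(self, measured_pos):
        """Correct the state with a new position measurement."""
        (p00, p01), (p10, p11) = self.P
        residual = measured_pos - self.pos
        s = p00 + self.r               # innovation variance
        k0, k1 = p00 / s, p10 / s      # Kalman gain for position and velocity
        self.pos += k0 * residual
        self.vel += k1 * residual
        # P = (I - K H) P, with H = [1, 0].
        self.P = [
            [(1 - k0) * p00, (1 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]
```

To track a point in an image, one filter instance per axis is sufficient under this model. Each frame, `predict` supplies the starting location for the search, and `update` is called with the location that the tracker actually found.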
Histogram Back Projection
Back projection is one method used to improve the convergence of the target candidate's size and location with the actual size and location of an object. Back projection records how well the pixels of a target candidate fit the pixel distribution of the target model. This allows the user to gauge how well the model of the object matches its appearance.
A histogram of an image known to contain the object of interest is created, and is then back projected over the image. Proper thresholding of the resulting image should isolate the object from the background.
Each pixel value in the resulting image represents the likelihood that the pixel is part of the object. The minimum pixel value of 0 indicates that the pixel does not belong to the object, while the maximum value of 255 indicates the highest likelihood that the pixel belongs to the object. The back projected image is a good indication of how well the tracking algorithm has identified the pixels that belong to the object to be tracked.
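A minimal sketch of histogram back projection and thresholding follows. It assumes a single-channel image stored as a list of rows and scales the model histogram so that its most frequent bin maps to 255. The function names, the bin count, and the scaling rule are assumptions made for this example.

```python
def model_histogram(pixels, bins=8, levels=256):
    """Normalized histogram of pixel values sampled from the object."""
    hist = [0.0] * bins
    for value in pixels:
        hist[value * bins // levels] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def back_project(image, hist, bins=8, levels=256):
    """Replace each pixel with the scaled likelihood of its histogram bin."""
    peak = max(hist)
    if peak <= 0:
        raise ValueError("model histogram is empty")
    # Scale so that the most frequent object color maps to 255.
    lookup = [int(round(255 * h / peak)) for h in hist]
    return [[lookup[value * bins // levels] for value in row] for row in image]


def threshold(image, level):
    """Binary mask of the pixels whose likelihood is at least `level`."""
    return [[1 if value >= level else 0 for value in row] for row in image]
```

The lookup table is built once per model histogram, so back projecting a frame costs one table lookup per pixel. Thresholding the result gives a binary mask that can be compared with the tracked region.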
Background Subtraction
A second method used to improve the convergence of the target model is background subtraction. Background subtraction extracts the foreground objects in a scene. This helps reduce false positives and creates a better match between the target model and the target candidates.
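One common way to extract foreground objects is to maintain a running-average background model and flag pixels that differ from it by more than a threshold. The sketch below assumes this approach, which is not necessarily the method that NI Vision uses; the function names, `alpha`, and `threshold` are hypothetical.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the background model (running average)."""
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(bg_row, frame_row)]
        for bg_row, frame_row in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=30):
    """Mark pixels that differ from the background by more than `threshold`."""
    return [
        [1 if abs(f - b) > threshold else 0 for b, f in zip(bg_row, frame_row)]
        for bg_row, frame_row in zip(background, frame)
    ]
```

A small `alpha` makes the background adapt slowly, so a moving object is not absorbed into the model. Restricting histogram extraction to the pixels in the foreground mask keeps background colors out of the target model and the target candidates.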
Choosing the Right Parameters
The user can set the following parameters to create an object tracking application suited to their needs:
The following additional parameters can be used to configure the EM-based mean shift algorithm: