Department of Computer Science
University of Otago
Dunedin, New Zealand
Feature tracking is one of the most fundamental operations in computer vision - it is probably the most popular way of extracting motion information from an image sequence (its closest competitor for the title would probably be optical flow). Despite this, existing feature tracking techniques are relatively crude, falling into two camps: correspondence-based techniques and texture-correlation-based techniques. Correspondence-based techniques extract a set of features from each frame (typically corner-like features) and then attempt to establish correspondences between the two sets of features (see ASSET-2 [1] for an example of a system that uses this technique). These techniques require that the same feature can be detected reliably and consistently across many frames. A major disadvantage of this approach is that correspondence errors tend to be very large.
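A minimal sketch of this style of matching, assuming a scalar `signature' descriptor per feature and a greedy nearest-neighbour strategy (both hypothetical simplifications of what a real correspondence tracker would do), might look like:

```cpp
#include <vector>
#include <cmath>
#include <limits>

// A detected corner feature: image position plus a scalar "signature"
// (a hypothetical stand-in for a real local descriptor).
struct Feature {
    double x, y;
    double signature;
};

// Greedily match each frame-1 feature to the nearest unclaimed frame-2
// feature whose signature is similar enough.  Returns, for each frame-1
// feature, the index of its match in frame 2, or -1 for a dropout.
std::vector<int> matchFeatures(const std::vector<Feature>& f1,
                               const std::vector<Feature>& f2,
                               double maxDist, double maxSigDiff)
{
    std::vector<int> match(f1.size(), -1);
    std::vector<bool> claimed(f2.size(), false); // one-to-one constraint
    for (std::size_t i = 0; i < f1.size(); ++i) {
        double best = std::numeric_limits<double>::max();
        for (std::size_t j = 0; j < f2.size(); ++j) {
            if (claimed[j]) continue;
            double d = std::hypot(f1[i].x - f2[j].x, f1[i].y - f2[j].y);
            if (d > maxDist) continue;
            if (std::fabs(f1[i].signature - f2[j].signature) > maxSigDiff)
                continue;
            if (d < best) { best = d; match[i] = static_cast<int>(j); }
        }
        if (match[i] >= 0) claimed[match[i]] = true;
    }
    return match;
}
```

A greedy scan like this enforces the one-to-one constraint cheaply, but a single wrong match can claim another feature's true partner, which is one way the large correspondence errors mentioned above arise.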
Texture-correlation-based techniques extract a set of features from the first frame only. The position of these features in subsequent frames is found by performing a global search, inside a suitably sized window, for the position whose texture correlates best with the texture around the feature in the first frame. The disadvantage of this approach is that features tend to drift. These techniques also cope poorly when the texture in a subsequent frame has been rotated, zoomed or skewed with respect to the texture in the first frame (this occurs to some degree for almost all types of object motion), although there have been attempts to solve this problem [2,3].
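The search loop itself is straightforward. The sketch below assumes grey-level images, a sum-of-squared-differences (SSD) match measure, and fixed patch and window radii; all names are illustrative rather than taken from any particular implementation:

```cpp
#include <vector>
#include <limits>

// A grey-level image stored row-major.  Bounds checking is omitted for
// brevity; the caller must keep patches inside the image.
struct Image {
    int width, height;
    std::vector<double> pix;
    double at(int x, int y) const { return pix[y * width + x]; }
};

// Sum-of-squared-differences between a (2r+1)x(2r+1) patch around
// (x1,y1) in img1 and the patch around (x2,y2) in img2.
double patchSSD(const Image& img1, int x1, int y1,
                const Image& img2, int x2, int y2, int r)
{
    double ssd = 0.0;
    for (int dy = -r; dy <= r; ++dy)
        for (int dx = -r; dx <= r; ++dx) {
            double d = img1.at(x1 + dx, y1 + dy) - img2.at(x2 + dx, y2 + dy);
            ssd += d * d;
        }
    return ssd;
}

// Global search inside a (2w+1)x(2w+1) window around the feature's
// first-frame position (fx,fy) for the offset whose texture best
// matches the original patch.  The new position is returned in nx/ny.
void correlationTrack(const Image& first, int fx, int fy,
                      const Image& next, int w, int r,
                      int& nx, int& ny)
{
    double best = std::numeric_limits<double>::max();
    nx = fx; ny = fy;
    for (int dy = -w; dy <= w; ++dy)
        for (int dx = -w; dx <= w; ++dx) {
            double ssd = patchSSD(first, fx, fy, next, fx + dx, fy + dy, r);
            if (ssd < best) { best = ssd; nx = fx + dx; ny = fy + dy; }
        }
}
```

Because the template patch is always taken from the first frame, a rotated or zoomed patch in a later frame matches it poorly everywhere in the window, which is exactly the failure mode described above.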
We propose two mechanisms for overcoming these problems: a novel relaxation-based
tracker, and a data-fusion based system architecture which combines the
results of multiple tracking units. The operation of the relaxation tracker
is described below:
For a given feature in frame #1, we predict its position in frame #2 using a Kalman filter and a linear motion model. This prediction is then refined by `relaxing' it on an energy surface. The energy surface is a function of how `feature-like' a pixel is (this is why the local maxima in the energy surface above lie at the same positions as corners in frame #2) and of how much the pixel looks like the original feature (this is why the top-left local maximum is much stronger than the others). The relaxation procedure itself is simple hill climbing. Our results show that this simple technique performs significantly better than correspondence-based techniques when feature movements are small, but much worse when the movements are large (the threshold is around 5 pixels/frame).
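The predict-then-relax step can be sketched as follows, assuming a constant-velocity prediction in place of the full Kalman filter update, and a precomputed energy grid (both simplifications of the tracker described above):

```cpp
#include <vector>

// Energy surface over the image: high where a pixel is both corner-like
// and similar in appearance to the tracked feature.  How the two cues
// are weighted together is not shown here.
struct EnergySurface {
    int width, height;
    std::vector<double> e;
    double at(int x, int y) const { return e[y * width + x]; }
};

// Constant-velocity prediction: the simplest linear motion model, a
// stand-in for the Kalman filter's state prediction.
void predict(double x, double y, double vx, double vy,
             double& px, double& py)
{
    px = x + vx;
    py = y + vy;
}

// Simple hill climbing: from the predicted position, repeatedly step to
// the 8-neighbour with the highest energy until no neighbour improves.
void relax(const EnergySurface& s, int& x, int& y)
{
    bool moved = true;
    while (moved) {
        moved = false;
        int bx = x, by = y;
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                int nx = x + dx, ny = y + dy;
                if (nx < 0 || ny < 0 || nx >= s.width || ny >= s.height)
                    continue;
                if (s.at(nx, ny) > s.at(bx, by)) { bx = nx; by = ny; }
            }
        if (bx != x || by != y) { x = bx; y = by; moved = true; }
    }
}
```

Hill climbing only finds the local maximum nearest the prediction, which is why the technique degrades once movements exceed a few pixels per frame: the prediction then lands in the basin of the wrong maximum.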
Since the relaxation and correspondence techniques perform better in
different situations, it makes sense to try to combine them:
We have developed a simple architecture for combining the predictions of multiple trackers. For the best prediction of each tracker we calculate a number of attributes. These attributes quantify the tracker's confidence in the prediction, the uniqueness of the prediction, to what degree the prediction violates the one-to-one correspondence constraint, and how similar the prediction is to the predictions of the other trackers. These attributes are used to decide whether or not the prediction is correct, by classifying the resulting 4-dimensional vector with a simple machine learning technique. The `correct' predictions are then averaged to produce the final prediction for the whole system.
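A sketch of the fusion step, with a hypothetical linear decision function (and made-up weights) standing in for the learned classifier - in the real system the classifier is trained, not hand-coded:

```cpp
#include <vector>
#include <cstddef>

// One tracker's best prediction plus the four attributes described
// above.  The scalings are hypothetical: higher is better for all
// attributes except violation.
struct Prediction {
    double x, y;           // predicted feature position
    double confidence;     // tracker's confidence in this prediction
    double uniqueness;     // how much better than the runner-up
    double violation;      // degree of one-to-one constraint violation
    double agreement;      // similarity to the other trackers' predictions
};

// Stand-in for the learned classifier: a linear decision function on
// the 4-D attribute vector.  The weights here are illustrative; the
// real system would learn the decision rule from training sequences.
bool accept(const Prediction& p)
{
    double score = 1.0 * p.confidence + 0.5 * p.uniqueness
                 - 1.0 * p.violation  + 0.5 * p.agreement;
    return score > 0.5;
}

// Fuse: average the positions of the predictions classified as correct.
// Returns false (a dropout) if no tracker's prediction is accepted.
bool fuse(const std::vector<Prediction>& preds, double& x, double& y)
{
    double sx = 0.0, sy = 0.0;
    std::size_t n = 0;
    for (const Prediction& p : preds)
        if (accept(p)) { sx += p.x; sy += p.y; ++n; }
    if (n == 0) return false;
    x = sx / n;
    y = sy / n;
    return true;
}
```

Averaging only the accepted predictions is what lets the system discard a tracker that has latched onto the wrong feature while still benefiting from it in the situations where it performs well.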
This system has less than half the dropout and error rate of a conventional
correspondence-based tracker. The major disadvantage of our system
is the need to train the classification system; however, we have demonstrated
that the system can generalise from synthetically generated data. For more
details you can download the paper, or the C++ source code.
The code compiles under Red Hat Linux 5.2 and requires X. Note that you
will have to train the corner tracker on sequences with a known motion
field before it will work well - this may be impractical for some applications.
[1] S.M. Smith and J.M. Brady. ASSET-2: Real-time motion segmentation and shape tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 8, pp. 814-820, 1995.
[2] J. Shi and C. Tomasi. Good features to track. IEEE Conference on Computer Vision and Pattern Recognition (CVPR'94), pp. 593-600, June 1994.
[3] S.B. Kang, R. Szeliski, and H.-Y. Shum. A parallel feature tracker for extended image sequences. Computer Vision and Image Understanding, vol. 67, no. 3, pp. 296-310, September 1997.
Maintained by Brendan McCane. Last modified: 31st August 2000.