Real-Time Tracking
Mean-Shift
Introduction: The subject
of real-time tracking is of particular intereset to the vision community
and has practical applications from surveilance to sporting events to more
experimental applications such as the notion of 'smart rooms' and increasingly
meaningful human-computer interaction. Mean-Shift tracking is a real-time
algorithm that endeavors to maximize the correlation between two statistical
distributions. The correlation, or similarity between two distributions
is expressed as a measurement derived from the Bhattacharyya coefficient.
Statistical distributions can be built using any characteristic discriminating
to a particular object of interest. A general model might use color,
or texture or include both. A very simple form of the image gradient,
i.e.,
pixel intensity differences between nearest neighbors in both the x
and
y
directions
will be used as the random variables in the tracking distributions implemented
here.
Part I
Scale Edge Hierarchy
The efficacy of this algorithm will rely on how representative the choice
of random variables discriminates an object. Since only derivative
information will be used, this particular Mean-Shift implementation will
be in the business of matching structure, in particular, edge-like structure.
Edges are neat little discontinuities that maintain over different scales
(particularly, silhouette edges). This property can be utilized to
increase the discriminating power of this edge-based matching algorithm.
Below is a pyramid showing images and their x and y derivative-maps
at three different scales, for each of the two sequences. Note that
silhouette edges are preserved, and some interior reflectance edges are
lost. However, this multiscale edge information will not be used
in this experiment. The loss of the ability to use this distinct
silhouette information will have profound effects on the algorithm.
Most notably, the tracker doesn't concern itself with this distinct 'outline'
and a window that encompasses Robby performs worse than a smaller interior
window that is concerned only with maintaining that interior distribution
(near homogenous, as it turns out).
Sequence One: Pyramid
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
Sequence Two: Pyramid
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
The two pyramids above show a scale heirarchy and the corresponding
'difference-maps'. For each particular scale the difference
in the x-direction is on the left and the difference in the y-direction
is on the right. One rather frightening thing about these 'difference-maps'
is that Robby appears less salient than in the greyscale image. This
is purely a subjective statement of percept, and hopefully the actual numbers
involved are more discriminating than they 'appear'.
Part II
Learning the Distributions
The method to build the statistical distribution is given in the Comaniciu,
Ramesh, and Meer paper. The model is nothing more than a weighted
histogram of quantized difference values in the x and y directions.
The weighting is given by some kernel function that fits the needs of the
designer. The Epanechnikov kernel is used for this implementation.
The paper seemed to suggest that the window dimensions were the same as
h.
However, in the case of this implemention, h
is half of the
window dimensions. This actually inforces the distribution window
to be elliptical rather than rectangular. Figure 1 and Figure 2 show
the two rectangular regions that will be shown during tracking paired with
the actual shape of the distribution that survives the Epanechnikov kernel
with a non-zero value.
![]() |
![]() |
![]() |
![]() |
Depicted below is a series of figures depicting the structure distributions
for various frames of Robby. Included also are several non-Robby
examples. It is instructive to see exactly what the algorthm is up
against by noting the difference between Robby and non-Robby distributions.
Each plot is separated into four separate pieces of information.
The first simply illustrates the region of the image that the distribution
is derived from (pun semi-intentional), the second is the weighted histogram,
with the quantized difference in the x and y directions as
the index. Note, I scaled the difference information between 0 and
255/(quantization value), so a zero difference between two pixels
will accumulate in the middle of the distribution. The difference
information does form stalagmites around zero-difference. The
third and fourth plots within each figure show the marginal probability
distributions of the x-difference and y-difference respectively.
Quantization Value Eight:
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Figure 10
Figure 11
Figure 12
Figure 13
Figure 14
Figure 15
Figure 16
Figure 17
Quantization Value 4:
Figure 18
Figure 19
Figure 20
Figure 21
Figure 22
Quantization Value 16:
Figure 23
Figure 24
Figure 25
Figure 26
Figure 27
A quantization value of eight gives 32 bins in either direction and a Bhattacharyya coefficent based on a correlation of two 1024 element distributions. The above figures show representative distributions from both Robby and non-Robby regions using quantization values of 4, 8 and 16. My trials were run using a quantization value of 8. The choice was based mainly on a the fact that it seemed a happy medium between Q16 which clearly lacks information and Q4 which I feared would be a bit too picky. All tests were run using Q8.
It becomes increasingly worrisome from the plots that perhaps edge information is not the most useful piece of information in regards to tracking Robby. In particular, Figure 6 shows the distribution of the grill of the planet cruiser. The horizontal bars give a very distinct texture as the texture repeats and signficant and meaningful tails appear in the distribution. In this case, edge-information is useful. However, the distributions corresponding to Robby regions are very symmetric, and peak near zero-difference with little or no tail. Curiously, a lot of distributions around Robby have similar characteristics.
Part III
Tracking
Below are a series of tracking sequences using the algorithm developed
in the above mentioned paper. The first image shows where the box
was selected for the target model distribution. The table then proceeds
through the sequence. Tracking results are depicted at roughly every
10th frame.
Sequence One
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
Sequence Two
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
Not Terrible. However, it runs into some difficulties. The
most notable is that Robby doesn't so much translate, but rather rotates
and scales. By about the 65th frame in sequence one, Robby has turned
90 degrees and the distribution is completely different. The second
sequence doesn't suffer as much from this problem as Robby doesn't rotate
very much but he scales a lot. A possible solution to this problem
is to update the target model distribution occasionally to account for
new information that might arise from dis/occlusion or rotation.
Updating every N frames seems a bit arbitrary when there might be a more
data driven approach. Since the objective of the Mean Shift algorithm
is to maximize the Bhattacharyya coefficient, this is a meaningful measurement
of how close our target and candidate distributions actually are to one
another. I modified the algorithm so that if the Bhattacharyya coefficient
is below a certain similarity threshold then the target distribution
is updated once the current frame converges. After watching the Bhattarcharyya
coefficient through the first two sequences I chose an empirical similarity
threshold of .9. Unity is the coefficient for identical distributions,
and zero indicates no correlation. This relatively high threshold
guards against updating the distribution on a 'bad' track. I tracked
both sequences again using this modified algorithm. The results are
shown below.
Sequence One: With Updated Distributions
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
Sequence Two: With Updated Distributions
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
By watching each frame go by, I think this is a big improvement over
the static target distribution. However, the results objectively
look only moderately better. Two problems immediately present themselves.
First, how to best penalize a region that wants to wander off of the image.
In these sequences I don't penalize it at all which accounts for its eventual
desire to wander off to the Forbidden Planet's Underworld. The next
problem has to do with the fact that Robby scales significantly during
the course of the sequence. Using a static window forces one to either
initialize a 'baggy' window at the beginning and have it tight at the end
(which doesn't work because 'baggy' windows are inclined to drift), or
begin with a tight window and have a rather ineffectual representation
of where Robby actually is in the final sequences. The problem with
the small window on the big Robby is that there are multiple positions
within Robby that serve the same Bhattacharyya coefficient and the window
tends to wander around in the interior of Robby and only signifies the
notion that part of Robby is within the window. A possible solution
to this problem is to enable a variable window, i.e., dynamically
modify h. As suggested in the paper, for the next set
of series, I calculate three Bhattacharyya coefficents: the first
with the current window, the second with a window reduced by 10%, and a
third with a window inflated by 10%. The window that wins the day
is that with the highest coefficient, and h and the window
are updated accordingly. This allows the window to take on any scale
at increments of 10% per frame. The first series below, which corresponds
to sequence two, uses this variable window but does NOT update
the distribution (i.e., set the similarity threshold to zero).
The second and third series use both the variable window and the distribution
update (with a similarity threshold of .9). The dynamic window could
also be implemented by letting hx and hy vary
independantly. However, the combinatorics reduce the
algorithm to less than real-time in this case.
Sequence Two: With Variable Window
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
Sequence One: With Variable Window and Updated Distributions
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
Sequence One: With Variable Window and Updated Distributions
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
The results for the first series (sequence two) are encouraging.
The window scales itself in a meaningful fashion and the tracking results
are much better. The second and third series however have trouble.
Around the 65th frame, Robby has almost completed his turn and the tracking
consistently wants to wander off the backside of Robby. Looking
at Figure 22 and Figure 26 its not suprising as the distribution in that
region looks very Robby-like. In the first several runs the tracking
managed to avoid this because the window region was so long that the grill
of the cruiser prevented it from drifting in that direction. My bag
of tricks is empty. I tried a number of things in order to try and
coax it to track this series using a variable window and a distribution
update because I think that both are the correct way to do things, but
I couldn't get satisfactory results using both modifications with the first
sequence.
Conclusions
It turns out that the algorithm is simple and reasonably effective.
One can imagine that color information is probably more discriminating
than difference information, and I believe this to be the major drawback
in this implementation. Robby's difference information isn't distinctive
enough to enable perfect tracking. Many regions contain difference
maps similar to Robby. Another drawback is that spatial information
is lost, allowing very different 'looking' regions to appear similar to
the tracking algorithm.
Code:
In order of importance:
trackRobby()
trackRobby2()
MeanShift()
bhattarcharyya()
learnDistribution()
defineModel()
epanech()
reduceGrad()