Theory Basis

The Membrane method of tracking builds on the ideas of active contours, together with some potential field techniques from robotics. Active contour models, such as the Snake algorithm, use image intensity gradients to fit contours to the underlying features of objects. Each snake-like contour has internal and external energy that it tries to minimize. External energy is the effect of gradients pulling the contour into a close fit of the features; minimizing internal energy keeps the contour tight and smooth.

In our approach we hoped to track multiple subjects in real time, so the calculation of gradients and the minimization were too expensive. Additionally, we hoped to maintain the integrity of logical objects rather than of individual features: regardless of changes in lighting or in the subjects' appearance, their logical models should remain intact.

Our initial solution to this problem was to use a medium more conducive to distinguishing separate logical objects than intensity or color images. Since algorithms exist that calculate disparity maps at frame rate, such as SRI's Small Vision System, we chose disparity maps for our codebase. Our snakes are placed around (or within) the objects to be tracked and then grow or shrink to fit the contour of the entire object in the disparity map.


The Algorithm Core

For our contours we used a simple connected spline of points. Minimizing the contour's energy yields a tight fit around the initially selected logical object. To achieve this, we were able to localize the problem to each individual point: conceptually, each point is affected by a number of forces that attract and repel it, producing the desired effect.


    To form a tight fit of the object, we seek to minimize the area of the spline around the object. This can be done monotonically by moving points towards the interior of the spline. We calculate this direction for a point B using the vector from its neighbor A to its neighbor C (counterclockwise and clockwise from B, respectively). Rotating this vector 90 deg. clockwise gives the desired result. We denote this vector the freedom of motion of its point at a particular frame. Motion in the direction of the vector is inward; motion in the opposite direction is outward.
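The freedom-of-motion vector described above can be sketched as follows. This is a minimal illustration, not the original implementation; the function name and tuple representation are ours, and the sign of the rotation assumes a standard (y-up) coordinate frame.

```python
import math

def inward_direction(a, b, c):
    """Freedom of motion for point B, given its neighbors A and C.

    The vector from A to C is rotated 90 degrees clockwise
    ((x, y) -> (y, -x)) and normalized; with the contour ordered as in
    the text, this points toward the spline's interior."""
    vx, vy = c[0] - a[0], c[1] - a[1]
    ix, iy = vy, -vx                      # 90-degree clockwise rotation
    norm = math.hypot(ix, iy) or 1.0      # guard against degenerate A == C
    return (ix / norm, iy / norm)
```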

    Following this method alone, however, could cause points to spread unevenly around the object, allowing them to settle at local minima rather than minimizing the energy of the contour. We solve this problem by normalizing the distances between points.
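The normalization step could be sketched as an equal-arc-length resampling of the closed contour. This is our reading of "normalization of the distances between points", under the assumption of linear interpolation along the polygon; the original may have used a different scheme.

```python
import math

def redistribute(points):
    """Respace points evenly along the closed contour (a sketch)."""
    n = len(points)
    # segment lengths around the closed polygon, and the target spacing
    seg = [math.dist(points[i], points[(i + 1) % n]) for i in range(n)]
    step = sum(seg) / n
    out, target, i, acc = [points[0]], step, 0, 0.0
    while len(out) < n:
        # advance to the segment containing the next target arc length
        while acc + seg[i] < target:
            acc += seg[i]
            i += 1
        t = (target - acc) / seg[i]
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
        target += step
    return out
```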


    For the image energy, consider the disparity map as a surface whose different altitudes are represented by blobs. The object we select is represented by one blob or a set of blobs, around which we wish to maintain a contour. As the image changes, the points of our contour may end up placed on the object we are tracking or on other objects.

    Given a fast enough frame rate, points should only enter the tracked object or the new objects, not pass through them. Hence, for each point we can locate the closest point of low disparity along its line of free motion and push the point towards it. When a point has entered the object being tracked, this pushes it back outside; otherwise, the point is pushed away from other objects.
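The search along the line of free motion could look like the sketch below. The function name, the disparity threshold, and the search radius are our assumptions; the idea is only to find the nearest low-disparity pixel along the freedom vector and report which way to push.

```python
def image_force(disparity, point, direction, threshold, max_steps=30):
    """Signed push along the freedom-of-motion line (a sketch).

    Walks outward from `point` in both directions along `direction`,
    returning +1 or -1 for the nearer pixel whose disparity falls below
    `threshold` (free space), or 0 if none is found within range."""
    h, w = len(disparity), len(disparity[0])
    for step in range(1, max_steps + 1):
        for sign in (1, -1):  # check both directions at each radius
            x = int(round(point[0] + sign * step * direction[0]))
            y = int(round(point[1] + sign * step * direction[1]))
            if 0 <= y < h and 0 <= x < w and disparity[y][x] < threshold:
                return sign   # push one step toward the free space
    return 0                  # no low-disparity pixel nearby; stay put
```

A point sitting inside a blob finds free space behind it and is pushed back out; a point in free space is pulled inward, continuing the shrink.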

The total force acting on each point decides where the point will travel, and it is moved in that direction. The calculation is performed once for every point in the spline at each frame; the iterations necessary to minimize energy occur through repetition over sequential frames, in order to maintain real-time tracking.
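The per-frame pass described above could be sketched as a single self-contained loop. This is an illustration under our own assumptions (a single-pixel step, a simple disparity threshold, and a clockwise point ordering), not the original code.

```python
import math

def step_contour(points, disparity, threshold):
    """One per-frame update pass over the spline (a sketch).

    Each point moves one pixel along its freedom-of-motion vector:
    inward by default (shrinking the spline), or outward if it has
    landed inside a high-disparity blob."""
    h, w = len(disparity), len(disparity[0])
    n, out = len(points), []
    for i, (x, y) in enumerate(points):
        ax, ay = points[i - 1]           # one neighbor of the point
        cx, cy = points[(i + 1) % n]     # the other neighbor
        # freedom of motion: neighbor-to-neighbor vector rotated 90 deg.
        dx, dy = cy - ay, -(cx - ax)
        norm = math.hypot(dx, dy) or 1.0
        dx, dy = dx / norm, dy / norm
        xi, yi = int(round(x)), int(round(y))
        inside = 0 <= yi < h and 0 <= xi < w and disparity[yi][xi] >= threshold
        sign = -1 if inside else 1       # push out of blobs, else shrink
        out.append((x + sign * dx, y + sign * dy))
    return out
```

Each call performs exactly one pass; as in the text, minimization happens through repetition on sequential frames rather than inner iterations.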


The Problem of Occlusion

One of the largest problems in tracking is occlusion, where the object we are tracking moves in front of or behind another object. We had hoped to resolve this problem for our algorithm, and tried a number of methods. Once we discovered that our algorithm was fast enough to track a large number of objects simultaneously, we narrowed the problem to occlusion between tracked objects, since all large objects in view could be tracked.

Some of our initial ideas involved finding the mean of the movement of unoccluded points and translating the occluded points in that direction. This proved ineffective, since the motion of an object and that of its points are not necessarily related. For example, a person may move left, but the points on top of his head actually fall to his shoulders (by moving along their lines of freedom) rather than translating left.

Our final decision was to use a Closest Point algorithm from a previous unoccluded contour to the current one. We seek the translation of the previous contour that gives the closest approximation of the new contour: for each unoccluded point of the translated previous contour, we find the distance to its closest point on the new contour, and the sum of these distances measures the quality of the approximation. Minimizing this sum gives the optimal translation matching the previous contour to the current one, which is then applied to the occluded points.
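The translation search could be sketched as below. The source does not say how the minimization was performed, so this sketch assumes a brute-force search over a candidate grid of translations; the names are ours.

```python
import math

def best_translation(prev_pts, new_pts, candidates):
    """Pick the translation of the previous unoccluded contour that
    minimizes the summed closest-point distance to the new contour
    (a sketch; `candidates` is an assumed search grid of (dx, dy))."""
    def score(t):
        # sum of closest-point distances for the translated contour
        return sum(min(math.dist((px + t[0], py + t[1]), q)
                       for q in new_pts)
                   for (px, py) in prev_pts)
    return min(candidates, key=score)

# The winning translation is then applied to the occluded points:
#   occluded_now = [(x + tx, y + ty) for (x, y) in occluded_prev]
```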