Rosenfeld's paper about HAL explains a possible method for segmenting objects in a scene based on motion. The motion of objects is hierarchical, so image differencing highlights the moving objects and lets us see how different objects move relative to each other. This is a good approach for vision research to take (hihi!), but Rosenfeld then explains that faces can easily be monitored with this method. This is a huge oversimplification of the problem of face tracking, but it does relate to a futuristic view of computer vision...
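
To make the image-differencing idea concrete, here is a minimal sketch in Python with NumPy. It is not Rosenfeld's actual method, only the simplest form of frame differencing: subtract two consecutive grayscale frames and keep the pixels that changed by more than some amount. The function name `motion_mask`, the `threshold` value, and the synthetic 8x8 frames are all assumptions made up for illustration.

```python
import numpy as np

def motion_mask(prev_frame, next_frame, threshold=25):
    """Return a boolean mask of pixels whose intensity changed by more than threshold.

    Hypothetical helper for illustration; the threshold of 25 is an arbitrary assumption.
    """
    # Cast to a signed type so the subtraction cannot wrap around on uint8 input.
    diff = np.abs(next_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

# Synthetic scene (assumed data): a bright 2x2 square moves one pixel to the right.
prev_frame = np.zeros((8, 8), dtype=np.uint8)
next_frame = np.zeros((8, 8), dtype=np.uint8)
prev_frame[3:5, 2:4] = 200
next_frame[3:5, 3:5] = 200

mask = motion_mask(prev_frame, next_frame)
rows, cols = np.nonzero(mask)
print(sorted(zip(rows.tolist(), cols.tolist())))
```

Only the trailing and leading edges of the square show up in the mask, because the overlapping column has the same brightness in both frames. That already hints at why going from a difference image to something like reliable face tracking is a much bigger step than it sounds.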