Synergistic Activity and Object Recognition

In this project, we propose to recognize daily activities and objects jointly, using both RFID and vision sensors.

One major difficulty in recognizing both activities and objects is the heavy burden of manually labeling activities (in each frame) and objects (boundaries and categories).

In many “activities of daily living”, the activity can be deduced from the objects involved in it, e.g. “cup + sugar + instant coffee bag + spoon” may well indicate a “make coffee” activity. In this case, RFID technology can provide labels, albeit very noisy ones, for certain objects. As shown in the image, RFID tags are attached to objects, and the RFID receiver bracelets worn on the hands are activated whenever a hand is close to a tagged object. However, the RFID signal is very noisy: multiple activations can occur at the same time, an activation can carry the wrong ID, activations can be missed, etc.

We propose to use vision as a complementary sensor for noise removal. A video camera is set up to overlook the activities. The vision modality can effectively disambiguate RFID signals: for example, two video frames in which the same RFID object is activated should contain similar visual features, because they correspond to the same object. Conversely, the RFID signals provide image labels that can be useful for learning visual models of the objects.

We set up a Bayesian network to capture the relationships among activities (the “A” nodes), objects (the “O” nodes), RFID readings (the “R” nodes), and video frames (the “V” nodes).

The activity knowledge (e.g. which objects are involved in “make coffee”?) is encoded in this Bayesian network as conditional densities. We then plug in the observations (dense video frames and sparse RFID activations) and run the junction tree algorithm to jointly infer activity and object labels for each frame.
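
As a rough illustration of this kind of joint inference, the sketch below uses a toy single-frame network with an assumed structure A -> O, O -> R, O -> V. It does not implement the junction tree algorithm; it uses brute-force enumeration instead, which gives the same exact posteriors on a network this small. All activity names, object names, visual cluster IDs, and probability values are hypothetical and are not taken from the paper.

```python
from itertools import product

# Hypothetical toy model: every name and number here is an illustrative assumption.
ACTIVITIES = ["make_coffee", "make_tea"]
OBJECTS = ["cup", "coffee_bag", "tea_bag"]

# Prior over activities, P(A).
P_A = {"make_coffee": 0.5, "make_tea": 0.5}

# Activity knowledge, P(O | A): which objects each activity tends to involve.
P_O_GIVEN_A = {
    "make_coffee": {"cup": 0.4, "coffee_bag": 0.5, "tea_bag": 0.1},
    "make_tea": {"cup": 0.4, "coffee_bag": 0.1, "tea_bag": 0.5},
}

# Vision model, P(V | O), with V assumed to be a quantized visual cluster ID.
P_V_GIVEN_O = {
    "cup": {"c0": 0.7, "c1": 0.2, "c2": 0.1},
    "coffee_bag": {"c0": 0.1, "c1": 0.7, "c2": 0.2},
    "tea_bag": {"c0": 0.1, "c1": 0.2, "c2": 0.7},
}


def p_r_given_o(r, o):
    """Noisy RFID model, P(R | O): correct ID, wrong ID, or missed activation."""
    if r == "none":          # activation missed
        return 0.3
    return 0.6 if r == o else 0.05  # correct ID vs. one of the two wrong IDs


def infer(v, r=None):
    """Return posteriors P(A | v, r) and P(O | v, r) for one frame.

    v is always observed (video is dense); r may be None when no RFID
    reading is available for the frame (RFID is sparse), in which case
    the R node is simply marginalized out.
    """
    joint = {}
    for a, o in product(ACTIVITIES, OBJECTS):
        p = P_A[a] * P_O_GIVEN_A[a][o] * P_V_GIVEN_O[o][v]
        if r is not None:
            p *= p_r_given_o(r, o)
        joint[(a, o)] = p
    z = sum(joint.values())  # normalizing constant
    post_a = {a: sum(joint[(a, o)] for o in OBJECTS) / z for a in ACTIVITIES}
    post_o = {o: sum(joint[(a, o)] for a in ACTIVITIES) / z for o in OBJECTS}
    return post_a, post_o


if __name__ == "__main__":
    # Frame with both a visual observation and an RFID activation.
    print(infer("c1", r="coffee_bag"))
    # Frame with video only: the RFID node is unobserved.
    print(infer("c2"))
```

In this toy setting, a frame without any RFID activation still receives activity and object posteriors from the visual evidence alone, which is the role the dense video frames play alongside the sparse RFID readings.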

We observed that the inclusion of videos helped in recognizing both activities and objects.

Please refer to the following paper for more details:

Jianxin Wu, Adebola Osuntogun, Tanzeem Choudhury, Matthai Philipose, James M. Rehg. A Scalable Approach to Activity Recognition based on Object Use. ICCV 2007.

