Segway and Kuka KR-sixx platform

We developed complete perception and control software, and put it into practice with extensive testing, for a dynamically balancing Segway robot with a 6-axis arm and gripper. The robot located a coffee cup and coffee maker using vision, filled the cup with coffee, and returned the coffee to the user. Particle filters for object detection, and layered PID controllers to compensate for platform motion, made our approach robust. Our robot completed all three trials required by the "coffee challenge" with no failures.

"Team 1" was: Cressel Anderson, Philip Case, Niyant Krishnamurthi, and Richard Roberts.


[ Video mov ]
[ RSS '07 Workshop Poster pdf | Poster paper pdf ]


Team 1 utilized Player/Stage to interface with the platform hardware, a particle filter for robust visual pose estimation of the objects, and the KUKA Remote Sensor Interface (RSI) to control the arm. These components communicated via TCP sockets and were controlled by a Java front-end. The system used two Core 2 Duo laptops running Ubuntu Linux. The programmed task was initiated and monitored from an additional two laptops at a station outside of the testing area. With this approach, the team robustly completed three runs in rapid succession during the final demo.
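The component plumbing can be sketched as a simple request/reply exchange over TCP sockets. The JSON message format, the `get_pose` request, and the pose values below are illustrative assumptions; the team's actual wire protocol between the Java front-end and the perception/control processes is not shown here.

```python
import json
import socket
import threading

# Sketch of the socket plumbing between components. The JSON message
# format and the request/reply pattern are assumptions for illustration,
# not the team's actual protocol.
server = socket.socket()
server.bind(("127.0.0.1", 0))        # OS-assigned port
server.listen(1)
port = server.getsockname()[1]

def vision_component():
    """Stands in for the vision process: reports an object pose on request."""
    conn, _ = server.accept()
    conn.recv(1024)                  # wait for a request
    conn.sendall(json.dumps({"pose": [0.6, 0.1, 0.0]}).encode())
    conn.close()

t = threading.Thread(target=vision_component)
t.start()

# Front-end side: ask the vision component for the latest cup pose.
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"get_pose")
reply = json.loads(client.recv(1024).decode())
client.close()
t.join()
server.close()
print(reply["pose"])  # [0.6, 0.1, 0.0]
```

Binding the listening socket before starting the client avoids a connect/listen race, which matters when components are launched together at startup.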

Object model of the cup.

A Player server ran on one of the two laptops and provided the interface to the Segway and the SICK laser; localization and navigation components were also implemented. Because the approximate locations of the coffee mug, coffee machine, and coffee delivery location were known, the navigation controller could drive the platform to the desired pose and then transfer control to the visual manipulation controller for grasping and manipulation. Both position and velocity commands were used to control the end effector position.

The success of the team's approach is largely due to the robust operation of the vision system. A model-based Monte Carlo approach was used in conjunction with the constraint that the cup would be oriented vertically. First, models of the coffee cup and coffee maker were created by sampling an image of the object and transforming the image to yield color values for each three-dimensional point on the surface, as seen in Figure 2. In this case, the points were generated from physical measurements of the object's dimensions. Next, using an appropriately tuned particle filter, the object's pose could be quickly and accurately determined, as seen in Figure 3. Additionally, the arm configuration was used in the particle filter's hypothesis updates to account for changes in the camera pose, thereby enabling better dynamic tracking of objects.

Estimating the pose of the cup.
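The predict/update/resample cycle behind the pose estimator can be sketched as a generic particle filter. The Gaussian image-match likelihood, the planar (x, y) state, and all noise levels below are illustrative stand-ins for the team's actual model-projection scoring and full object pose state.

```python
import numpy as np

rng = np.random.default_rng(0)

def likelihood(particle, observation, sigma=0.1):
    # Stand-in for the real match score, which projected the 3-D color
    # model into the image and scored pixel agreement.
    d = np.linalg.norm(particle - observation)
    return np.exp(-0.5 * (d / sigma) ** 2) + 1e-12

def particle_filter_step(particles, weights, observation, motion_noise=0.01):
    # Predict: diffuse hypotheses to absorb camera/platform motion.
    particles = particles + rng.normal(0.0, motion_noise, particles.shape)
    # Update: reweight each hypothesis by how well it explains the image.
    weights = weights * np.array([likelihood(p, observation) for p in particles])
    weights = weights / weights.sum()
    # Resample: concentrate particles on high-weight hypotheses.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Track a cup whose true planar position is (0.60, 0.10) m.
true_pose = np.array([0.60, 0.10])
particles = rng.uniform(-1.0, 1.0, (500, 2))
weights = np.full(500, 1.0 / 500)
for _ in range(20):
    observation = true_pose + rng.normal(0.0, 0.02, 2)  # noisy detection
    particles, weights = particle_filter_step(particles, weights, observation)

estimate = particles.mean(axis=0)
print(estimate)  # settles near (0.60, 0.10)
```

In the real system, the predict step would additionally shift hypotheses by the known camera motion derived from the arm configuration, rather than relying on diffusion noise alone.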

The models used in this approach made use of the known appearance of the cup and coffee machine. Because color was used, varying lighting conditions and specular reflections could interfere with object detection. To counter the specular reflections on the metal surface of the coffee machine, a mask-creation feature was added to the model-building program. This addition removed the areas of brushed aluminum on the coffee maker's face from the model.
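Masking out glare-prone regions during model building can be sketched in a few lines. The brightness cutoff and the toy image below are assumptions for illustration; the team's actual mask was drawn over the brushed-aluminum regions rather than thresholded automatically.

```python
import numpy as np

def build_color_model(image, mask=None, specular_value=0.9):
    """Collect color samples for the object model, skipping masked and
    near-specular pixels. `specular_value` is an assumed brightness
    cutoff, not the team's actual criterion."""
    if mask is None:
        mask = np.ones(image.shape[:2], dtype=bool)
    value = image.max(axis=2)            # brightness channel
    keep = mask & (value < specular_value)
    return image[keep]                   # N x 3 color samples

# Toy image: matte object pixels plus a bright specular highlight.
img = np.full((4, 4, 3), 0.3)
img[1:3, 1:3] = 1.0                      # simulated glare patch
samples = build_color_model(img)
print(len(samples))                      # 12 matte pixels kept, 4 glare pixels dropped
```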

Because the vision system made use of the camera calibration, it reported object pose in real-world units. This greatly sped up programming of end effector positions with respect to the tracked objects: the team could simply measure distances in the real world, then make minor tweaks to the coordinates.
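The value of calibrated, metric output is easy to see in a minimal pinhole back-projection. The intrinsic parameters and the 0.5 m depth below are placeholder values, not the team's calibration.

```python
import numpy as np

# Assumed pinhole intrinsics (focal lengths and principal point, in
# pixels); the team's actual calibration values are not given here.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

def pixel_to_metric(u, v, depth):
    """Back-project a pixel to camera-frame coordinates in metres,
    given the depth along the optical axis."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return ray * depth

# A detection 80 px right of the principal point, 0.5 m away:
X, Y, Z = pixel_to_metric(400, 240, 0.5)
print(X, Y, Z)  # 0.05 0.0 0.5
```

With poses in metres, an end effector offset like "5 cm in front of the cup" can be measured with a ruler and typed in directly, which is the speed-up described above.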

The dynamic stability of the Segway added complexity to the task of visual servoing. As the arm was extended, the center of gravity of the platform would shift accordingly, and the platform would roll forward or backward to accommodate this shift. To deal with these large unmodeled movements, multiple closed-loop controllers were employed, running simultaneously at several levels of body control. One controller moved the platform as its center of balance changed, attempting to keep the arm's coordinate frame stationary in the world. A second controller servoed the arm to be directly in front of the object to be grasped. At very close distances, the vision system's estimate of the object pose was used to continuously servo the end effector to the target pose. This multi-layered controller helped make performance robust even in these dynamic manipulation tasks.