Talk #1:
Title:
Stereo Approaches to Handle Occlusions, Highlights, Reflections, and
Translucency
Abstract:
Early image-based rendering techniques require many images, which results in
rather fat representations. If accurate depth is available, equally high-quality
rendering can be accomplished with far fewer images (within limits).
Furthermore, good depth data of real scenes would allow us to manipulate objects
with plausible-looking results. This is where stereo comes in handy. The problem
is, stereo is very difficult to get right: real scenes have occlusions,
highlights, reflections, and translucency.
In this talk, I will describe how we progressively tackle each of these
problems. To handle occlusion, we
use a combination of shiftable windows and a dynamically selected subset of the
neighboring images to do the matches. To handle highlights, we apply a color
histogram differencing technique. Finally, to take into account reflections and
translucency, we model the image formation as an additive superposition of two
layers at two different depths, and solve for them iteratively. I will show
results for both synthetic and real image sequences as validation of these
approaches.
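
For readers unfamiliar with the occlusion-handling idea, the following is a
minimal sketch, not the speaker's implementation. It assumes rectified
grayscale images, integer disparities, an SSD matching cost, and a horizontal
camera arrangement described by signed integer baselines. The function names,
parameters, and the choice to keep the best half of the neighbors are all
illustrative assumptions. A shiftable window is emulated by box-filtering the
per-pixel cost and then taking the minimum over the same neighborhood, and the
dynamic neighbor subset is emulated by summing only the lowest per-pixel costs.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _windows(a, k):
    """All k x k neighborhoods of a (edge-padded); shape (H, W, k, k)."""
    r = k // 2
    return sliding_window_view(np.pad(a, r, mode="edge"), (k, k))


def shiftable_cost(ref, nbr, shift, k=3):
    """Aggregated SSD cost for one neighbor image at one pixel shift.

    Assumed convention: ref[y, x] corresponds to nbr[y, x - shift].
    np.roll wraps around at the image border, which a real
    implementation would treat as invalid instead.
    """
    shifted = np.roll(nbr.astype(float), shift, axis=1)
    per_pixel = (ref.astype(float) - shifted) ** 2
    # Cost of the window centered on each pixel.
    centered = _windows(per_pixel, k).mean(axis=(2, 3))
    # Shiftable window: best cost among all windows containing the pixel.
    return _windows(centered, k).min(axis=(2, 3))


def disparity_map(ref, neighbors, baselines, max_d, k=3, keep=None):
    """Winner-take-all disparity using only the best-matching neighbors."""
    keep = keep or max(1, len(neighbors) // 2)  # hypothetical default
    volume = []
    for d in range(max_d + 1):
        costs = np.stack([
            shiftable_cost(ref, n, d * b, k)
            for n, b in zip(neighbors, baselines)
        ])
        costs.sort(axis=0)  # per pixel, lowest-cost neighbors first
        volume.append(costs[:keep].sum(axis=0))
    return np.argmin(np.stack(volume), axis=0)
```

Keeping only the lowest-cost neighbors per pixel reflects the intuition that a
point occluded in some views is usually still visible in others.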

Talk #2:
Title:
Image-Based Rendering of Dynamic Scenes
Abstract:
The ability to interactively control the viewpoint while watching a video is an
exciting application of image-based rendering. Our goal is high-quality
rendering of dynamic scenes with interactive viewpoint control using a
relatively small number of video cameras. In this talk, I will describe how we
achieved this goal using multiple synchronized video streams combined with novel
image-based modeling and rendering algorithms. Once these video streams have
been processed, we can synthesize any intermediate view between cameras at any
time, with the potential for space-time manipulation. In our approach, we first
use a color segmentation-based stereo algorithm to generate high-quality
photoconsistent correspondences across all camera views. Mattes for areas near
depth discontinuities are then automatically extracted to reduce artifacts
during view synthesis. Finally, a new temporal two-layer compressed
representation that handles matting is developed for rendering at interactive
rates. This work was done with Larry Zitnick, Matthew Uyttendaele, Simon Winder,
and Richard Szeliski, and was presented at SIGGRAPH'04.
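
As a rough illustration of why mattes help during view synthesis, the sketch
below composites a matted boundary layer over a main layer with the standard
"over" operator. It is an assumption-laden example, not the renderer described
in the talk: the function name, argument names, and array layout are
hypothetical, and both layers are assumed to be already warped into the novel
view.

```python
import numpy as np


def composite_over(boundary_rgb, alpha, main_rgb):
    """Blend a matted boundary layer over a main layer.

    boundary_rgb, main_rgb: float arrays of shape (H, W, 3).
    alpha: float array of shape (H, W) with values in [0, 1].
    """
    a = np.clip(alpha, 0.0, 1.0)[..., None]  # broadcast over color channels
    return a * boundary_rgb + (1.0 - a) * main_rgb
```

Fractional alpha values near depth discontinuities give mixed pixels a soft
blend of foreground and background rather than a hard edge.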

Bio:
Sing Bing Kang (http://www.research.microsoft.com/~sbkang/) received his Ph.D.
in robotics from CMU in 1994. He is currently a researcher at Microsoft
Corporation working on environment modeling from images. His paper on the
Complex Extended Gaussian Image won the IEEE Computer Society Outstanding Paper
award at CVPR'91. His IEEE Transactions on Robotics and Automation article on
human-to-robot hand mapping was awarded the 1997 King-Sun Fu Memorial Best
Transaction Paper award. Sing Bing has published about 25 refereed journal
papers and about 45 refereed conference papers, mostly on stereo and
image-based rendering. He also holds 14 US patents and has co-edited two books
in computer vision.