Automated Capture, Integration, and Visualization of Multiple Media Streams

Jason A. Brotherton, Janak R. Bhalodia, Gregory D. Abowd

GVU Center & College of Computing

Georgia Institute of Technology

Atlanta, GA 30332-0280 USA

+1 404 894 7512

{brothert, janak, abowd}@cc.gatech.edu

 

Abstract

In this paper, we discuss our efforts in building a system for the automatic generation, integration, and visualization of media streams: independent information sources that define rich interactive experiences. This research is done in the context of a project called Classroom 2000, an experiment in ubiquitous computing for education. We view teaching and learning as a form of multimedia authoring and have developed a system that generates much of the educational content that occurs naturally in university lectures. We have developed a taxonomy for characterizing different media streams that can be captured during or after a live session, and we present solutions to two research issues that affect access to a rich multimedia record. The first issue deals with the granularity of stream integration. The second issue deals with methods of visualizing a set of integrated media streams that are scalable and that support a user's desire to search by browsing.

 

1. Introduction

One of the potential features of a ubiquitous computing environment is that it could be used to record our everyday experiences and make that record available for later use. Indeed, we spend much time listening to and recording, more or less accurately, the events that surround us, only to have that one important piece of information elude us when we most need it. We can view many of the interactive experiences of our lives as generators of rich multimedia content. A general challenge in ubiquitous computing is to provide automated tools to support the capture, integration, and access of this multimedia record. The purpose of this automated support is to have computers do what they do best (record an event) in order to free humans to do what they do best (attend to, synthesize, and understand what is happening around them), all with full confidence that the specific details will be available for later perusal.

Though capture, integration, and access is a general problem in ubiquitous computing, real progress in this area can only be achieved through close examination of specific applications. This paper reports on our experience over the past two years providing capture, integration, and access capabilities in the context of university teaching and learning. Specifically, in this paper, we report on our understanding of the variety of information streams that can be captured, issues concerning the time integration of independent streams, and techniques for providing robust interfaces to visualize and search across multiple media streams.

In the remainder of this paper we will use the application domain of university education to make concrete the ideas on multiple stream capture, integration and visualization. After providing a brief overview of related work, we will describe the Classroom 2000 project that has provided the basis for our research over the past two years. We will develop a taxonomy for the variety of information streams that can be captured in a rich interactive setting with ubiquitous computing technology. In presenting a record of an experience afterwards, it is often valuable to relate different streams (e.g., "What was being said when this information that I am viewing was displayed?"). We will discuss issues that arise in the automated integration of multiple streams to render such relationships. Finally, it is important to provide a way for a user to quickly browse the multimedia record of an experience, so we describe techniques developed to visualize multiple streams of information simultaneously and to facilitate random access between streams. We will conclude this paper with a list of future research objectives that apply specifically to the educational domain and, more generally, to other application domains within ubiquitous computing.

 

2. Background and Related Work

There has been a great deal of research related to the general Capture, Integration, and Access (CIA) theme, particularly for meeting room environments and personal note taking. Our work has been greatly influenced by previous research at Xerox PARC in ubiquitous computing [16, 17] and tools to support electronic capture and access of collaborative activities [7,9,10,11].

The Marquee note-taking prototype developed at PARC [15], the Filochat prototype developed at Hewlett-Packard Labs [18], and the Dynomite personal note-taking environment from FX-PAL [19] are examples of programs that support the capture, integration, and access of a meeting for an individual. We have experimented with personal notebooks, but our most significant experience has been support for public display surfaces in the classroom using a large-scale electronic whiteboard. In the classroom setting, we want to capture information provided by the teacher during a lecture, so electronic whiteboard capabilities such as those provided by the Xerox LiveWorks LiveBoard [7], SMARTBoard [13], and SoftBoard [14] are essential to our research.

The integration of text with audio or video is critical to our work, and this is a popular research theme for those interested in information retrieval and content-based access to multimedia. There are a number of research and commercial systems to align textual transcripts to audio and video archives. This linking between text and audio greatly enhances search capabilities, but it is only useful when a textual transcript is available, a process that currently requires much post-production effort. When it is not practical to produce such a transcript, we rely on the implicit relationship between an audio stream and time-stamped pen strokes or keystrokes. This time-based relationship directly supports the task of a user browsing a set of notes that was taken during a class and asking "What was the lecturer talking about when this was being written?" This form of integration is exploited by the note-taking prototypes mentioned above, as well as other work at MIT's Media Lab [8] and at Apple [6].

The assumption in all of the preceding work is that only one person will ever access the recorded notes and associated audio or video streams. Our work, in contrast, assumes that captured information in the classroom supports a number of people by creating shared public records. Other work in CIA that has focused on supporting groups of people and the presentation of multiple media streams includes Bellcore's STREAMS prototype [5] and FLIPS [12]. STREAMS integrates audio and multiple video streams along with audio or textual annotations. Data from each camera and microphone is saved as an independent stream; the streams are synchronized by time and presented to the user all at once, allowing the user to explicitly focus attention on one stream. Our work is novel in that we incorporate more than simple streams, using control streams to aid the presentation of the review, and in that we address the scalability problem of managing the presentation as the number of media streams grows.

3. Classroom 2000: an experiment with ubiquitous computing in education

In a traditional class, the instructor either writes on a blank chalkboard or whiteboard or uses an overhead projector to display prepared slides that can then be annotated. The problem with using whiteboards or annotating overhead slides is that students are forced to copy everything the teacher writes into their own private notebooks. Additionally, once the teacher's writing is erased, the information is forever lost. The advantage of electronic media used in the classroom is the possibility of preserving the record of class activity.

We distinguish this electronic media from multimedia courseware. The production of multimedia courseware is a time-consuming task. It may take years of effort to produce a high-quality course module. One of the objectives of the Classroom 2000 project is to reduce the content-generation effort by viewing the traditional classroom lecture itself as a multimedia authoring activity [3]. In a classroom setting, there are many information streams present in addition to audio and video recordings of what we hear and see. The teacher typically writes on some public display, such as a whiteboard or overhead projector, to present the day's lesson. It is now common to present supplemental information in class via the World Wide Web, if sufficient network connectivity and display technology are available. Additional dynamic teaching aids such as videos, physical demonstrations or computer-based simulations can also be used. Taken in combination, all of these information streams provide for a very information-intensive experience that is easily beyond a student's ability to record accurately, especially if the student is trying to pay attention and understand the lesson at the same time.

Figure 1 (left): An electronic whiteboard (a LiveBoard) being used to deliver a lecture.
Figure 2 (right): A screenshot of our own whiteboard software, ZenPad, used to present lecture material.

Figures 1 and 2 show the use of an electronic whiteboard to support the delivery of a lecture. We have used a commercial electronic whiteboard (a LiveBoard [7]) running our own software, a program called ZenPad [2]. The main advantage of the electronic whiteboard is the ability to provide a record of the lecture to students after class in the form of Web-accessible notes that are augmented by audio and/or video. As students begin to rely on the note-taking capabilities of Classroom 2000, they report shifting their in-class behavior away from copious note-taking toward a more summary style of note-taking and more active participation in and understanding of the lecture itself. Our preliminary findings indicate that the construction of a classroom environment to relieve the burden of student note-taking can enhance student engagement in the learning experience and augment the teacher's capabilities. Automated and robust support for the capture, integration, and access of classroom experiences is necessary for us to test this hypothesis more rigorously [2]. We are currently running experiments and collecting data to determine the impact our system has on education.

Over the past 22 months, we have experimented with providing easy-to-use electronic whiteboards and audio and video capturing tools. We have also instrumented Web browser support to capture much of what naturally occurs in a typical university lecture. We have provided support for nearly 30 separate graduate and undergraduate courses offered in Computer Science, Electrical Engineering, and Mathematics at Georgia Tech. We have constantly modified the components of the system as we have received feedback on its continual use by students and teachers. The insight we have gained through constant use over the past two years has been invaluable.

4. Capturing multiple media streams

We can categorize each of the media streams that occur in a classroom into one of three types: simple streams, control streams, and derived streams. Simple streams are traditionally all that is recorded in trying to preserve a lecture experience. We argue that by using different types of streams, namely control and derived streams, a better presentation of a captured experience can be achieved.

4.1. Simple Streams

Simple streams are ones that are captured for the sole purpose of being played back unmodified. Audio and video recordings of the class are two good examples of simple streams. If an instructor uses prepared slides during a lecture, the sequence of slides presented is another example of a simple stream. A time-stamped log of URLs visited by a Web browser is a simple stream that recreates the browsing activity that occurred during class. Other interesting examples are more dynamic media that are presented during a class, such as a video or a demonstration of a computer program.
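To make the notion concrete, a time-stamped URL log can be as simple as a list of (time, URL) pairs that is replayed in capture order. The short Python sketch below is purely illustrative; the record layout, example URLs, and function name are assumptions, not the actual Classroom 2000 log format.

    # A simple stream: captured once, played back unmodified.
    url_log = [
        (0.0,   "http://www.cc.gatech.edu/fce/c2000/"),   # illustrative entries
        (312.5, "http://www.cc.gatech.edu/fce/"),
    ]   # (seconds from the start of lecture, page loaded)

    def replay(log):
        """Recreate the browsing activity in the order it occurred."""
        for seconds, url in sorted(log):
            print("%8.1f s  %s" % (seconds, url))

    replay(url_log)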

4.2. Control Streams

Control streams are meta-streams containing information that can be used to index into other simple streams. For example, as a teacher writes on an electronic whiteboard, the pen-strokes can be individually time-stamped and used later to provide an index into the audio recording of the class. In this case, the teacher's pen-strokes, or annotations, are a control stream that indexes into the audio. Similarly, if a student were given the ability to press a button whenever she felt an important topic was being discussed in class, the times at which this button was pressed could serve to index into the audio or slide presentation of the class. Unlike simple streams, control streams are mutable. For example, in reviewing a lecture, a student might decide that an index mark considered important during class is no longer needed, or that it does not mark the correct place in the lecture, and remove it. Control streams can be generated during or after a class. The examples above discuss creation during a class, but it is equally possible that during review of a lecture, a student can decide to save an index into a lecture for future reference, similar to a bookmark.
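The sketch below illustrates this idea with a minimal, hypothetical representation (it is not the ZenPad data format): a control stream is a mutable list of time-stamped marks, and selecting a marked item yields the offset at which to start the audio.

    # A control stream: time-stamped marks that index into a simple stream
    # (here, the audio recording). All names are assumptions for illustration.
    class ControlStream:
        def __init__(self):
            self.marks = []                    # list of (time_seconds, item_id)

        def add(self, time_seconds, item_id):
            self.marks.append((time_seconds, item_id))

        def remove(self, item_id):
            # Control streams are mutable: a mark judged unimportant (or wrongly
            # placed) during review can simply be deleted.
            self.marks = [m for m in self.marks if m[1] != item_id]

        def audio_offset(self, item_id):
            # Selecting an item yields the time at which to start audio playback.
            for time_seconds, marked_id in self.marks:
                if marked_id == item_id:
                    return time_seconds
            return None

    strokes = ControlStream()
    strokes.add(754.2, "stroke-17")            # written about 12.5 minutes into class
    print(strokes.audio_offset("stroke-17"))   # -> 754.2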

Figure 3 (left): Slide level granularity. In this interface, audio links are provided at the slide level in the left browser frame. Thumbnail images of all class slides line the top frame enabling quick browsing. Selecting a particular slide reveals a magnified copy of the slide and provides audio links (in the left frame) that indicate all times during the lecture at which that slide was visited. Selecting an audio link pulls up a streaming audio client at that time.
Figure 4 (right): Pen-stroke level granularity. Selecting a particular pen-stroke (indicated by the shaded bounding box) invokes a streaming audio player (in this example, Real Audio) at that point during the lecture.

4.3. Derived Streams

Derived streams represent information that is produced by analyzing some other stream after the live event. For example, in Classroom 2000 we use a Hidden Markov Model toolkit to transform an audio recording into a sequence of phonemes that can be used to provide searching capabilities. Since this transformation currently cannot be done in real-time, the phoneme stream is a derived stream. Similarly, we can analyze a video stream to determine when significant gestures occur, such as when and where the lecturer points to the electronic whiteboard. This gesture stream is another example of a derived stream. Derived streams can be used for unmodified playback (e.g., producing an animated cursor that reproduces the hand gestures of the lecturer over a slide presentation) or to index into another stream (e.g., using the phonemes to facilitate a search of the simple audio stream).

An important issue with multiple stream capture is the synchronization of the various streams that is necessary for coordinated playback. We want to encourage the proliferation of different captured streams in the class, so we must provide a scalable mechanism for multiple stream synchronization. Our approach to stream synchronization is to centralize control for launching streams. When ZenPad is launched, it automatically launches a configurable set of external recording applications running on other network-accessible computers. This solution works well for streams that begin at the same time, say the beginning of the lecture. Different synchronization schemes are necessary in other situations. For example, it is inappropriate to assume that the student note-taking devices begin at the same time, so we make sure that the control streams generated by student units provide absolute times generated from a local clock. Better systems level support for stream synchronization is an open issue for us.
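The sketch below illustrates the two schemes just described under assumed names: recorders started by a central launcher inherit a shared lecture start time, while student units log absolute local-clock times that are normalized into lecture-relative offsets afterwards. It is a sketch only, not the protocol used by ZenPad.

    import time

    lecture_start = time.time()     # noted when the central launcher starts

    def launch_recorders(recorders):
        """Placeholder for starting external recording applications on other
        network-accessible machines; here we only print what would be started."""
        for name in recorders:
            print("starting recorder:", name)

    def to_lecture_offset(absolute_time, clock_skew=0.0):
        """Convert an absolute time-stamp from a student unit's local clock into
        seconds from the start of the lecture, correcting for any known skew."""
        return (absolute_time - clock_skew) - lecture_start

    launch_recorders(["audio", "video", "url-logger"])
    print(to_lecture_offset(time.time()))      # a mark made now maps to roughly 0.0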

Once streams are generated and captured, how are they presented for review afterwards? This presentation raises two separate issues, stream integration and multiple stream visualization, both of which are discussed below.

5. Stream Integration

One reason why videotaped lectures are considered ineffective is that they do not lend themselves to random access and simple browsing. One of the ways we can augment a simple videotape is by providing some way to randomly access significant parts. By using other captured streams (in particular, control streams), we can facilitate random access. In general, exploiting the relationship between different streams is a stream integration problem.

Once the streams have been captured, we are faced with the problem of determining the relationship between different streams and how to integrate them into a presentation of the lecture. All streams have at least one inherent relationship: time. We can easily exploit this relationship in developing a presentation of multiple streams. When one stream is positioned at time t, we can position all of the other streams at time t as well. We can use control streams to serve as indices into other streams. For example, when viewing a lecturer's notes written on the electronic whiteboard, each individual pen-stroke can serve as an index into the audio recording for the class.
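A minimal sketch of this time-based integration is shown below; the Stream interface is an assumption made for illustration rather than an interface from our system.

    # Because time is the one relationship every stream shares, positioning one
    # stream at time t lets us position them all.
    class Stream:
        def __init__(self, name):
            self.name = name
            self.position = 0.0                # seconds from the start of lecture

        def seek(self, t):
            self.position = t

    def seek_all(streams, t):
        for s in streams:
            s.seek(t)

    streams = [Stream("audio"), Stream("slides"), Stream("url-log")]
    seek_all(streams, 754.2)   # e.g., the time a selected pen-stroke was written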

Figure 5: Word level integration. A post-processing program run on the slides generated by ZenPad uses simple space and time thresholds to group pen-strokes together. The thresholds give a close approximation to word boundary detection. The boundaries are used to define HTML image maps to invoke the audio associated with the word.

5.1. Granularity of integration

Since time is the common element between streams, we must ask how tightly coupled the streams must be in order to provide a useful presentation. We will consider this question with respect to the control stream of pen-strokes produced by the teacher using ZenPad. Different choices for the granularity of integration provide different access capabilities. There is a continuum for this granularity. At one extreme, the coarsest granularity is the level of an entire lecture: we provide a single entry point into a stream, its beginning. This is similar to having a tape recorder without any memory settings; you can only play the tape from beginning to end, and there is no way to directly access a particular segment of the recording. At the other extreme, since we have recorded all pen-strokes, we know exactly when every pixel was recorded. Selecting any part of a pen-stroke can take you to the exact time during the lecture when that pixel was drawn. In between these two extremes are more useful levels of granularity that we have investigated. Examples of interfaces with differing granularities are shown in Figures 3-5.

5.1.1. Slide level. Our initial attempt at supporting an electronic whiteboard had slide-level integration. We provided a presentation of two media streams for the student: the slides and the audio (Figure 3). Each slide had a control marker specifying when it was visited and for how long. A student who wanted to hear the audio associated with a slide could index into the audio from any of the visit times for that particular slide.

One advantage of this method of integration is that the audio stream could drive the slides to create a "slide-show" presentation of the lecture with the slides synchronized with the audio. While we initially believed that this would be a reasonable level of granularity, we discovered that some instructors would show a slide and then talk about it for 20 minutes. This gave the student only a marginally better level of access than lecture-level integration.

5.1.2. Pen-stroke level. The next version of our electronic whiteboard supported integration down to the level of individual pen-strokes (Figure 4). When viewing the notes, a student could click on any pen-stroke and hear the audio from the time that stroke was written. While certainly better than slide-at-a-time integration, this still was not sufficient if an instructor brought up a slide and discussed it but did not write any ink on it. We also realized that this level of integration was too fine in practice.

5.1.3. Word level. When a student sees some part of the notes that is of interest, she may click on some written text to access the audio for clarification. In this case, it is not likely that the student wants to hear the audio at the exact time the particular pen-stroke was written. More likely, she will want to access the audio at a time approximating when the word containing the pen-stroke was written. To support this level of integration, it was necessary to use some simple clustering algorithms to group individual pen-strokes into higher-level units approximating word boundaries. We used simple space and time thresholds between consecutive strokes to define these unit boundaries and adjusted the thresholds until word-level boundaries were detected. The results of this clustering algorithm are shown in Figure 5. We could have integrated this clustering into ZenPad, but instead chose to create an alternative HTML-only interface using GIF images and image maps. This heuristic works well for handwritten text, but does not work as well for drawings, as can be seen by the numerous overlapping boxes in the pictures drawn on the slide in Figure 5.
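The sketch below shows the clustering in miniature. The stroke representation, threshold values, and audio-link format are illustrative stand-ins for our post-processing program, but the space and time tests have the same flavor.

    TIME_GAP = 1.0      # assumed: maximum pause (seconds) between strokes of one word
    SPACE_GAP = 25      # assumed: maximum horizontal gap (pixels) between strokes

    def horizontal_gap(box_a, box_b):
        """Gap between bounding boxes (x1, y1, x2, y2); zero if they overlap."""
        return max(0, box_b[0] - box_a[2])

    def cluster_strokes(strokes):
        """strokes: list of (start_time, end_time, bounding_box) in writing order.
        Consecutive strokes close in both time and space join the same cluster."""
        clusters = []
        for stroke in strokes:
            if clusters:
                last = clusters[-1][-1]
                close_in_time = stroke[0] - last[1] <= TIME_GAP
                close_in_space = horizontal_gap(last[2], stroke[2]) <= SPACE_GAP
                if close_in_time and close_in_space:
                    clusters[-1].append(stroke)
                    continue
            clusters.append([stroke])
        return clusters

    def image_map_areas(clusters, audio_url="lecture-audio"):
        """One HTML image-map area per cluster, linked to the audio at the time
        the first stroke of the cluster was written (link format is made up)."""
        areas = []
        for c in clusters:
            x1 = min(s[2][0] for s in c); y1 = min(s[2][1] for s in c)
            x2 = max(s[2][2] for s in c); y2 = max(s[2][3] for s in c)
            areas.append('<area shape="rect" coords="%d,%d,%d,%d" href="%s?start=%.1f">'
                         % (x1, y1, x2, y2, audio_url, c[0][0]))
        return areas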

5.2. Other issues with integration

It is clear from our experience that there is no one level of granularity that is appropriate at all times. Students have found it useful to have slide-level integration and word-level integration available simultaneously. For slides containing mostly text (such as the Calculus slide in Figure 2), it is probably more beneficial to have line-level integration instead of word-level. It might also make sense to let the student dictate the level of granularity on demand. We also noticed that some teachers would spend a lot of time on a single slide without much writing activity. In this case, neither slide-level nor annotation-dependent integration is best. Rather, we found it useful to provide a timeline for the particular slide (shown in the next section) that allows for integration at regular time intervals (e.g., every 60 seconds). We currently do a poor job of providing smart integration for figures, but this is an issue we will address more in the future. Finally, we want to support tighter integration between streams so that playing one stream in real-time causes other streams to play as well. This form of playback integration is supported at a coarse level by some commercial technologies.

5.3. Integration lessons

The key idea presented in this section is that by using control and derived streams, such as the word groupings produced by our clustering algorithm, we can achieve a more useful level of granularity than with simple streams alone. Another example would be a control stream that indicates when specific topics were discussed. In this case, clicking on a written word takes the student not to the video of when that word was actually written, but instead to the point in the video when the topic relating to that word was discussed.

Additionally, it is not sufficient to support only one level of granularity. Student note browsing is a complex activity, and the interface needs to be able to scale the level of granularity dynamically.

Once we determine the correct level(s) of granularity, we still need to present the different media streams to the user. We discuss methods for solving this problem in the next section.

6. Stream visualization

Our initial goals with Classroom 2000 were modest. We wanted to display teacher notes with audio augmentation at some useful level of integration. With successful deployment of the system, we recognized the need to support more streams of information generated during a lecture, such as URLs visited by a Web browser, arbitrary video clips shown during class, execution of arbitrary computer programs, and a variety of student-generated notes. The integration techniques discussed above are effective for visualizing one stream of information, such as the notes, and using that visualization as an interface to another stream, such as the audio or video. While it was becoming clear how we could introduce more captured streams into the class, it was not at all clear how to effectively present all of those streams of information to the student in a way that supported the goal of review. We need more than inter-stream integration.

To help us understand some of the visualization issues, we began an effort to prototype interfaces that supported the numerous streams we envisioned capturing. The immediate problem was the tendency to overload the screen with far too many streams, making it difficult for the user to understand the relationships between them and even harder to know how to proceed to the point of the lecture that is of interest for review. As an example of this, consider Figure 6. Here we made the naïve assumption that each media stream should have its own window in the interface. The result is an overwhelmingly busy and complicated interface.

Figure 6: An example of a cluttered interface. The contents of each frame from top to bottom, left to right are: a time scale selector, a timeline of significant events (based on the time scale), a filter of events to view, a topic outline, an audio searching window, a video window, and Web pages visited during class.

Any visualization technique for multiple streams has two major constraints. It must scale to support many streams and it must provide the student with the ability to scan through an entire stream quickly in order to locate some point of interest. In this section, we discuss a number of techniques we are using to support multiple stream visualization that are scalable and support the "browsing to pinpoint" review behavior.

6.1. Browsing with timelines

All of the interfaces we have produced for viewing notes provide some way to scroll through all slides in one window. These slides are provided either as thumbnail images or as full images. The notes are presented in the order in which they were traversed in class. This gets somewhat complicated when a particular note is visited on a number of occasions. For a large number of notes, it is also necessary to scroll quite far before seeing a slide of interest. We introduced the navigational technique of a timeline to help a user see at a glance how the lecture proceeded. Figure 7 shows a navigation timeline for a stream of slides. The timeline tries to indicate the relative amount of time spent viewing a slide during the course of a lecture and also indicates whether a slide was visited more than once during the lecture. Selecting a particular slide's bar on the timeline moves directly to that slide. We also provide an expanded timeline for the current slide. On this expanded timeline, we give the student some additional information about class activity while that slide was visible. Marks along this slide timeline indicate that the lecturer was writing on the slide at that time. All other times indicate that the slide was visible but the lecturer was talking about it and not writing on it. Integration with audio is provided by selecting some written text on a slide or by selecting anywhere along the expanded timeline for the current slide.
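The data behind such a timeline can be computed directly from the slide-visit control stream. The sketch below assumes a simple event format and a fixed pixel width; both are illustrative.

    def visit_intervals(slide_events, lecture_end):
        """slide_events: list of (time_seconds, slide_id) recording when each slide
        became visible. A visit lasts until the next slide change (or end of class)."""
        intervals = []
        for i, (start, slide_id) in enumerate(slide_events):
            end = slide_events[i + 1][0] if i + 1 < len(slide_events) else lecture_end
            intervals.append((slide_id, start, end - start))
        return intervals

    def timeline_bars(intervals, lecture_end, width_px=600):
        """One bar per visit, with width proportional to the time spent on that
        visit; a slide visited twice therefore appears twice on the timeline."""
        return [(slide_id, int(width_px * start / lecture_end),
                 int(width_px * duration / lecture_end))
                for slide_id, start, duration in intervals]

    events = [(0, "slide-1"), (340, "slide-2"), (1500, "slide-1"), (1710, "slide-3")]
    print(timeline_bars(visit_intervals(events, lecture_end=3000), lecture_end=3000))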

Figure 7: An example of timeline navigation bars. At the very top is a timeline for the entire lecture indicating which slides were visited, at what times, and how long. Below that is an expanded timeline for the currently viewed slide. Vertical markings within this timeline indicate writing activity.

6.2. Determining focus of attention across streams

The timeline navigation technique for slides is relatively simple for users to understand and supports pinpoint browsing, but it does not directly scale to support more than one concurrently displayed surface, such as an electronic whiteboard and a Web browser. One approach to supporting multiple streams is to have a separate region of the screen for each stream. An initial attempt to display lecture notes, URLs visited during class, and audio augmentation resulted in an interface with a column for whiteboard slides and a separate column for URL visits. The slide column displayed small images of whiteboard slides. Each image is linked to a regular-sized image with ink indices into the audio stream. The URL column contains the titles of the visited Web pages, each linked to the respective URL. Each column represents a timeline for its stream, with time increasing down the page. The location of a slide image or URL title within a column is determined by the time of its visit during the lecture. The columns are synchronized so that a horizontal slice across the columns provides information about the respective streams at that time in the lecture. While effective for a small number of streams, this solution does not scale well. As more streams are captured, it would not be possible to display all streams concurrently. We prototyped example interfaces for viewing lecture slides, an outline, speech searching, video, audio, and a timeline displaying Web, slide, and student note accesses. These prototypes (not shown in this paper) presented an overwhelming interface with too much information for the user to discern. Furthermore, this overloading of information is misleading. While it is true that during class a number of streams might be visible at once, it is likely that the focus of attention for the entire class is limited to a small number of those streams.

We have therefore built interfaces that try to predict what the focus of attention was in class and present a single timeline decorated with information indicating activity across multiple streams. Figure 8 shows how this focus of attention timeline looks. On the left side of the notes browser window is a timeline that extends from the beginning of class to the end. This timeline contains markers to indicate significant changes in the focus of the class. When a new slide is created or visited during class, the assumption is that the slide is in the class focus, so the timeline indicates when this event occurs. Similarly, when a new URL is visited in the browser, the title of that Web page is listed on the timeline at the point in class when it was loaded by the browser. We do some filtering of slide navigation and Web surfing. A slide or a Web page must remain visible to the class for some number of seconds before it is considered visited; otherwise, it is considered an ephemeral visit on the way to some other slide or Web page and is not indicated on the timeline. In addition, the timeline can be used to access the audio record of the class. It is still possible to invoke the audio from the annotations on the slides, but it is also possible to invoke the audio at a time near when a Web page was visited.
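A sketch of how such a timeline could be assembled from two captured streams is shown below. The visibility threshold and the event format are assumptions for illustration, not the values used in our interface.

    MIN_VISIBLE = 10.0     # assumed: seconds an item must stay visible to count

    def decorate_timeline(slide_visits, url_visits):
        """Each visit is (start_seconds, duration_seconds, label). Visits shorter
        than the threshold are treated as ephemeral and dropped; the rest become
        markers on a single, time-ordered class timeline."""
        events = [("slide", v) for v in slide_visits] + [("url", v) for v in url_visits]
        markers = [(start, kind, label)
                   for kind, (start, duration, label) in events
                   if duration >= MIN_VISIBLE]
        return sorted(markers)

    slides = [(0, 340, "Slide 1"), (340, 4, "Slide 2"), (344, 900, "Slide 3")]
    pages = [(600, 120, "GVU Center home page")]
    for start, kind, label in decorate_timeline(slides, pages):
        print("%6.0f s  %-5s  %s" % (start, kind, label))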

Figure 8: An example of a focus of attention timeline to visualize two independent streams. The timeline on the left is decorated to indicate significant changes of focus, from whiteboard slides to Web pages. The frame beside the timeline contains a scrollable list of whiteboard slides to facilitate browsing. Web pages are brought up in a separate browser window, as shown. Directly above the timeline is a link that allows students to print out the lecturer's notes easily.

All of the interfaces shown in this paper are automatically generated based on information streams captured (simple, control, and derived). This last interface is generated by a more general tool we have built to handle an arbitrary number of streams. This program is called StreamWeaver.

6.3. Audio search

One derived stream that we have worked on to date is used to support content-based search on the audio record of a lecture.

One revision to the interface of Figure 8 would be to add support for audio searching. The audio search interface provides the ability to search the audio stream for particular, user-specified phrases. The results are returned in the form of decorations on the timeline: each word appears on the timeline at the time it was spoken. The interface also contains a list of keywords specified by the lecturer (also placed on the timeline), which can be used to perform an audio search. Researchers in the speech group at Georgia Tech have provided us with a Hidden Markov Model toolkit to transform audio files into a sequence of phonemes. The resulting phoneme stream is an example of a derived stream. The search is done by translating the keywords or user phrases into their phoneme equivalents and then matching against the phoneme stream. We will be instituting this audio search capability in one class during the latter half of the Spring Quarter '98.
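The sketch below shows the matching step in miniature. The toy pronunciation table and the exact-match loop stand in for the HMM toolkit's lexicon and real matching procedure; everything named here is an assumption for illustration.

    phoneme_stream = [                     # a derived stream: (time_seconds, phoneme)
        (120.0, "k"), (120.1, "ae"), (120.2, "p"), (120.3, "ch"), (120.4, "er"),
    ]

    pronunciations = {"capture": ["k", "ae", "p", "ch", "er"]}   # toy lexicon

    def audio_search(keyword):
        """Translate the keyword into phonemes and return the times at which that
        phoneme sequence occurs; each hit would decorate the lecture timeline."""
        target = pronunciations.get(keyword.lower())
        if not target:
            return []
        phones = [p for _, p in phoneme_stream]
        hits = []
        for i in range(len(phones) - len(target) + 1):
            if phones[i:i + len(target)] == target:
                hits.append(phoneme_stream[i][0])
        return hits

    print(audio_search("capture"))   # -> [120.0]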

6.4. Further issues with stream visualization

We have begun to explore allowing students to take some form of electronic notes during class. There are a variety of interfaces we are investigating, and we expect to report on this work later. One of the issues that student note-taking raises with respect to stream visualization is providing the student with the ability to distinguish between private and public information. All of the previous interfaces we have demonstrated contain information that is publicly available to all students in the class. A student will want control over who is allowed to view personal notes.

We have not yet experimented with ways to provide for "pinpoint browsing" of video, but we realize that this is an active research area within the multimedia community. We expect to use existing content-based techniques to allow for rapid browsing of video and other more dynamic visual media streams. In addition, we expect that computer vision techniques will allow us to do with arbitrary gestures what we already do with audio searching.

7. Generalizing to other domains

We have explained our solutions to some of the issues surrounding the capture, integration, and visualization of multiple media streams. All of the examples in this paper focus on the specific application of the Classroom 2000 project, but our results are not limited to the educational domain. Other researchers have spent much time investigating support for business meeting capture [9,10,11]. We are currently investigating the use of capture technology to support a specific kind of meeting, the technical design meeting, during which designers verbalize the nuances and rationale behind architectural decisions, frequently without documenting them elsewhere. This design capture is being done as part of the MORALE project at Georgia Tech, under the auspices of the DARPA EDCS program. Finally, we have done significant work on support for a tourist investigating an unfamiliar location, as part of the Cyberguide project [1]. Automatic generation of a travel diary was first investigated in that project and is another example of capturing multiple media streams (locations, still and moving pictures, comments on experiences, etc.) for later access.

 

8. Conclusions

In this paper, we have identified some of the critical issues in the capture, integration, and visualization of rich interactive experiences that contain multiple independent streams of information. We defined a simple taxonomy of streams: simple view-only streams, control streams that index into simple streams, and derived streams that are generated after the fact from simple or control streams. Providing useful access to multiple media streams is a difficult problem, especially as we scale up the number of available streams. Two specific problems we addressed in this paper concern stream integration and multiple stream visualization. For stream integration, we described a continuum of integration strategies, with the advice that multiple strategies are favored over a single strategy and that derived streams work well to generate useful levels of granularity. For visualization, we emphasized solutions to the scalability problem and support for user browsing and focused discovery by providing a timeline "road map" of the lecture that is decorated with significant events.

All of the results presented here were in the context of a specific project applying ubiquitous computing to education. It is important to note that though our results are fairly general, extending to the domain of meetings, technical design discussions and tourism, for example, we arrived at our solutions through deep involvement in the educational domain. We have immersed ourselves in an environment that provides many ubiquitous computing capabilities in the classroom and it is through living that experience for nearly two years that we have been able to uncover significant general lessons to apply to other multimedia research problems.

9. Acknowledgments

The authors would like to acknowledge the support of the National Science Foundation. Dr. Abowd is funded through a Faculty CAREER grant through the ITO and Interactive Systems programs within the CISE division of NSF. Dr. Abowd is also funded in part by DARPA through the EDCS program (project MORALE). The authors are members of the Future Computing Environments (FCE) Group at Georgia Tech and have received financial support from a number of industrial sponsors. The work in Classroom 2000 (www.cc.gatech.edu/fce/c2000) is specifically sponsored by Proxima Corporation, Xerox Liveworks, MERL, FX-PAL, Palm Computing, Apple and the Mobility Foundation. We thank these sponsors for their continued support, both financial and otherwise. Finally, the authors would like to thank the many students and faculty within the FCE Group for their strong support and energetic enthusiasm over the past two years.

10. References

[1] Gregory D. Abowd, Christopher G. Atkeson, Jason Hong, Sue Long, Rob Kooper and Mike Pinkerton. "Cyberguide: A Mobile Context-Aware Tour Guide", Baltzer/ACM Wireless Networks, Vol. 3. 1997.

[2] Gregory D. Abowd, Christopher G. Atkeson, Jason Brotherton, Tommy Enqvist, Paul Gulley and Johan Lemon. "Investigating the capture, integration and access problem of ubiquitous computing in an educational setting", Proceedings of CHI '98, Los Angeles, April 1998. To appear.

[3] Gregory D. Abowd, Christopher G. Atkeson, Cindy Hmelo, Ami Feinstein, Rob Kooper, Sue Long, Nitin "Nick" Sawhney and Mikiya Tani. "Teaching and Learning as Multimedia Authoring: The Classroom 2000 Project", Proceedings of Multimedia '96, November 1996.

[4] Gregory D. Abowd. "Ubiquitous Computing: Research Themes and Open Issues from an Applications Perspective", Georgia Tech Graphics, Visualization and Usability Center Technical Report, GIT-GVU-96-24, September 1996.

[5] Cruz, G. and R. Hill. "Capturing and Playing Multimedia Events with STREAMS", Proceedings of ACM Multimedia ‘94, October 1994, pp. 193-200.

[6] Degen, L., R. Mander and G. Salomon. "Working with audio: Integrating personal tape recorders and desktop computers", Proceedings of CHI '92, May 1992, pp. 413-418.

[7] Elrod, S. et al. "Liveboard: A large interactive display supporting group meetings, presentations and remote collaboration", Proceedings of CHI '92, May 1992, pp. 599-607.

[8] Hindus, D. and C. Schmandt. "Ubiquitous audio: Capturing spontaneous collaboration", Proceedings of CSCW '92, November 1992, pp. 210-217.

[9] Minneman, S. et al. "A confederation of tools for capturing and accessing collaborative activity", Proceedings of Multimedia '95, November 1995, pp. 523-534.

[10] Moran, T. et al. "Evolutionary engagement in an ongoing collaborative work process: A case study", Proceedings of CSCW '96, November 1996.

[11] Moran, T. et al. "I'll get that off the audio: A case study of salvaging multimedia meeting records", Proceedings of CHI '97, March 1997, pp. 202-209.

[12] Schnepf, J. et al. "Doing FLIPS: Flexible Interactive Presentation Synchronization", Proceedings of International Conference on Multimedia Computing and Systems, May, 1995, pp. 213-222.

[13] SMART Technologies Inc., SMARTBoard. SMART Technologies Inc., #600, 1177 - 11th Avenue SW Calgary, AB Canada T2R 1K9.

[14] SoftBoard, SoftBoard. SoftBoard, 7216 SW Durham Road, Portland, OR 97224.

[15] Weber, K. and A. Poon. "Marquee: A tool for real-time video logging", Proceedings of CHI '94, April 1994, pp. 58-64.

[16] Weiser, M. "The computer of the 21st century", Scientific American, 265(3):66-75, September 1991.

[17] Weiser, M. "Some computer science issues in ubiquitous computing", Communications of the ACM, 36(7):75-84, July 1993.

[18] Whittaker, S., P. Hyland and M. Wiley. "Filochat: Handwritten notes provide access to recorded conversations", Proceedings of CHI '94, April 1994, pp. 271-277.

[19] Wilcox, L., B. Schilit and N. Sawhney. "Dynomite: A dynamically organized ink and audio notebook", Proceedings of CHI '97, March, 1997, pp. 186-193.