Techniques for Addressing Fundamental Privacy and Disruption
Tradeoffs in Awareness Support Systems
Scott E. Hudson and Ian Smith
Graphics, Visualization, and Usability Center, and College of Computing
Georgia Institute of Technology, Atlanta, Georgia, 30332-0280
{hudson, iansmith}@cc.gatech.edu
Abstract
This paper describes a fundamental dual tradeoff that occurs in systems
supporting awareness for distributed work groups, and presents several specific
new techniques which illustrate good compromise points within this tradeoff
space. This dual tradeoff is between privacy and awareness, and between
awareness and disturbance. Simply stated, the more information about oneself
that leaves one's work area, the greater the potential for colleagues to
maintain awareness of one. Unfortunately, this also represents the greatest
potential for intrusion on one's privacy. Similarly, the more information
we receive about the activities of colleagues, the more potential awareness
we have of them. However, at the same time, the more information we receive,
the greater the chance that it will become a disturbance to our normal work.
This dual tradeoff seems to be a fundamental one. However, by carefully
examining awareness problems in the light of this tradeoff, it is possible
to devise techniques which transmit just the right type and quantity of
information, so that awareness can be achieved without invading the privacy
of the sender, nor creating a disturbance (or consuming too many resources)
for the receiver. This paper presents four such techniques, each based on
careful selection of the information transmitted.
Keywords: Distributed Work Groups, Awareness Support, Privacy, Audio, Video,
Visualization, Media Spaces.
Background And Motivation
Recent technological advances have made the transmission of audio, video,
and other media across digital networks quite economical. For example, one
can now buy inexpensive systems for personal computers which can communicate
with audio, video, and shared objects across even relatively slow networks.
This has made it possible to envision the widespread use of this technology
to support distributed work groups.
While rich communications media, such as live video, can allow distributed
work groups to operate more smoothly, they are still typically not nearly
as natural as working co-located [12].
There are several reasons for this (see for example [2,3,6,11,13,19] for
more detailed discussions). One reason is that much of co-located interaction
tends to be implicit, informal, and serendipitous. Many interactions occur
apparently by chance, and certainly with little effort. For example, important
interactions may occur simply on the basis of people "bumping into
each other" in the hall, or because interested participants overhear
the conversation of colleagues and join in with additional details or knowledge.
To support informal serendipitous interactions, it is important for systems
to operate in a continuous fashion (typically among whole groups), rather than strictly
on the basis of explicit connections between individuals. In general, one
needs to support interactions in modes more like sharing a space than like
making a call on a telephone. Although older systems using analog technology
(see for example [1,2,6]) were often connection-oriented because of the
limitations of the technology, the goals behind many media space systems
have been to use audio, video, and other media to create these kinds of
virtual spaces which afford the opportunity for serendipitous interactions.
In addition to serendipity, co-located interactions also operate within
the context of a high degree of awareness of one's colleagues. Awareness
comes in many forms and degrees. At the simplest level, we are merely reminded
of the existence of our colleagues on a regular basis. We are also aware
of the location, activities, and actions of our co-workers. We might know,
for example, or be able to easily find out, whether a person is in their
might know, or be able to easily find out, whether a person is in their
office, currently busy, in the middle of a rush project, or simply in a
bad mood. These forms of immediate awareness help serve as a catalyst for
communications, and are used in various social protocols that drive our
interactions.
Over time we also come to know our co-workers' typical schedules, habits,
skills and interests, and even their personalities. All these aspects of
awareness contribute to our "knowing" our colleagues, and this
awareness forms a crucial background for our interactions. Without such
a shared background, interactions tend to be more distant, formal, and less
fluid - specifically, more like interacting with strangers, and less like
interacting with team mates.
The cues that drive our awareness come in various forms. Many pieces of
awareness information are visual (e.g., seeing our co-workers' presence,
their expressions, their actions, etc.). However, in a shared space we also
use auditory cues (e.g., overhearing conversations in the hall), and even
spatial or environmental cues (e.g., noting that an office door is open
or closed, or even the presence of a car in a parking lot).
Because of its importance as a backdrop and catalyst for communications,
one important goal of most media space systems has been to support awareness
in various forms by using various media - most notably video and audio transmissions.
The Dual Tradeoff
Systems which attempt to support awareness in distributed work groups immediately
face several important challenges. First among these is the widely recognized
issue of privacy. In fact, we believe there is a fundamental tradeoff between
providing awareness information and preserving privacy. In general, the
more information transmitted about one's actions, the more potential for
awareness exists among those receiving the information. At the same time,
however, the more information transmitted, the more potential for violation
of one's privacy exists. There is also a dual to this tradeoff: the more
information one receives about others, the greater awareness of them is
possible. However, at the same time, the more information one receives,
the more likely it is to disrupt normal activities or consume too many resources.
Characterizing and understanding these tradeoffs is central to the work
presented in this paper. These issues are not entirely new (they are discussed
in a somewhat different form in, for example [9,10]). However, using these
tradeoffs as a lens for viewing awareness system problems can point the
way to new techniques which both meet awareness goals and preserve important
privacy and non-disruption properties. Four of these techniques are discussed
here.
Privacy
Privacy has been widely recognized as an important issue for media spaces.
In a shared physical space we have a well established set of social protocols
for dealing with issues of privacy. For example, the distinction between
a public and a private space is normally immediately clear, and most adults
know how to adjust their behavior for each with little effort. However,
in a virtual space, it is often the case that the normal cues of public
versus private spaces are absent. For example, when one walks into a small
private office containing a video camera, all the physical space social
cues may indicate a private or semi-private space, despite the fact that
the office might also be contained in a large public media space. Because
of the confusion and uncertainty that this entails, people are often (at
least initially) uncomfortable with the idea of working in front of a video
camera. This is understandable, since it presents the same situation as
working in front of a one-way mirror. One never knows when someone might
be watching, or in general, who might be watching. This effect is amplified
by the technology since, on typical networks, this information can normally
be received (or intercepted) by any user of the network who has the proper
software (see [16] for a cryptographic approach to overcoming this problem).
Particularly challenging privacy issues arise if we attempt to support awareness
for work at home. Consequently, to test our approach, the first of our new
techniques attempts to address this difficult domain.
The home is often thought of as a protected and private space and part of
the advantage of working at home is being able to operate in that more relaxed,
and informal setting. For example, the first author frequently works at
home at odd hours, and has been known to get out of bed to write down a
thought, or fix a bug. In addition, home work spaces are often shared by
family members who are not part of the work group, and who have important
expectations of privacy in their home. In both these cases, turning an otherwise
private physical space into part of a very public virtual space (e.g., with
a live video feed) is really not acceptable. On the other hand, working
at home can easily cut one off from the rest of a (distributed or co-located)
work group if no awareness support is provided.
This situation presents a primary example of our fundamental tradeoff. At
first glance providing awareness comparable to a live video feed without
changing the private nature of the home would seem to be very difficult,
if not impossible. However, viewing the problem in the light of this tradeoff
can lead to interesting new solutions. In particular, in order to overcome
what seems like a fundamental limitation, it is necessary to carefully examine
what information is, can be, or should be transmitted in terms both of its
awareness support content, and in terms of its effect on privacy.
Figure 1. The Privacy Preserving Shadow-View Technique Applied to a Home
Media Space.
(a) Working at a Workstation.
(b) Entering the Room and Sitting Down
Figure 1 contains screen dumps of displays a user would see if they were
using our first technique (described in detail below). This technique is
very carefully crafted to provide just the right information so that some
awareness can be provided, while retaining the basic privacy of the space.
In particular, it shows information about the location and movement of people
in various parts of a room (hence indirectly about activities) without actually
transmitting any live images. As described below, the technique works by
modifying a static image of the scene (previously captured when the room
was empty), then darkening small squares within that image to indicate recent
movements (as detected by frame-to-frame differencing from a live video
image). In Figure 1a, we can see that the user is working on the machine
which faces away from the camera, and in Figure 1b, we can see that the
user has entered the room and sat down in the chair in the center of the
work area.
Another widely used approach to privacy problems is to enforce reciprocity
[5,6], that is, to ensure that whenever someone can see or hear you, you
can also see or hear them. This is normally a property of physical spaces
and can allow many conventional social protocols to apply.
However, reciprocity has several drawbacks that limit its use. First, reciprocity
forces all spaces to be public in nature. This clearly would not work for
our home media space example. In addition, even if the highly public nature
of reciprocity is acceptable, enforcing reciprocity really works smoothly
only in connection-oriented systems. In continuously operating systems,
everyone is normally "connected" to everyone else sharing the
same space, and so, although reciprocity may be technically enforced, it
is much less useful. However, even in connection-oriented systems, reciprocity
can produce additional undesirable effects because it can easily cause disruptions.
For example, the equivalent of looking around a large room, or walking down
a hall [15], might cause changes to the user interfaces appearing on a whole
series of workstations. While it is possible to try to reduce the attention
demanding effects of these changes (see for example [18]), even small interruptions
can change the social effects of an action (for example, from the analog
of quietly walking down a hall, to the analog of running down the hall talking
loudly). As a result, while reciprocity is a worthy goal, and can be effective
in some situations, it is not always appropriate and additional measures
to address privacy issues are typically needed.
Disruption and High Resource Utilization
In the area of disruption, continuously operating systems provide a unique
challenge. Here, because everyone sharing a space is always "connected"
to everyone else, resource demands can be high, and the opportunities for
unwanted interruptions of "normal work" can go up dramatically.
For systems of this sort to work well, it is important to place at least
partial control of overt interruptions in the hands of the receiver of information
[9]. In addition, because we would like these systems to scale to moderate
or large work groups, it is also important that, in general, they do not
consume too many resources from the receiver. These resources include both
cognitive (e.g., attention) resources, and machine resources (e.g., screen
space and CPU cycles).
These dual tradeoffs between sending awareness information and privacy,
and between receiving awareness information and disruption or resource consumption,
seem to be fundamental at some level. However, like any tradeoff, different
points in the tradeoff scale can have different properties, and there may
be techniques which make very good compromises with regard to these tradeoffs.
Further, by explicitly examining problems with regard to these tradeoffs,
it may be possible to devise new techniques which transmit just the right
type and quantity of information so that they have more desirable overall
properties than existing techniques.
In the remainder of this paper we consider four such techniques. Each of
these techniques is designed to explore some part of the tradeoff spectrum
and to produce a design solution that provides awareness information while
still preserving privacy or reducing resource utilization for the receiver.
The first of these techniques, the "shadow-view" technique, is
designed to explore issues of privacy.
The Shadow-View Technique
As described above, the problem of providing awareness from the home - particularly
something comparable to a live video feed - presents considerable challenges
with respect to privacy. However, by considering the problem in the light
of our tradeoff dimensions it has been possible to construct a technique
which sends just the right information so that we can provide a significant
amount of awareness information, but not make a large impact on privacy.
We call this technique shadow-views.
Figures 1 and 2 show screen dumps of the display a user would see if they
were using our shadow-views system. Here, a static reference image is used
to provide a spatial context for interpreting a visualization of movement
data. For example in Figure 1 we can see work at a particular workstation,
and a person entering the room, while in Figure 2, we can see that work
is being done at two particular chairs in an office environment. This reference
image is a single still shot taken from the video camera when the work area
was empty. This image is then broken into a grid of 8x8 pixel regions. These
regions are dynamically made lighter and darker based on movement data derived
from live video input from the same camera (in the same location, pointed
in the same direction).
Figure 2. Views of Activity by Two Different Workers in an
Office Setting.
An area of the base image is made darker - to about one quarter of its
original brightness - when that same region of the live image has "activity"
in it (as measured by frame to frame differences above a small threshold).
Thus, the static image is darkened in areas where people are currently active.
Keep in mind that the image presented to the user is still based on the
original static image - no pixels from the current video feed are displayed.
Over time, inactive regions are gradually lightened back to their original
intensity. In particular, periodically - presently about every 20 seconds
- a pass is made over the displayed image and all 8x8 regions currently
not at their normal brightness are brightened by some amount (currently
25%). This allows activity in the scene to persist for some period of time
leaving a "ghost image" of a person's movements within the space.
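The darkening-and-recovery behavior described above can be sketched as follows. This is an illustrative Python sketch (the original system was implemented in Java); the class name `ShadowGrid` is ours, and the interpretation of "brightened by 25%" as 25 percentage points of full intensity per pass is an assumption.

```python
class ShadowGrid:
    """Per-region brightness state for a shadow-view display (sketch).

    Each 8x8 region has a brightness factor in [0.25, 1.0]: 1.0 shows
    the static reference image at full intensity, 0.25 marks current
    activity, and intermediate values form the fading "ghost image".
    """

    ACTIVE_LEVEL = 0.25    # active regions darken to ~1/4 brightness
    RECOVERY_STEP = 0.25   # assumed: recover 25% of full range per pass

    def __init__(self, cols, rows):
        self.brightness = [[1.0] * cols for _ in range(rows)]

    def mark_active(self, col, row):
        """Called when the server reports activity in a region."""
        self.brightness[row][col] = self.ACTIVE_LEVEL

    def decay_pass(self):
        """Periodic pass (about every 20 s in the paper): lighten any
        darkened region back toward its original intensity."""
        for row in self.brightness:
            for c, b in enumerate(row):
                if b < 1.0:
                    row[c] = min(1.0, b + self.RECOVERY_STEP)
```

With these constants an active region returns to full brightness after three decay passes, roughly the one-minute "ghost" persistence implied by the text.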
The system that supports the shadow-view display above consists of two parts:
the ShadowServer and a client applet. The ShadowServer runs on the machine
transmitting information, and computes difference areas which are sent to
one or more applets which display the information for receiving users. In
general, the ShadowServer sends only the coordinates of the regions with
changes to its applet clients. The client applets are responsible for doing
the display of the static image and darkening and lightening the regions
accordingly. This property is important since it guards against surreptitious
capture of the video (since it never leaves the local machine) and because
it dramatically reduces bandwidth requirements.
The ShadowServer
The ShadowServer is written in Java, with some native methods (foreign functions)
written in C for interfacing to the frame digitizing hardware. The interface
to the digitizing hardware is a modification of the portable NV video system
[7] to work with Java. Our current implementation of the ShadowServer samples
(digitizes) a new image about once every 10 seconds. This delay minimizes
the load on the machine doing the digitizing, and reflects a specific limitation
of the (somewhat slow) digitization hardware in use at our site. However,
because slow update of the final client image seems acceptable (or even
preferable, in order to keep resource utilization down), more frequent digitization
may not be necessary.
After capturing a video frame (in greyscale), the ShadowServer compares
each grid square with the previous frame. We currently use a very simple
algorithm for making the determination of whether or not there is activity
in a given 8x8 grid square of the image. Each pixel in an 8x8 region of
the current image is compared to the corresponding pixel in the previous
image. If the difference in the values is greater than a threshold (currently
about 8% of the dynamic range, or 20 out of 256 greyscale units) a counter
is incremented. If at any point in the region the counter reaches a threshold
value (currently 25% of samples), the region is considered active and the
client is informed of this region's activity. We have also experimented
with a "short circuit" of the above algorithm, in which the region is considered
active if any two pixels of the compared images differ by a large amount.
This is useful when the threshold difference in the normal part of the algorithm
is set to a large value, to avoid false positives caused by noise in the
digitization process.
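The per-region activity test can be sketched as follows; this is a Python reconstruction from the description above (the original is Java), and the function name and the short-circuit threshold value are assumptions.

```python
def region_active(prev, cur, diff_threshold=20, count_threshold=16,
                  short_circuit=200):
    """Decide whether one 8x8 region of the image shows activity (sketch).

    prev, cur: flat lists of 64 greyscale values (0-255) for the same
    region in the previous and current frames. A pixel "differs" when
    its value changes by more than diff_threshold (~8% of the dynamic
    range); the region is active once count_threshold pixels differ
    (25% of 64 samples), or immediately when any single pixel pair
    differs by short_circuit or more (an assumed value).
    """
    count = 0
    for p, c in zip(prev, cur):
        d = abs(p - c)
        if d >= short_circuit:      # "short circuit": one huge change
            return True
        if d > diff_threshold:
            count += 1
            if count >= count_threshold:
                return True          # stop early, as in the paper
    return False
```

Subsampling the region (e.g., every other pixel, with count_threshold scaled accordingly) would reduce the server load mentioned later in the paper.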
Applets
Each user who wishes to receive awareness information runs a visualization
applet inside a World Wide Web browser which supports Java applets (such
as Netscape 2.0). The display can be surrounded by a Web page which has
explanatory text, links to contact information, other views, etc. The applet
gets the static reference image to use as the base via the normal HTTP protocol.
It then creates a network connection to the ShadowServer to request and
receive change information. The applet only receives the coordinates of
the region to be updated, so it is responsible for actually modifying the
image. The process of periodically updating the image and lightening regions
which are not at their normal intensity is accomplished via a background
thread.
After some experience with the system, we can offer a couple of anecdotal results that may
be interesting. First, a user sitting at a workstation - even if engaged
in solitary computer work - almost always moves enough to cause the algorithm
to be triggered and the resulting display to have a dark patch in the area
where the user is working. Given the context provided by the static image
this is generally enough to determine, for example, if the user is working
at the computer, or engaged in some other task. Second, sticking out one's
arm (or similar gesture) in the region covered by the camera will generate
a dark, vaguely arm-shaped region in the resulting image on the client workstation.
This may indicate that our 8x8 pixel regions are too small (at least for
some camera distances); larger regions would give a less defined image
in such a circumstance. Finally, we have observed that the ShadowServer
can be fairly computationally expensive. In general, it will be forced to
process the data corresponding to every pixel of a 320x240 image several
times (at least once during capture and twice for comparisons). We are currently
exploring difference calculations that look at a subset of the pixels (e.g.
every other pixel or 1/4 total) in order to reduce this load.
A Shared Audio Technique
In addition to the shadow-view technique, an audio technique with both
privacy preserving and low-disturbance properties has also been developed
on the basis of the dual tradeoff principles outlined above. (This technique
is fully described in [17] and we will only give an outline of it here.)
For awareness purposes, it would be useful to maintain a shared audio space
where co-workers could hear each other. However, such an "open-microphone"
situation would clearly be unacceptable in most situations. While it is
in reality rather difficult to do anything terribly embarrassing in front
of a live video feed from an office (at least with low frame rates, and
small images), we constantly say things that are intended only for a limited
set of "ears". Further, constant conversation between members
of a large group can be disturbing for those currently engaged in solitary
work. Nonetheless, eliminating all but explicit audio contact between
distributed workers also eliminates opportunities for awareness and serendipitous
interactions.
To provide some awareness information, while overcoming these difficulties,
a new audio technique was developed which is designed to again transmit
just the right type of information. This technique processes a speech signal
into a non-speech audio signal that has several critical properties. First,
all intelligible words are removed. This removes privacy concerns, and also
significantly reduces the attention demanding properties of the sound. Second,
the attention demanding properties of the signal are further reduced by
techniques such as muffling, and volume reduction. Despite these transformations
of the signal, enough information - in particular, both typical frequency
distribution of the speaker and cadence information - is preserved to allow
speaker identification. The result is a sound which allows one to determine
who is speaking, but not what they are saying, and which is not demanding
of attention and hence can fall into background noise.
Briefly, this technique works by taking a fixed sample of speech from the
participant. Gaps of silence are removed from this signal and then it is
repeatedly mixed with itself at random offsets. This creates a sound analogous
to crowd noise, but from a crowd of one. This signal is further muffled,
a small amount of white noise is added, and its volume reduced in order
to reduce its attention demanding properties. Finally this signal is normalized
to create a characteristic signal for the participant. This signal retains
the typical overall frequency distribution of the participant, but contains
no words. This characteristic signal essentially serves as an audio icon
[8] for the person.
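The construction of the characteristic signal might be sketched as below. This is a rough Python illustration of the steps named in the text (silence-stripped sample, self-mixing at random offsets, muffling, added white noise, normalization); the function name, the one-pole low-pass as the "muffling" step, and all parameter values are our assumptions, not details from [17].

```python
import random

def characteristic_signal(speech, length, layers=8, muffle=0.5,
                          noise=0.02, seed=0):
    """Build a wordless "audio icon" from a speech sample (sketch).

    speech: samples in [-1.0, 1.0] with silences already removed.
    The sample is mixed with itself at random offsets ("crowd noise,
    but from a crowd of one"), crudely muffled, given a little white
    noise, and normalized to produce the characteristic signal.
    """
    rng = random.Random(seed)
    out = [0.0] * length
    for _ in range(layers):                 # self-mix at random offsets
        offset = rng.randrange(len(speech))
        for i in range(length):
            out[i] += speech[(offset + i) % len(speech)]
    for i in range(1, length):              # crude muffling: low-pass
        out[i] = muffle * out[i] + (1.0 - muffle) * out[i - 1]
    out = [s + rng.uniform(-noise, noise) for s in out]  # white noise
    peak = max(abs(s) for s in out) or 1.0
    return [s / peak for s in out]          # normalize to [-1, 1]
```

The result preserves the speaker's overall spectral character (via the mixed speech) while containing no intelligible words.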
The overall technique works by providing a modified open microphone. The
signal from the microphone is used to produce a coarse resolution envelope
representing the volume of current sound. The receiver of the signal hears,
not the actual live audio, but instead the characteristic signal of the
sender, modulated by the volume envelope of the live signal. This provides
live cadence information. When combined with the frequency distribution
information from the characteristic signal, this is typically sufficient
for speaker identification.
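The envelope modulation step can be sketched as follows, again as an illustrative Python reconstruction rather than the system in [17]; the frame size and the mean-absolute-value envelope are assumptions.

```python
def transmitted_audio(live, characteristic, frame=160):
    """Modulate a sender's characteristic signal by the coarse volume
    envelope of the live microphone signal (sketch).

    live: microphone samples in [-1, 1]. One envelope value (mean
    absolute amplitude) is computed per frame of `frame` samples, so
    only cadence - not speech content - is carried to the receiver.
    The characteristic signal is repeated cyclically as needed.
    """
    out = []
    n = len(characteristic)
    for start in range(0, len(live), frame):
        chunk = live[start:start + frame]
        level = sum(abs(s) for s in chunk) / len(chunk)  # coarse envelope
        for i in range(len(chunk)):
            out.append(level * characteristic[(start + i) % n])
    return out
```

Because only the per-frame envelope of the live signal is used, the receiver hears the sender's characteristic sound rising and falling with their speech rhythm, which is the cadence cue the text describes.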
This technique, like the shadow-view technique, was designed specifically
around an analysis of information with respect to the dual awareness tradeoff.
In this case, the specific information isolated for transmission is speaker
identity. By devising a technique which transmits only that information
(while being carefully designed not to demand attention), it is again possible
to provide awareness information while not violating privacy, nor causing
undue disruption.
Figure 3. The Synthetic Group Photo Applied to an Artificially
Inflated Group
The Synthetic Group-Photo
For our third technique - the synthetic group photo - we consider aspects
of disturbance and resource utilization. Live video or periodically updated
still images [4] are very useful for providing awareness of co-workers'
presence and more generally their comings and goings. Our own experience
with a local media space system has provided anecdotal evidence of the benefit
of simply being able to determine when someone is in their work area in
order to coordinate more explicit communication such as a phone call. However,
even half size (320x240) video images will quickly fill the screen if there
is one for each member of even a relatively small work group of, say, 10
people (not to mention the CPU utilization typically necessary for maintaining
many simultaneous images). Moderately sized work groups of 30 or more clearly
cannot make use of these techniques.
The synthetic group photo technique focuses on information about the presence
or absence of colleagues (both as individuals, and aggregated as a group)
and is designed to overcome this problem by providing a very compact, but
still visually rich, visualization of this specific information. Because
it is compact and driven from very low bandwidth information, it is suitable
for continuous "background" use. Further the display itself can
be used as a simple framework for invoking tools for explicit communication,
or more detailed awareness tools.
This simple technique leverages the fact that people have a high
degree of skill in recognition of faces. We can recognize people we know
at great distances, or in our case, on the basis of small images. Because
of this, people in group photos are typically easily identifiable, even
though the photos often involve pressing many people into a small space,
using multiple rows with significant overlap, etc.
The technique described here creates a synthetically constructed group photograph
by packing together static "head and shoulder" images from participants
into fairly tight, and in fact overlapping, configurations analogous to
a group photo. In addition to packing images together, this technique also
uses a simulation of depth which displays smaller images for people "in
the back" and larger images for those "in the front" (similar
to what would be seen looking into the audience of a theater). This allows
differential use of the scarce resource of space. For example, more resources
can be devoted to close collaborators by "seating" them in the
front rows. Infrequent collaborators can be "seated" in the middle
rows, and other members of an organization can be "seated" in
the back row to help provide a gestalt awareness of the overall group. Figure
3 shows the layout of such a group photo for an artificially constructed
group (since our actual work group is not this large). This image shows
over 100 participants in a relatively small space.
Once a group photo has been constructed, we can use an estimation of the
presence or absence of each worker to drive the dynamic inclusion or elision
of their image. Presence estimation information can come from a number of
possible sources including the video change detection algorithms of the
shadow-view technique, mouse and keyboard activity, or even instrumentation
of the work environment with technology such as motion sensors or active
badges [14].
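The way presence estimates drive inclusion or elision might be sketched as follows. This is an illustrative Python fragment; the layout-entry tuple format, the function names, and the any-source-counts policy are all our assumptions.

```python
def estimate_presence(signals):
    """Combine presence estimates from several sources (sketch).

    signals: dict of source name -> bool, e.g.
    {'video_motion': True, 'keyboard': False, 'active_badge': True}.
    Here a person is considered present when any source reports
    activity (an assumed policy).
    """
    return any(signals.values())

def visible_members(layout, presence):
    """Elide absent members from a synthetic group-photo layout.

    layout: list of (name, x, y, scale) entries in priority order,
    as produced by the layout algorithm; presence: dict mapping
    name -> bool. Members without an estimate default to absent.
    """
    return [entry for entry in layout if presence.get(entry[0], False)]
```

The layout itself stays fixed, so a member's portrait simply appears and disappears in place as they come and go.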
Layout Algorithm
Although it would be possible to construct group photo layouts "by
hand" using an image editing program (and in fact we did several versions
of this in preparing to build our display), this is a rather tedious task
and would be difficult to keep up to date with frequent personnel changes.
This is particularly true since it involves not only constructing a new
layout, but also measurement of where each image is placed. Further, it
is desirable to allow custom layouts for each user so that one's friends
and closest collaborators can be "seated" first. Consequently,
an automatic layout algorithm for constructing synthetic group photos has
been developed. Although the layouts produced by this algorithm are not
quite as good as a manual layout, they are generally comparable in density
to the layouts we produced by hand, and have the general appearance we were
seeking.
Based on our experiments with manual group-photo layouts we were able to
conclude that just the ability to see most of a person's face was sufficient
for recognition. Specifically, overlapping of the shoulders and even parts
of the heads of people in these simulated photos allowed the images to still
be quite recognizable, while achieving fairly tight packing. In addition,
we found that the theater metaphor of (approximate) rows working from large
to small images works well, and allows packing a substantial number of
participants into a relatively small space.
The first step of the algorithm is preprocessing the images to get them
into a canonical form. In normal use each participant might provide an image
of themselves already in this form. For our initial prototype we prepared
several photos from those available on existing web pages within our center.
These were canonicalized to a size (preserving aspect ratios) which made
the bounding box of the person's head between roughly 90 to 110 pixels high
and about 50 pixels across (exact sizing is not critical to the algorithm).
After canonicalizing the images, the bounding box of the person's head was
recorded. Next the average vertical position of their eyes was measured
so eye lines of the images could be lined up in the layout algorithm. Finally,
a background removal was performed leaving only the head and shoulder images,
along with a mask for indicating foreground versus background pixels.
The actual layout of images is performed in a priority order that can be
established by the user of the system. Pictures ranked as most important
are positioned first (but drawn last) using images that are 100% of the
original canonical size. Lower priority images will fall into later rows
and be of smaller size (down to 20% of the original).
The first row of images is portrayed at 100% of their normal size and placed
in a fixed pattern. Currently these images are placed so that there is a
gap equal to 90% of the average head width between the bounding box of each
head image. Once the first row has been placed, successive images are placed
in available gaps. Images are placed in groups designed to approximate rows,
with each group successively reduced by an additional 20% from the original
size.
The overall algorithm does placement only in terms of the bounding box for
the head portion of the image. Shoulder images are allowed to fall wherever
they may, and often overlap other shoulders (but only occasionally other
heads).
Figure 4. Layout Profile Data Structure
As illustrated in Figure 4, the algorithm maintains a data structure
which represents the top profile of the head boxes of the existing layout.
To place a new image, the spans of the data structure are searched looking
for the deepest gap wide enough to hold the head box of the image and at
least deep enough that the chin position of this new image is at or below
the lowest top of a head from a previous row. To produce approximate row
alignment, images are positioned vertically within this gap such that their
eye lines line up with the minimum top of the head from the previous row
(but never overlapping the current head box with head boxes directly below
it). If no suitable gaps are found, the row is considered completed and
the next row, at the next smallest size, is started. If rows reach a minimum
size (currently 20%) and more images remain, remaining rows are all done
at the minimum size.
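The gap search over the top profile can be sketched as follows. This is a simplified reconstruction under our own assumptions: the profile is kept as a per-column array of y values rather than the span structure of Figure 4, rows grow upward (smaller y), and eye-line alignment and the chin constraint are omitted for brevity.

```python
# Minimal sketch of profile-based head-box placement (our names, not the
# paper's).  profile[x] holds the topmost y (y grows downward) of any head
# box already covering column x.  A new w*h head box is slid across the
# profile; the position whose span permits the deepest (largest-y) bottom
# edge is the "deepest gap", and the box is placed there without
# overlapping existing head boxes.

def place_head(profile, w, h):
    """Return (x, y_top) for a w*h head box, or None if it cannot fit."""
    best_x, best_bottom = None, -1
    for x in range(len(profile) - w + 1):
        # The box must sit entirely above existing head tops in its span,
        # so its bottom edge is limited by the shallowest column.
        bottom = min(profile[x:x + w])
        if bottom > best_bottom:
            best_x, best_bottom = x, bottom
    if best_x is None or best_bottom - h < 0:
        return None                      # no room left at this size
    y_top = best_bottom - h
    for x in range(best_x, best_x + w):  # record the new box in the profile
        profile[x] = y_top
    return best_x, y_top
```

When `place_head` returns None, the row would be considered complete and the search retried with the box scaled down by a further 20%, as described above.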
The layout algorithm and display software are implemented in approximately
1000 lines of Java code. The resulting displays can hence be embedded in
web pages to provide additional information and functionality.
Because the layout algorithm works only with rectangles, it can also be
used to compose objects other than head and shoulder images. In particular,
although it breaks the metaphor to some extent, it might be desirable to
provide one or two quarter size live video displays for the highest priority
participants. Space for these can be integrated into the display using the
same algorithm.
When Did Keith Leave?
Our final technique also addresses resource consumption and focuses on presence
information, although in this case providing more specific detail about
an individual. (This technique is not, however, privacy preserving and hence
must be used with care, and may not be suitable for all circumstances.)
Figure 5. A View of Recent History Around the Author's Desk
A live video feed or periodically updated still image is very useful
for being aware of the comings and goings of colleagues. However, when this
type of awareness information is available in conventional form, we only
get a view of what is currently happening, and unless we keep the information
displayed on the screen and pay attention to it, we do not develop awareness
of larger patterns that indicate current levels of busyness, typical schedules,
etc. Further, in a live feed, one will often see the empty chair of a co-worker
one needs to interact with. The fact that they are not currently in their
work area is valuable information. However, this typically leads directly
to additional questions such as "when did they leave?", "when
they left, did they take their coat?", and "did they take their
briefcase or backpack?". To be able to answer these questions and to
provide a more general idea of recent patterns of activity without requiring
the constant attention of the receiving user, we have developed a technique
for visualizing a recent history of activities.
One form of recent history could be provided by recording a live video feed
and playing it back with a VCR-like interface. However, the actual video
images from a typical office environment are very boring, and searching
through them is probably not a good use of one's time. Further, the resources
necessary to store useful amounts of video can be prohibitive (although
we are currently considering time-lapse techniques which may be more useful).
The technique presented here (which we call locally "When did Keith
leave?") attempts to take a more targeted approach. It collects selected
still images which are designed to express the flow of activity in an area.
An example view created by this technique is shown in Figure 5. Here we
see the activity around one of the authors' desks overnight.
The five selected frames show the work area at different points in time.
We can see the following activity: At 1:39am the area is empty, then a few
seconds later a co-worker stops by (to change the CD currently playing on
the shared stereo). The area then remains empty (and although difficult
to see here, it is somewhat dim because some of the lights have been turned
off) until the author arrives at about 9:53 in the morning, turns on the
overhead lights, and enters his work area. Notice that the long period of
inactivity from 1:39am to 9:51am is not shown and does not consume either
user or system resources.
The line graph at the bottom of the visualization shows total measured change
(see below) over a shorter period of about two hours, and provides lines
indicating where each captured image lies in this time line. By this we
can see that the last three images are very recent, while the first two
are much older - somewhere past the left end. Since the sequence of saved
images can be spaced very non-uniformly in time, this provides a way to
quickly see when the images were taken, and when there are long periods
of inactivity (as happens in most of the left-hand portion of this particular
visualization). Also, if there is a large amount of recent activity, and
this dominates the display, the line graph can provide an indication of
how long this activity has gone on.
The technique works by capturing a video frame periodically (currently approximately
twice per minute, although this varies considerably depending on system
load). Selection of stills is driven by a frame-to-frame change indicator
similar to the one used for the shadow-view technique.
Here a global metric over the entire image is used. Each pixel is considered
to have changed if it is more than 10% of the dynamic range (25 out of 256
greyscale values) different from the previous image. The total percentage
of changed pixels is used as our metric.
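The change metric described above can be sketched directly from those numbers. This is a minimal illustration with hypothetical names; the paper's C implementation would operate on raw frame buffers.

```python
# Sketch of the global change metric: a pixel counts as changed when it
# differs from the previous frame by more than 10% of the dynamic range
# (25 of 256 greyscale levels); the metric is the fraction of pixels
# that changed.

def change_fraction(prev, cur, threshold=25):
    """prev and cur are equal-length sequences of 0-255 greyscale values."""
    changed = sum(1 for a, b in zip(prev, cur) if abs(a - b) > threshold)
    return changed / len(cur)
```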
A very simple (but effective) frame selection algorithm is used. The most
recent frame is always displayed in the fifth position. When a new frame
is captured, the algorithm decides either to shift the images to the left
by one frame, retaining the previous frame, or to discard the previous image.
The visualization is shifted if either the display is not yet full, or if
more than about 20% of the new image (16000 out of 76800 pixels) has changed.
Although a more sophisticated frame selection algorithm could be used, we
have found this simple one very effective in practice. It tends to capture
both the beginning and end of periods of high activity (or a single frame
for activity occurring within only one frame).
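The shift-or-discard decision can be sketched as follows. The function and constant names are ours, and frames are stood in for by arbitrary values; the paper's implementation works on captured images.

```python
# Sketch of the simple frame selection algorithm: the newest frame always
# occupies the last of five display slots.  The previous frame is retained
# (everything shifts left) only while the display is not yet full or when
# more than about 20% of the new image has changed; otherwise the previous
# frame is discarded and overwritten.

SLOTS = 5
SHIFT_THRESHOLD = 0.20  # roughly 16000 of 76800 pixels

def select_frame(history, new_frame, changed_fraction):
    """history: list of at most SLOTS frames, newest last.
    Updates history in place to reflect the newly captured frame."""
    if len(history) < SLOTS:
        history.append(new_frame)      # display not yet full: shift
    elif changed_fraction > SHIFT_THRESHOLD:
        history.pop(0)                 # retain previous frame, drop oldest
        history.append(new_frame)
    else:
        history[-1] = new_frame        # quiet scene: discard previous frame
    return history
```

Because a shift happens both when activity starts (the last quiet frame is retained) and while it continues, this simple rule tends to preserve both endpoints of a burst of activity, as noted above.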
This technique is implemented in Python, C, shell scripts, and HTML. The
main driver (in Python) invokes frame capture routines in C (again based
on the NV system [7]); frame differences are also calculated in C for performance
reasons. Several shell scripts are then used for manipulating images with
standard tools. Finally, a web page is provided to display the results. This
web page uses the "client pull" facilities of Netscape to cause
the page to be automatically updated every few minutes. (This update is
more visually disturbing than we would like, so a new embedded implementation
in Java is being considered.)
An earlier version of this system was used by the authors in early 1995
when we were separated by several thousand miles and connected only by the
web due to restrictions imposed by a firewall. Although only employed in
one direction because of the firewall, we still found the system effective
in maintaining increased general awareness, and also in avoiding "phone
tag". For example, in one instance, Hudson (the receiver of the information)
was reminded to check his voice mail after returning to his desk and seeing
an image of Smith having tried to call him. Similarly, rounds of "phone
tag" were avoided because Hudson could easily determine whether Smith
could currently be reached.
Conclusions
In this paper a fundamental dual tradeoff between awareness and privacy,
and between awareness and disruption or resource utilization has been introduced
and discussed. Simply stated, this tradeoff involves the information sent
or received. The more information sent by a person, the more their co-workers
can be aware of them. However, the more information one sends, the greater
effect this can have on one's privacy. Similarly, the more information one
receives about others, the more aware one can be of them. However, this
information then also has greater potential for disruption of "real
work", either by direct interruption, or by consuming resources needed
elsewhere.
We believe this tradeoff is fundamental and in some sense unavoidable. However,
by viewing awareness problems through the lens of this tradeoff space -
in particular, by carefully examining the nature of the information being
transmitted and received with respect to this tradeoff - it is possible
to find good tradeoff points which provide awareness while still preserving
good privacy or disruption properties. This paper has illustrated this with
four specific techniques. Each technique was designed to address one or
both of the dual tradeoffs in some way, each was designed using this notion
of "information analysis", and each illustrates that more fidelity
and more bandwidth does not necessarily produce a better result.
Future Work
One important remaining challenge for the work presented here is to integrate
the techniques described into a unified system. We are currently working
to do this in a web-based framework using Java applets. This framework will
allow components such as the ones described here to be quickly "plugged
together" and presented within a set of web pages.
Acknowledgements
This work was supported in part by a grant from the Intel Corporation, and
in part by the National Science Foundation under grants IRI-9500942 and
CDA-9501637.
References
- [1] Adler A., Henderson A., A Room of Our Own: Experiences From a Direct
Office Share. In Proceedings Of ACM Conference On Computer Human Interaction,
1994, pp. 138-144.
- [2] Bly S., Harrison S., Irwin S. Media Space. In Communications Of
The ACM, 36(1), January 1993, pp. 28-47.
- [3] Borning A., Travers M., Two Approaches To Casual Interaction Over
Computer And Video Networks. In Proceedings Of ACM Conference On Computer
Human Interaction, 1991, pp. 13-19.
- [4] Dourish P., Bly S. Portholes: Supporting Awareness in a Distributed
Work Group. In Proceedings Of ACM Conference On Computer Human Interaction,
1992, pp. 541-547.
- [5] Fish R., Kraut R., Root R. Evaluating Video As A Technology For
Informal Communication. In Proceedings Of ACM Conference On Computer Human
Interaction, 1992, pp. 37-48.
- [6] Fish R., Kraut R., Root R., Rice R. Video as an Architecture for
Informal Communications. In Communications Of The ACM, 36(1), January 1993,
pp. 48-61.
- [7] Frederick, R., Experiences with Real-Time Software Video Compression.
In Proceedings of the Sixth International Workshop on Packet Video, Portland,
OR, Sept. 26-27, 1994.
- [8] Gaver W., Smith R., O'Shea T. Effective Sounds in Complex Systems:
The ARKola Simulation. In Proceedings Of ACM Conference On Computer Human
Interaction, 1991, pp. 85-90.
- [9] Gaver W., Moran T., MacLean A., Lovstrand L., Dourish P., Carter
K., Buxton W. Realizing A Video Environment: EuroPARC's RAVE System. In
Proceedings Of ACM Conference On Computer Human Interaction, 1992, pp. 27-35.
- [10] Gaver W. The Affordances of Media Spaces for Collaboration. In
Proceedings of the ACM CSCW '92 Conference on Computer Supported Cooperative
Work, 1992, pp. 17-24.
- [11] Heath C., Luff P., Disembodied Conduct: Communication Through Video
in a Multi-Media Office Environment. In Proceedings Of ACM Conference On
Computer Human Interaction, 1991, pp. 99-103.
- [12] Isaacs E., Tang J. What Video Can And Can't Do For Collaboration:
A Case Study. In Proceedings Of ACM Conference On Multimedia, 1993, pp.
199-206.
- [13] Mantei M., Baecker R., Sellen A., Buxton S., Milligan T., Wellman
B. Experiences In The Use Of A Media Space. In Proceedings Of ACM Conference
On Computer Human Interaction, 1991, pp. 203-208.
- [14] Pier K., Newman W., Redell D., Schmandt C., Theimer M., Want R.
Locator Technology in Distributed Systems: The Active Badge. In Proceedings
Of Conference on Organizational Systems, 1991, pp. 285-287.
- [15] Root R. Design Of A Multi-media Vehicle For Social Browsing. In
Proceedings Of ACM Conference On Computer Supported Cooperative Work, 1988,
pp. 25-38.
- [16] Smith I., Hudson S., Mynatt E., Selbie J. Applying Cryptographic
Techniques To Problems In Media Space Security. In Proceedings Of ACM Conference
On Organizational Computing Systems, August 1995.
- [17] Smith I., Hudson S. Low Disturbance Audio for Awareness and
Privacy in Media Space Applications. In Proceedings of ACM Multimedia '95,
Nov. 1995.
- [18] Tang J., Rua M. Montage: Providing Teleproximity for Distributed
Groups. In Proceedings Of ACM Conference On Computer Human Interaction, 1994, pp. 37-43.
- [19] Whittaker S., Frohlich D., Daly-Jones O. Informal Workspace Communication:
What Is it Like And How Might We Support It. In Proceedings Of ACM Conference
On Computer Human Interaction, 1994, pp. 131-137.