Annotated Bibliography

Ian E. Smith



1. Article: Realizing A Video Environment: EuroParc's RAVE System

Author: William Gaver, Thomas Moran, Allan MacLean, Lennart Lovstrand, Paul Dourish, Kathleen Carter, William Buxton

Reference: CHI `92, p. 27-35

This paper is a broad discussion of the RAVE system's functionality, use, and design. They organize their efforts into three areas:

They provide an excellent breakdown of the axes for collaboration in the mediaspace:

They discuss the fact that, in the absence of a mediaspace, people move fluidly from any one of these types of communication to any other.

They built their system around "buttons," which are basically user-configurable scripts for controlling the resources. Five major button types emerged/evolved: background, sweep, glance, office share, and Vphone. These reflect different intended uses of the mediaspace. The background button is (roughly) the default (disconnected) state; it puts a view of a common area on your monitor. A sweep gives you ~1 second peeks into a set of locales, whereas a glance is focused on one person for ~3 seconds. Office share and video phone (Vphone) are both two-way A/V connections and differ only in intended use (long-term collaboration versus focussed discussion).
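
The button idea can be sketched as data: each button is a script of connection steps. This is a minimal sketch under assumptions; the `Step` type and all function names are hypothetical, and the durations are only the approximate figures above (~1 s per locale for a sweep, ~3 s for a glance).

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Step:
    """One connection step of a (hypothetical) button script."""
    target: str                 # office or locale to connect to
    seconds: Optional[float]    # None = stay connected until changed
    two_way: bool = False       # True for two-way audio/video

def background(common_area: str = "commons") -> List[Step]:
    # Default state: a one-way view of a common area ("commons" is a made-up name).
    return [Step(common_area, None)]

def sweep(locales: List[str], seconds: float = 1.0) -> List[Step]:
    # Brief one-way peek into each locale in turn.
    return [Step(loc, seconds) for loc in locales]

def glance(person: str, seconds: float = 3.0) -> List[Step]:
    # A slightly longer one-way look at a single office.
    return [Step(person, seconds)]

def office_share(person: str) -> List[Step]:
    # Long-term two-way connection.
    return [Step(person, None, two_way=True)]

def vphone(person: str) -> List[Step]:
    # Focused two-way call: same connection, different intent.
    return [Step(person, None, two_way=True)]
```

Modelling office share and Vphone as identical scripts makes the point concrete: the difference between them lies in intended use, not in the connection itself.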

They initially dealt with the privacy problem by being "intentionally naive" and letting social norms govern privacy. They separate privacy into several issues:

They note a crucial tradeoff: control implies intrusion.

They built "Goddard" on top of "iiif" to control the resources and to get around iiif's notions of control and ownership; Goddard is what gives you control over your own resources. They use Gaver's auditory cues to provide notification. They note that non-speech auditory cues can be tailored to be much less disruptive (gradual buildup in volume, auditory icons, etc.).

They also built a time-based notification system called "Khronika," which provides selective awareness of planned and electronic events. It can be told to keep the user informed about "events," "daemons," and "notifications." Events are defined in terms of their class, their start time, and their duration (e.g., conferences, visitors, local movies, arriving email). Events are also "classified" to provide higher-level abstractions for selection. Daemons are programs that watch for specific things: each is a set of constraints that can fire a notification. They also mention the Polyscope and Portholes systems here. Polyscope provided a very limited "history" mechanism. They also use the awareness system (Polyscope) as a gateway to intentional connections (Vphone, glance, etc.).
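
A rough sketch of how events and daemons might fit together, assuming a daemon is simply a list of predicates that must all hold before it fires. The classes, field names, and the example daemon are hypothetical, not Khronika's actual interface.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Tuple

@dataclass
class Event:
    kind: str             # event class, e.g. "conference", "visitor", "email"
    start: datetime
    duration: timedelta
    description: str = ""

# A constraint looks at an event and the current time and answers yes/no.
Constraint = Callable[[Event, datetime], bool]

@dataclass
class Daemon:
    name: str
    constraints: List[Constraint]

    def fires_for(self, event: Event, now: datetime) -> bool:
        # The daemon fires only when every constraint holds.
        return all(check(event, now) for check in self.constraints)

def notifications(events: List[Event], daemons: List[Daemon],
                  now: datetime) -> List[Tuple[str, Event]]:
    """Return (daemon name, event) for each daemon/event pair that fires."""
    return [(d.name, e) for e in events for d in daemons if d.fires_for(e, now)]

# Example: warn about any conference starting within the next ten minutes.
meeting_soon = Daemon("meeting-soon", [
    lambda e, now: e.kind == "conference",
    lambda e, now: timedelta(0) <= e.start - now <= timedelta(minutes=10),
])
```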

2. Article: MBONE: The Multicast Backbone

Author: Hans Ericksson

Reference: CACM, August `94, p. 54-60

This is a very "high-level" discussion of the MBONE, geared more toward its capabilities than toward how or why it works. They mention the previous encapsulation method (LSRR: Loose Source and Record Route) versus the IP encapsulation used today. They explain TTLs and how thresholds are used with DVMRP. They mention that DVMRP doesn't propagate routing changes quickly, although they don't explain why. They mention the names of the common MBONE tools and their authors.

Almost all the examples are of the (thrice-yearly) IETF transmissions. They note that the IETF transmissions consume 100 to 300 Kbps, with spikes to 500 Kbps. They mention (but don't explain) RTP (the Real-time Transport Protocol), used by most (?) MBONE apps. They say that cellular-phone coding gets the cost of an audio channel down to 18 Kbps, including overhead. (Raw 8-bit, 8 kHz audio plus control info on vat is said to be 75 Kbps.)
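
The raw audio figure is easy to check: 8,000 samples per second at 8 bits each. The 75 Kbps and 18 Kbps totals are the article's numbers; treating the difference between raw audio and the vat total as header/control overhead is an assumption.

```python
SAMPLE_RATE_HZ = 8_000   # 8 kHz sampling
BITS_PER_SAMPLE = 8      # 8-bit samples

raw_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE    # 64,000 bit/s of audio payload
VAT_TOTAL_BPS = 75_000                        # total quoted for vat in the article
overhead_bps = VAT_TOTAL_BPS - raw_bps        # assumed to be headers/control info
CELLULAR_BPS = 18_000                         # quoted cellular-phone coding figure

print(f"raw {raw_bps // 1000} Kbps, overhead {overhead_bps // 1000} Kbps")
print(f"vat uses {VAT_TOTAL_BPS / CELLULAR_BPS:.1f}x the cellular figure")
```

This prints a raw rate of 64 Kbps with about 11 Kbps left over, and shows the vat stream using roughly 4.2 times the bandwidth of the cellular figure.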

On page 59 they have a reasonable table of the proposed thresholds for each type of media and transmission distance. They note that encoding two things (media type and distance) in one number is bad, but that's the way it is. It also means these two parameters cannot be controlled independently: if you want traffic of one type from far away, you also get all the traffic with a smaller threshold.
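
A small sketch of why one number per stream couples media type and distance. It assumes a packet crosses a boundary only if its TTL exceeds that boundary's threshold (descriptions differ on the exact comparison), and every threshold and TTL value below is hypothetical, not taken from the table on page 59.

```python
# Hypothetical values, chosen only to illustrate the mechanism.
BOUNDARY_THRESHOLD = {"site": 16, "region": 64, "world": 128}
SENDER_TTL = {"audio": 191, "video": 127, "slides": 63}

def crosses(ttl: int, threshold: int) -> bool:
    # Assumed rule: the packet crosses only if its TTL exceeds the threshold.
    return ttl > threshold

def streams_passing(threshold: int) -> list:
    """Which streams a boundary with this threshold lets through."""
    return sorted(m for m, ttl in SENDER_TTL.items() if crosses(ttl, threshold))

for name, threshold in BOUNDARY_THRESHOLD.items():
    print(name, streams_passing(threshold))
```

The coupling shows up when you try to be selective: with these numbers no threshold admits video without also admitting audio, because a single TTL encodes both "which medium" and "how far."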

They give some interesting numbers for multicast packet forwarding: a Sparc 1+ forwards one packet in about 1.0 ms, a Sparc 10 in 0.6 ms. They mention that some people use the CPU speed of the machine forwarding packets as a bandwidth controller. The problem is that in the saturation case the (user-level) mrouted isn't getting any cycles, so its neighbors "time it out" and stop sending data to the site. This gives the CPU a rest and lets mrouted run again, which restarts the flood, yielding oscillation.
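
Per-packet forwarding time translates directly into an upper bound on packet rate, assuming the CPU does nothing but forward packets:

```python
def max_packets_per_second(ms_per_packet: float) -> float:
    """Upper bound on forwarding rate if the CPU does nothing but forward."""
    return 1000.0 / ms_per_packet

# Per-packet forwarding times quoted in the article.
for machine, ms in {"Sparc 1+": 1.0, "Sparc 10": 0.6}.items():
    print(f"{machine}: at most ~{max_packets_per_second(ms):.0f} packets/s")
```

That is about 1000 packets/s for the Sparc 1+ and about 1667 packets/s for the Sparc 10.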

They explain that the LSRR option was bad because Van Jacobson found that periodically (at 30-second intervals) you would get huge (65-85%) packet losses when transmitting video. During non-trouble periods, losses were around 0.5%. He suggested that LSRR was competing with normal routing updates. They also mention the "bogus ICMP" packet problems caused by "screaming" routers when confronted with multicast traffic.

3. Article: Multicast Routing Extensions for OSPF

Author: John Moy

Reference: CACM August `94, p. 61

This is basically a very detailed paper about MOSPF, the new proposal for extending (the standard) OSPF to route multicast packets within an AS. An AS (Autonomous System) is a set of routers under a single administration running common routing protocols. OSPF is the recommended interior routing protocol for the TCP/IP Internet.

The basic scoop is that they base their routing choices on both the source and the destination. (Regular OSPF uses only the destination.) Normal OSPF can just use Dijkstra's algorithm to calculate routes; they have to calculate a form of the minimal spanning tree (since they have to worry about both sources and destinations) called the Steiner tree.
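
As a hedged illustration of a per-source tree computation: run Dijkstra from the source, then keep only the branches that lead to group members. Note that this sketch builds a pruned shortest-path tree, which is not the same thing as a minimum-cost Steiner tree (computing one of those is NP-hard in general). The graph format and node names are hypothetical.

```python
import heapq
from typing import Dict, Iterable, Optional, Set, Tuple

Graph = Dict[str, Dict[str, int]]   # node -> {neighbor: link cost}

def shortest_path_tree(graph: Graph, source: str) -> Dict[str, Optional[str]]:
    """Dijkstra from `source`; returns each reachable node's parent in the tree."""
    dist = {source: 0}
    parent: Dict[str, Optional[str]] = {source: None}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, cost in graph[u].items():
            nd = d + cost
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return parent

def multicast_tree(graph: Graph, source: str,
                   members: Iterable[str]) -> Set[Tuple[str, str]]:
    """Keep only tree edges that lie on the path from the source to some member."""
    parent = shortest_path_tree(graph, source)
    edges = set()
    for m in members:
        node = m
        while parent.get(node) is not None:   # walk back toward the source
            edges.add((parent[node], node))
            node = parent[node]
    return edges
```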

They allow IP datagrams to be labelled with a TOS (Type of Service) parameter: minimize delay, maximize throughput, maximize reliability, minimize monetary cost, or normal service, and they allow different routes based on it. Further, they can optimize the "expanding ring search," which is basically just a set of pings with successively larger TTLs: when an MOSPF router sees an IP packet with a given TTL, it can figure out that the TTL is too small to reach another group member and not bother forwarding it. They do not split the multicast stream across two equal-cost paths.
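
Expanding ring search is just a loop over increasing TTLs. A minimal sketch, assuming a doubling TTL schedule and that a probe sent with TTL t reaches members at most t hops away; `probe` and `simulated_probe` are hypothetical stand-ins for real network I/O. The last function is the MOSPF-style pruning test described above.

```python
from typing import Callable, Iterable, Optional

def expanding_ring_search(probe: Callable[[int], bool],
                          ttls: Iterable[int] = (1, 2, 4, 8, 16, 32, 64)) -> Optional[int]:
    """Send probes with successively larger TTLs; return the first TTL that gets an answer."""
    for ttl in ttls:
        if probe(ttl):
            return ttl
    return None   # nobody answered within the largest TTL tried

def simulated_probe(hops_to_members: Iterable[int]) -> Callable[[int], bool]:
    """Hypothetical stand-in for a real probe: a member answers
    if it is no more than `ttl` hops away."""
    hops = list(hops_to_members)
    return lambda ttl: any(h <= ttl for h in hops)

def worth_forwarding(ttl: int, hops_to_nearest_member: int) -> bool:
    """Don't forward a packet whose TTL can't reach any group member."""
    return ttl >= hops_to_nearest_member
```

With members 5 and 9 hops away, the search tries TTLs 1, 2, and 4 without an answer and succeeds at 8.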

They go into a significant amount of detail about the protocol necessary to make this all work. The upshot is that the routers have complete knowledge of the internetwork and can do smart things. They also provide some mechanisms to help the routers keep their tables smaller (coalescing routes). They compare MOSPF (on parts of the Internet) to DVMRP and claim the following advantages: increased stability, aggregation of sources to make global DVMRP tables smaller, slightly more efficient pruning, and optimization of the IP multicast "ring search." They also analyze the algorithm's cost; they claim that only the cost of the Dijkstra calculations needs to be considered, and that the number of these can be reduced significantly by using some wildcards and default routes. (They claim a 200-router area implies 200 Dijkstra calculations, which can be done in 3 seconds on a 10 MIPS processor.)
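
The claimed cost works out as follows (plain arithmetic on the quoted figures; the per-node figure assumes the work of each run is spread evenly over the 200 routers):

```python
dijkstra_runs = 200    # claimed: a 200-router area needs 200 Dijkstra calculations
routers = 200          # nodes in each calculation
total_seconds = 3.0    # claimed total time
mips = 10              # processor speed, million instructions per second

ms_per_run = total_seconds * 1000 / dijkstra_runs                  # 15 ms each
instructions_per_run = total_seconds * mips * 1e6 / dijkstra_runs  # 150,000
instructions_per_node = instructions_per_run / routers             # ~750 if spread evenly

print(ms_per_run, instructions_per_run, instructions_per_node)
```

So each Dijkstra run gets about 15 ms, or roughly 150,000 instructions.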

The MOSPF protocol doesn't really address what to do across AS boundaries. They introduce the "some routers get all the packets" trick to get around this problem. Inside the AS, all the routers know the whole state, so there is no problem, but across the boundaries little (or nothing) is known. I'm concerned by this, although I can't put my finger on why.

4. Article: Speech Patterns In Video-Mediated Speech

Author: Abigail J. Sellen

Reference: CHI 92, p. 49-59

This is a psychology paper and experiment. She constructs an experiment that compares the speech patterns of 4-person groups (she used informal debates) when they meet face-to-face, over a 4-PIP, and via a system called "Hydra." Hydra is a spatial array of monitors, cameras, microphones, and speakers. It ensures that the spatial layout of people around the virtual table is consistent, and that each voice comes from the position of the correct speaker. The claim is that Hydra should provide better cues, since gaze and gaze awareness are conveyed better by the spatially arrayed monitors. Further, since the audio streams are separated, a user should be able to attend to them independently.

She notes that video-mediated conversation has been studied in the past and that video-mediated conversations tend to be more "orderly" than their face-to-face versions. The previous results on the number of interruptions and pauses are inconsistent (although most involved studies of pairs). The claim is that perhaps listeners are less likely to attempt to seize the floor when using video because the visual cues are less present. She notes that Rutter has shown that audio-only conversations are less spontaneous, more formal, and more socially distant than face-to-face discussions.

Her hypotheses were:

Results showed that:

5. Article: Evaluating Video As A Technology for Informal Communication

Author: Robert S. Fish, Robert E. Kraut, Robert W. Root

Reference: CHI 92, p. 37-48

This is yet another Cruiser paper. They identify why a visual channel is "a good thing" based on prior literature. They claim it

  1. is helpful in increasing the spontaneity and frequency of communication
  2. is helpful in supporting social relationships
  3. is helpful in coping with the most complex and equivocal communications problems encountered in workgroups
  4. is helpful in integrating members into and supporting the work in research and development groups.

They did a four-week field trial with 11 summer students and 12 of their mentors/supervisors. They tried to increase the opportunities for conversation and thus the number of spontaneous conversations. The things a user could do were:

The privacy features were enforced reciprocity (bogus) and a "block people out, I'm busy" feature. (This does not strike me as fine-grained enough to express the types of behaviors I want.)

They observed usage not too different from the telephone. However, about 1/4 of the calls seemed to be users monitoring their environment. They noted that about 1/4 of the Glances were first thing in the morning, right after lunch, and on weekends, when it was uncertain whether users would be in (and perhaps before the calling user was absorbed in their work).

They observed (duh) two novel uses of Cruiser: office share and "ambush." Ambush means that you connect to an empty office and then go about your business, waiting for that person to come in. Cruiser interactions involved more greeting and scheduling, but less problem solving and decision making. They attribute this to a lack of shared objects, tools, etc. The students would use Cruiser to see if the mentor was available for assistance and perhaps ask a question or two and/or schedule a meeting. Cruiser was frequently used to inquire about status.

They did some questionnaire-based studies on the users' perceptions of different media's effectiveness for various topics. Cruiser was most closely related to the telephone.

After 4 weeks, subjects were asked about privacy issues. Most didn't think there was a problem in their small, collaborative community. 4 of the 23 didn't want strangers looking into their office. Some analysis showed that people felt a conversation initiated by someone else via Cruiser was more of a violation than one initiated face-to-face. They were concerned about field-of-view privacy problems in the Cruiser system, as well as the hands-free audio.

AutoCruises (system-initiated) were accepted only 3% of the time, versus 54% for user-initiated calls. Sounds like differences in whether you'll talk to someone when you pass them in the hall. They considered the short length of calls (especially all the status checking) as proof that the system was not as expressive as other media. The low completion rate of autocruises, plus the privacy invasion and intrusion factor, led them to conclude that they hadn't produced a system with enough power in the visual channel to handle all of the human-to-human protocol.

Their conclusions revolve around the idea that Cruiser is too heavyweight, and they're right. The system requires attention to make things happen and it is obtrusive. One of their users said, "There is no halfway with Cruiser." They also complained about the lack of shared objects. They discuss that when the telephone was introduced in England, it took a while for the social norms to develop about how the device should be used; they suggest the same might be true of Cruiser. They continue the telephone analogy to the bitter end.

They suggest that they need a new system that balances "accessibility, privacy, and solitude." Accessibility is the ability to get to someone easily. Privacy is their word for control of what information is available. Solitude is the intrusion factor.

6. Article: TeamWorkStation: Towards a Seamless Shared Workspace

Author: Hiroshi Ishii

Reference: CSCW `90, p. 13-26

This is a paper about reducing cognitive seams. They merge the computer and the desktop to create a virtual shared workspace. They claim that since each user can continue to use his/her favorite apps or manual tools in the shared workspace, the cognitive discontinuity (seam) is reduced.

He claims that the two big areas for real-time shared workspaces are shared window systems and mediaspaces. He notes that the first handles only things in the computer and the second only things outside the computer. He notes that in conventional apps the seams are starting to be overcome by cut/paste/clipboards and uniform interfaces. He mentions three seams in CSCW apps:

  1. The seam between individual work and cooperative work modes.
  2. The seam between computer-supported work and non-computer-supported work.
  3. The seam between synchronous and asynchronous communication and real-time communication.

He uses these seams to divide the work world into six pieces:

Computers and networks are currently removing the A<->B seam. Shared window systems can do something about A<->C, but can only handle data in the computer; similarly, teleconferencing can't readily handle data in computers. He claims that A<->A' is not something you can do much about, but that all group members must be able to make independent choices about their transitions from A to A' and vice versa. They are trying to fuse A, A', and C.

Their basic trick is to overlay individual workspace images. They do this with overlays from CCD cameras as well as from the computer. They have a face camera, a camera pointed at the desk, a private monitor, and a shared monitor. It's a Mac system, so you can just drag apps from the shared to the private monitor as needed. They can merge the desktop and the video image. They can also share screens (with control, via the technique that Timbuktu uses) and share drawing surfaces. They provide an example of using TWS to help teach calligraphy. It seems a little biased.
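
TWS did its overlaying on live video; as a rough digital analogy, a translucent overlay is a per-pixel weighted average of two frames. Frames here are assumed to be equal-size grayscale grids (0-255), and the 50/50 weight is an arbitrary choice, not something the paper specifies.

```python
from typing import List

Frame = List[List[int]]   # rows of grayscale pixels, 0 (black) to 255 (white)

def overlay(frame_a: Frame, frame_b: Frame, alpha: float = 0.5) -> Frame:
    """Translucent overlay: each output pixel is alpha*a + (1 - alpha)*b."""
    if len(frame_a) != len(frame_b) or any(
            len(ra) != len(rb) for ra, rb in zip(frame_a, frame_b)):
        raise ValueError("frames must have the same dimensions")
    return [
        [round(alpha * a + (1 - alpha) * b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]
```

Both images stay visible at reduced contrast, which also hints at why the paper found the quality of the overlaid image to be a problem.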

He mentions four real time sharing alternatives and examples of each:

Problems with their TWS system: as with John and Scott's work, you can't share the results directly. The quality of the overlaid video image was not good. Indirect drawing and pointing take effort to get used to. Identifying the owners of objects with multiple overlays is hard. Overlaid images don't understand scrolling (duh). Video is machine-unreadable; they claim this is not a problem because humans can read it. TWS is strict ("Stefikly speaking") WYSIWIS and should probably be relaxed in time, space, population, and congruence.

They found that people in the same room liked having the TWS' windows mirror the spatial layout. (They added this later.) It does this by understanding the floor-plan of the office. It only does this for the face windows, not the other displays.

7. Article: The VideoWindow System in Informal Communications

Author: Robert S. Fish, Robert E. Kraut, Barbara L. Chalfonte

Reference: CSCW `90, p. 1-11

This paper is about an experiment with a video wall installed in two parts of the same laboratory.

They discuss several of the aspects that comprise informal communication. They point out that informal communication (due to excellent feedback) is better at handling situations where participants are elaborating on or modifying what they are saying to deal with someone else's objections or misunderstandings. They classify the formality of communication with this diagram:

Daft and Lengel classified rich communication channels as ones that can overcome different frames of reference or clarify ambiguity to change understanding in a timely manner. (Ordering: 1) face-to-face 2) telephone 3) personal documents (letters) 4) impersonal documents 5) numeric documents.) The authors added bandwidth and spontaneity.

They identify some points that are important to informal communication:

The video window they built was a monitor 8 feet wide by 3 feet tall. It also had directional microphones and speakers. They left it on 24/7 in their lab for three months. They offered free coffee in an area visible to the window, and 50 people volunteered to have their mailboxes moved so they would pick up their mail near the window.

Results: about 10% of the possible conversation opportunities were actually converted into conversations. In the same study, about 41% of face-to-face opportunities resulted in conversation. Problems:

8. Article: Distributed Multiparty Desktop Conferencing System: MERMAID

Author: Kazuo Watabe, Shiro Sakata, Kazutoshi Maeno, Hideyuki Fukuoka, Toyoko Ohmori

Reference: CSCW `90, p. 27-38

This is a very strange paper. It's basically about a groupware technology, a desktop conferencing system. The system includes voice, video, shared windows (whiteboard), and a little conference management. (The transport is narrowband ISDN.) They appear to also have a multimedia document editor, but it is not a shared application. They support the obvious four floor-passing modes (designation, FCFS, baton-passing, free-for-all), and they make the obvious connection to social situations.
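
The four floor-passing modes can be expressed as one small state machine. The paper's exact rules aren't summarized here, so the semantics below (chair-only designation, a FIFO queue for FCFS, holder-only baton passing, anyone may speak in free-for-all) are assumptions, and all names are hypothetical.

```python
from collections import deque
from enum import Enum
from typing import Optional

class Mode(Enum):
    DESIGNATION = "designation"   # a chair grants the floor
    FCFS = "fcfs"                 # requests are served in arrival order
    BATON = "baton-passing"       # the current holder hands the floor on
    FREE = "free-for-all"         # anyone may speak at any time

class Floor:
    def __init__(self, mode: Mode, chair: Optional[str] = None):
        self.mode = mode
        self.chair = chair
        self.holder: Optional[str] = None
        self.queue: deque = deque()

    def request(self, user: str) -> bool:
        """Ask for the floor; returns True if `user` may speak now."""
        if self.mode is Mode.FREE:
            return True
        if self.holder is None and self.mode in (Mode.FCFS, Mode.BATON):
            self.holder = user            # an idle floor goes to the first requester
            return True
        if self.mode is Mode.FCFS and user != self.holder and user not in self.queue:
            self.queue.append(user)       # wait in line
        return self.holder == user

    def release(self) -> None:
        """Holder gives up the floor; under FCFS it passes to the next in line."""
        if self.mode is Mode.FCFS and self.queue:
            self.holder = self.queue.popleft()
        else:
            self.holder = None

    def designate(self, by: str, user: str) -> None:
        if self.mode is Mode.DESIGNATION and by == self.chair:
            self.holder = user

    def pass_baton(self, by: str, user: str) -> None:
        if self.mode is Mode.BATON and by == self.holder:
            self.holder = user
```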

They do have a breakdown of CSCW apps based on three axes: distribution in space, distribution in time, and individual versus group support. They are handling (obviously) different place, same time, and group work.

9. Article: Portholes: Supporting Awareness in a Distributed WorkGroup

Author: Paul Dourish, Sara Bly

Reference: CHI `92, p. 541-547

This is a good paper about awareness. The Portholes system is a distributed (PARC and EuroPARC) information service, which grabs frames automatically from cameras in mediaspace nodes. The data is sent to each side (replicated) so it can be displayed as a panel of frames. Some additional information is also available such as time the frame was grabbed, email address, and an audio snippet (recorded by the viewee for broadcast purposes).

The architecture is such that clients only talk to local servers (for performance), and the servers communicate with each other. The servers exchange inter-domain information this way; each processes its own domain. The servers are not particularly concerned with what client programs may do with the source information (images) or properties (everything else).
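
A minimal sketch of the client/server split, assuming each server pushes frames from its own domain to every peer server (a full mesh) and clients read only from their local server. Class and method names, and the source name in the test, are hypothetical; the real system's protocol is not described here.

```python
from typing import Dict, List

class Server:
    """One per domain (e.g. PARC, EuroPARC). Clients talk only to their local server."""
    def __init__(self, domain: str):
        self.domain = domain
        self.peers: List["Server"] = []
        self.images: Dict[str, bytes] = {}      # source name -> latest frame
        self.properties: Dict[str, dict] = {}   # source name -> time grabbed, email, ...

    def connect(self, other: "Server") -> None:
        self.peers.append(other)
        other.peers.append(self)

    def publish(self, source: str, frame: bytes, props: dict) -> None:
        """Called for a camera in this server's own domain; replicate to peers."""
        self._store(source, frame, props)
        for peer in self.peers:
            peer._store(source, frame, props)

    def _store(self, source: str, frame: bytes, props: dict) -> None:
        # The server doesn't care what clients do with images or properties.
        self.images[source] = frame
        self.properties[source] = dict(props)

class Client:
    def __init__(self, local_server: Server):
        self.server = local_server

    def panel(self) -> List[str]:
        """Every source the local server knows about, local or remote."""
        return sorted(self.server.images)
```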

The interface is fairly space-intensive, and they (correctly) note that this is a detriment to casual use (one of their users specifically mentioned this). They make several mentions of serendipity in Portholes use: this seems crucial to mediaspace success to me, and is really tied to the digital world. It's harder to get serendipity with limited analog resources. They also mention (via an example) that users have less motivation for use if they "aren't guaranteed of seeing anything." This seems to motivate automatic processing based on "interest functions." They make a point of diving into the community-building effects of Portholes. They mention that it is a place for both the serious and the whimsical. I wonder how important that "both" really is.

They bring up some issues for future work:

10. Article: Spatial Workspace Collaboration: A SharedView Video Support System For Remote Collaboration

Author: Hideaki Kuzuoka

Reference: CHI `92, p. 533-540

This paper is interesting for its impact on 3D CSCW. They are interested in sharing views and remote collaboration on something like a machine-shop floor, where things like "focal points" really matter.

They do this in the context of instructors explaining how to use machine-shop equipment. They noted the instructor's actions to be: find object, express, confirm; and the operator's (student's) actions to be: find object, understand, manipulate, respond. The focal point of the instructor changed every few seconds and could change as often as twice in one second. To support their application you need movable 3D focal points so the instructor can see/show. They noted the 3D expressions of the instructor and student: position, motion/manipulation, and confirmation.

They devised a model task and performed an experiment. The variables were whether the instructor was present (face-to-face) or remote, and whether or not the instructor could use gestures. They analyze the parts of speech used in the discussion so they can understand what the users were talking about. They also bring up a lot of good points about the effects of camera position and orientation. Discussion of remote objects (duh) gets hard when you can't share a point of focus.

They use the results to design a communication system for remote 3D collaboration. Requirements:

A keen observation was that the gaze angle (angle between eyes and body) was roughly proportional to head angle (angle between head and body), allowing easy gaze estimation. They constructed a gaze tracking camera and a head mounted display via a helmet.

They did an experiment with face-to-face and two remote cases. The two remote cases compared a fixed camera view with a SharedView camera. The SharedView camera decreased discussion of directions and orientations.

11. Article: The Active Badge System

Author: Andy Hopper, Andy Harter, Tom Blackie

Reference: Interchi `93, p. 533-534

This is a CHI short paper about the Active Badge system at Olivetti (Europe). They say that their badges have two LEDs and two buttons for interaction. They also mention that the badges are light-sensitive so as to decrease power consumption in the dark; the upshot is that you can turn one off by putting it in your pocket or turning it over. They describe some simple apps (BirdDog, phone routing). One interesting app they mention is a video mail integration system, which allows annotation of the video stream based on badge data (who is present, etc.). They also mention that they are putting active badges on other I/O devices such as phones, cameras, and mics, attached in such a way that you only get transmissions when the device is on.
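
A BirdDog-style lookup can be sketched as keeping the most recent sighting per badge. The record layout, clock, and 300-second staleness cutoff are hypothetical; the summary doesn't say how BirdDog handles badges that have gone quiet (for example, one switched off in a pocket).

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Sighting:
    badge: str
    sensor_location: str   # room where the sensor that heard the badge sits
    time: float            # seconds on a hypothetical shared clock

class Locator:
    def __init__(self) -> None:
        self.last: Dict[str, Sighting] = {}   # badge -> most recent sighting

    def record(self, sighting: Sighting) -> None:
        # Ignore reports that arrive out of order.
        prev = self.last.get(sighting.badge)
        if prev is None or sighting.time >= prev.time:
            self.last[sighting.badge] = sighting

    def where(self, badge: str, now: float, stale_after: float = 300) -> Optional[str]:
        """Last known location, or None if the badge hasn't been heard recently."""
        s = self.last.get(badge)
        if s is None or now - s.time > stale_after:
            return None
        return s.sensor_location
```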

12. Article: One Is Not Enough: Multiple Views In A Media Space

Author: William Gaver, Abigail Sellen, Christian Heath, Paul Luff

Reference: Interchi `93, p. 335-341

This is another psychology experiment paper from A. Sellen w.r.t. mediaspaces. This one is (in short) making the claim that the "head-n-shoulders" view in a mediaspace isn't necessarily what you want. (Duh). They construct two tasks and give the users the ability to use one of four cameras in the "remote" room while performing the tasks. They don't say that face-to-face is useless, just that it isn't flexible enough (again, duh).

They make the point that the face-to-face view does not allow for many types of signals/communication that naturally occur. Further, the difficulty of adjusting the camera ensures that the camera view is relatively fixed. The really good point they make, but only barely mention, is that artifacts/information must be actively presented to colleagues.

They constructed two rooms, each with four cameras. The rooms had three cameras identically placed (face to face, side view of desk for context, desk view for shared documents); the fourth was different in the two rooms (bird's-eye view in one, and a dollhouse view in the other, see below). The first task was for each participant to draw the other's room. (This seems very contrived to me; it just cries out for a more `exploratory' view and manipulation technology.) The second task was to arrange some furniture in a dollhouse; the participants were secretly given different and conflicting goals for the task.

Their results showed that the face-to-face view was rarely used. Their analysis indicates that if people have a choice between face-to-face and views that give access to shared work objects, they'll choose the view that allows shared work. They mention that people used the face-to-face (for short periods of time) to seemingly assess each other's mood and engagement. They suggest that their data may under-emphasize the importance of glances. They also say their data shows that particular views were not predictably associated with any task, but rather varied based on the current task. Humans are clever at using tools.

They discovered some problems with using a multi-viewed system:

In their conclusion they mention repeatedly that a chief difficulty of their setup was the discontinuity of the views exacerbated by the switching between the views. They mention that Gaver's affordances work suggests that continuous motion would be better; they mention the Delft work that Gaver is involved in.

13. Article: Experiences In The Use Of A Media Space

Author: Marilyn Mantei, Ronald Baecker, Abigail J. Sellen, William S. Buxton, Thomas Milligan, Barry Wellman

Reference: CHI (?) `91, p. 203-208

This is the CAVECAT paper. They observe their experiences in a four-way mediaspace: two faculty, one programmer, and a lab for grad students. They built their system out of a four-way PIP and the IIIF control software (it's all analog).

They use an icon/drag based control system. You drag an icon into a virtual office (on screen) and it connects the people inside the office.

They noted some technological obstacles:

They mention several cases where there were problems when more than one participant was in one office. This yielded side conversations (between people in the same room), and the public discussion had trouble expressing dominance when appropriate. The multiple people also presented a challenge because any image of the remote participants was smaller, which made each participant seem less real. This did not allow the normal conversation cues to come through.

They note that eye gaze was very difficult to establish in the mediaspace, despite its importance. They mention work by some psychology guys on the use of eye gaze in conversation and claim this work says that eye gaze has at least five functions in conversation (regulate flow, provide feedback on perception, emotions, nature of relationship, reflect status relationships). They mention the EuroPARC (?) idea of video tunnels to help with the eye-gaze problem.

They talk about the fact that the status of the participants is not reflected in the mediaspace display and that their implementation may move people around after a break (arbitrary ordering). They claim this was highly disconcerting to the participants. I don't believe this; if people change seats in a meeting after a break, I can deal with it.

They make the usual observation that the lack of conversational cues creates more of a need for turn-taking and/or a moderator. They claim that video image size had an impact on the conversation: participants with large images appeared to have more of an impact on the conversation. They bring in some work by psychologists on "social" distance and discuss the three levels of social distance in conversation (4-12 feet for strangers, 1.5-4 feet for friends, and <1.5 feet for intimate friends). They relate the image size to this.

On the privacy side, they mention that you need to know when your office has someone in it, and you need a way to control it. They mention that the IIIF system was too complicated for easy use. They also reference Gaver & Smith's feedback via audio cues as a good idea for informing you of what is happening.

They are trying to develop metaphors for privacy/communication built on already existing (human) communication practices (shutting one's door, waiting to see someone who is busy talking to someone else, etc.). They are trying to build a visual language for manipulating the parameters of the system. They are also trying some automatic switching based on who the current speaker is. This won't work if they have a 2-second switching delay.

14. Article: Videowhiteboard: Video Shadows To Support Remote Collaboration

Author: John C. Tang, Scott L. Minneman

Reference: CHI `91, p. 315-322

This paper is about a video whiteboard, which was a sequel to their work on Video Draw (which seems like a follow-up to Commune). The system works by using a camera and video projector behind a surface (4.5' x 6') that a person stands in front of. The person makes marks with dry-erase markers, and the marks and the person's shadow are sent to the remote site, which has a similar setup. They do correct the left-right reversal, so people share the same sense of direction. A user sees the composite of his own marks, the remote marks, and the remote shadow.
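
VideoWhiteboard did the reversal correction and compositing optically, with cameras and projectors. As a digital analogy only, assuming grayscale frames where 0 is ink or shadow and 255 is blank surface, the two steps look like this (the "darkest pixel wins" rule is an assumption standing in for optical superposition):

```python
from typing import List

Frame = List[List[int]]   # 0 = black ink or shadow, 255 = blank white surface

def mirror(frame: Frame) -> Frame:
    """Flip a frame left-to-right, undoing the reversal of a camera behind the screen."""
    return [list(reversed(row)) for row in frame]

def composite(*frames: Frame) -> Frame:
    """Combine same-size frames so that the darkest pixel at each position wins."""
    return [[min(pixels) for pixels in zip(*rows)] for rows in zip(*frames)]
```

For example, the local marks would be combined with `mirror(remote_camera_frame)` so that a mark the remote person makes on their left appears on the viewer's left as well.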

They mention the Video Draw system, which was a similar setup but with monitors you could draw on and pictures of the hands. A big problem was that you were in the way of the camera, and if you weren't, there was a parallax problem.

Good point: this allows people to actually work in the same place on the whiteboard if they are at different sites; this is not possible without videowhiteboard, as your bodies get in the way. People using the system felt like the collaborator was on the other side of the screen; they say this is not a big problem and that VW evokes the correct cues from people when they collaborate. They mention that the collaborator is superimposed on the drawing surface, thus avoiding divisions of attention between speaker and marks that occur with a real whiteboard.

Problems: No eye contact. Shadows don't distinguish between people. There is no feedback, so users can't tell whether their (subtle) gestures or movements are perceived. They had problems because their resolution was only 330x240 and the optical alignment was worse towards the edges of the screen. Users can only erase their own marks.

15. Article: Disembodied Conduct: Communication Through Video In A Multi-Media Office Environment

Author: Christian Heath, Paul Luff

Reference: CHI `91, p. 99-103

This is really a paper about the sociology and psychology of gestures and how they are affected by mediaspaces. They reference two studies (Gale and Chapanis) which suggest that video connections do not affect the performance of work activity. Others (Smith, O'Shea, O'Malley, and Taylor) claim that eye contact and gesture help people divide tasks between them (based on the SharedARK work). In their background, they have a reasonable discussion of how speakers use gesture and gaze to coordinate speaking.

They claim that at EuroPARC people tend to establish video contact before engaging in conversation. This can escalate as their attempts go unnoticed, until they are gesturing wildly in an attempt to gain the recipient's attention. Sometimes they just give up and buzz off, and other times they drop back to an audio connection and announce themselves.

They discuss in depth how the inability to perform gestures successfully is a problem for conversation. People use gestures to attract the listener's attention at certain points, as well as to give illustrations, etc. They speculate that since gaze is also difficult, people have only vocal cues to help them get co-participation. (They give as an example people giving up on utterances because the recipient wasn't giving the speaker the feedback he should.)

They make a big deal out of the fact that a video monitor is only a small piece of your field of vision, so peripheral gestures are going to be distorted. Similarly, the speaker's access to the recipient is very limited by the screen size.

They bring out some of the asymmetries of collaboration and try to justify why these might be good. (I thought these were wildly obvious.) They note that you can be connected with others but confine the disturbance to your mediaspace. They also note that it can be effective as a collaboration tool if you work at it (conversation-wise).

16. Article: Sound Support For Collaboration

Author: Bill Gaver

Reference: ECSCW `91 (p. ?)

This paper reviews the ARKola simulation results and the EAR system. It is basically about using sound, without any visual display, to give you information unobtrusively. It claims that awareness is the background to other, more focussed types of collaboration. He gives an axis running from serendipitous communication to division of labor to focussed collaboration. He notes that computer systems for collaboration tend to vary in the amount of control they give over shared objects. He also notes that full-time audio/video is expensive in terms of attention and/or screen real estate. He says that audio can provide the context for moving between types of collaboration.

He talks about everyday listening and how well-designed auditory icons can help you monitor activities and events and thus provide a basis for collaboration. He reviews the ARKola setup and task. The most interesting result here is that people divided up the labor, but used the auditory icons to help them monitor what they could not see (or were not focussed on). They also noted that this reduces the risk of adventuring, as you will hear if things go wrong somewhere else. Overall, he says that it reduces the difficulty of the transition between division of labor and focussed collaboration.

He reviews the EAR work, which basically uses environmental sounds played in people's offices to remind them of events. The sounds are tailored to be unobtrusive and to come up gradually rather than quickly enough to startle. They use the Khronika event database to decide when to play the sounds. They use it to support awareness of events that might normally go unperceived (e.g. meetings starting somewhere in the building). He mentions the beginnings of the audio cues that are used in the RAVE system: the creaking open door for someone looking in, the closing door for them leaving, etc.

In both cases the principle is that users can be aware of the sounds even when they are not visually attending to them.
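To make the EAR mechanism concrete, here is a minimal Python sketch of scheduling unobtrusive cues from an event list and ramping their volume up gradually. It is not the EAR or Khronika implementation: the `Event` record, the two-minute `lead` time, the linear `fade_in_gain` envelope, and its `fade`/`peak` values are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    """Hypothetical stand-in for an entry in an event database such as Khronika."""
    name: str
    start: float  # event start time, in seconds (assumed unit)
    sound: str    # hypothetical name of the environmental sound to play


def schedule_cues(events: List[Event], lead: float = 120.0) -> List[Tuple[float, str]]:
    """Return (play_time, sound) pairs, each `lead` seconds before its event, in time order."""
    return sorted((e.start - lead, e.sound) for e in events)


def fade_in_gain(t: float, fade: float = 10.0, peak: float = 0.5) -> float:
    """Linear volume ramp: silent at t <= 0, reaching `peak` at t = fade, then held there.

    The gradual onset is what keeps the cue from startling the listener.
    """
    if t <= 0:
        return 0.0
    return peak * min(t / fade, 1.0)


if __name__ == "__main__":
    events = [Event("seminar", 1000.0, "murmur.wav"), Event("tea", 500.0, "pouring.wav")]
    for when, sound in schedule_cues(events):
        print(when, sound)
```

Sorting the cues by play time lets a simple loop fire them in order; a real system would also have to cope with events that are cancelled or rescheduled.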

17. Article: Two Approaches To Casual Interaction Over Computer and Video Networks

Author: Alan Borning, Michael Travers

Reference: CHI `91, p. 13-19

This is the polyscope and vrooms paper. These two guys were visiting EuroPARC when they hacked this stuff up. They talk about privacy early on and bring out two "principles:"

The Polyscope system is basically a grid of frame-grabbed images from the offices of the researchers. They connect the bitmaps to their other system services (glance, vphone, etc.), and you can get to these services by clicking on the bitmaps. They seem impressed with their "simple animation," which plays back previously frame-grabbed images; they note that users were very interested in this at first, after which interest waned but didn't go away.

As far as privacy in Polyscope goes, the accessibility options are no information, short text message, manual video (captured when the user clicks), or automatic video (captured once a minute). For feedback they have none, names only (who's looking at you), and video (video images of observers). For symmetry they have yes and no: if you set it to yes, the system will not give out your video to those who are not giving out theirs. In actual use, 77% of the time the users used no feedback, and symmetry was almost never on. Users commented that although symmetry was good in the abstract, in practice it didn't matter. They (correctly) note that the problem with Polyscope was that users were forced to make explicit choices about their accessibility and visibility. This yields what I call the "fluidity" problem with accessibility and privacy: it needs to be a continuous variable, and should be easily changeable. They also noted that symmetry is partially qualitative; the views from the cameras vary in their information content. They claim that people feel more strongly about symmetry in the full-motion case; I don't think this is clear. They did say that their system didn't solve the symmetry problem; it may be that it just plain doesn't matter.
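The three Polyscope privacy settings and the symmetry rule can be modelled in a few lines. This is a hypothetical Python sketch, not Polyscope's code: the `Access`, `Feedback`, `PrivacySettings`, and `may_see_video` names are invented here, and it assumes that "giving out video" means choosing either the manual or the automatic video option.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    """What others can see of your office (the four accessibility options)."""
    NONE = "no information"
    TEXT = "short text message"
    MANUAL_VIDEO = "manual video"    # image captured when the user clicks
    AUTO_VIDEO = "automatic video"   # image captured about once a minute


class Feedback(Enum):
    """What you are told about who is looking at you."""
    NONE = "none"
    NAMES = "names only"
    VIDEO = "video images of observers"


@dataclass
class PrivacySettings:
    access: Access
    feedback: Feedback
    symmetry: bool

    def gives_out_video(self) -> bool:
        # Assumption: either video option counts as giving out video.
        return self.access in (Access.MANUAL_VIDEO, Access.AUTO_VIDEO)


def may_see_video(owner: PrivacySettings, viewer: PrivacySettings) -> bool:
    """Can `viewer` see video of `owner`'s office?"""
    if not owner.gives_out_video():
        return False
    if owner.symmetry and not viewer.gives_out_video():
        # Symmetry on: no video for viewers who withhold their own video.
        return False
    return True
```

Each setting is a discrete, explicit choice, which is exactly the lack of fluidity criticized above.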

Vrooms just lays the same type of stuff onto a spatial metaphor; worse, the spatial metaphor breaks because you can be in multiple rooms at once. All the users in a room see the same view, and the view includes images of all the users in the room. Rooms can be created and deleted on the fly by users. They claim that the symmetry constraint is automatically satisfied by the room metaphor (I don't think this is a solution; it's changing the problem). Nifty interaction hack: if you move your image near the image of someone else, you get a heavy box around the two, and if you let go, you get a two-way audio-video connection to them.
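The Vrooms drag-to-connect interaction boils down to a proximity test. A hypothetical Python sketch follows; the 40-pixel `SNAP_RADIUS`, the `nearest_within` name, and picking the nearest candidate when several images are close are assumptions, not details from the paper. The same test would drive both the heavy box drawn during the drag and the connection made on release.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]

SNAP_RADIUS = 40.0  # hypothetical: how close (in pixels) two images must be


def nearest_within(my_pos: Point, others: Dict[str, Point],
                   radius: float = SNAP_RADIUS) -> Optional[str]:
    """Return the user whose image is closest to `my_pos`, if within `radius`.

    While dragging, a non-None result would be highlighted with the heavy box;
    on release, it would be the user to open a two-way audio-video link with.
    """
    candidates = [(math.dist(my_pos, pos), name)
                  for name, pos in others.items()
                  if math.dist(my_pos, pos) <= radius]
    if not candidates:
        return None
    return min(candidates)[1]
```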

They make a good observation that the view in a mediaspace like theirs is "god's-eye" and not embedded in yourself as it normally is. They ask whether this will break the ideas about social space that people normally have. They consider the possibility of "doors" and "hallways" (à la Cruiser) to extend their spatial metaphor. They suggest (rightly) that Polyscope is a common vroom.

18. Article: Design of a Multi-media Vehicle for Social Browsing

Author: Robert Root

Reference: CSCW `88, pp. 25-38

This is the first Cruiser paper, written before they had actually built the system. It is fairly "pie in the sky." The first part is a discussion of previous work on why people need to talk to one another (both formally and informally) at work. They say that the thesis of the paper is "...a major consequence of geographical separation is impairment of workers' ability to effectively browse the social environment, resulting in a significant and consequential reduction in the frequency of unplanned interactions."

They basically try to create a virtual workspace with a hallway metaphor. The hallway is a sequence of nodes in virtual space, which correspond to people's offices, common areas, etc. If you "encounter" someone in one of these areas, you can begin conversing. For convenience, the hallway is circular. They have three basic movements in the virtual hallway: jump, planned path, and random walk. A jump is a video phone call. The other two are ways to get to destinations in Cruiser: with a planned path the user plans the route, and with a random walk the system generates one on the fly. They claim the random walk is their mechanism for unplanned interaction. (This seems highly questionable to me... how do you know if you see someone? How long does a walk take (per node)?) They are forced to create new nodes (called conversation nodes) whenever people are actually engaged in a conversation, so that you can stumble in later. Note that as you pass an office, you see into the office, and the occupant sees you "out in the hallway," which is really an image from your camera. Cruisers are always announced. This seems very high on the interruption scale.
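The circular hallway and the three kinds of movement map onto a simple ring data structure. This is a hypothetical Python sketch, not Cruiser's design: the `Hallway` class and its method names are invented, planned paths are assumed to take the shorter way around the ring between the user's chosen stops, and a random walk is assumed to step to a randomly chosen neighbour at each node.

```python
import random
from typing import List, Sequence, Tuple


class Hallway:
    """Hypothetical model of a circular virtual hallway of nodes."""

    def __init__(self, nodes: Sequence[str]):
        self.nodes = list(nodes)   # offices, common areas, ... in hallway order
        self._conversations = 0    # counter for naming conversation nodes

    def neighbors(self, node: str) -> Tuple[str, str]:
        """The nodes on either side; the hallway wraps around, so there are no ends."""
        i, n = self.nodes.index(node), len(self.nodes)
        return self.nodes[(i - 1) % n], self.nodes[(i + 1) % n]

    def jump(self, src: str, dst: str) -> List[str]:
        """A jump (video phone call) goes straight to the destination, passing no one."""
        return [src, dst]

    def _segment(self, a: str, b: str) -> List[str]:
        # Nodes passed walking from a to b the shorter way round (excluding a).
        n = len(self.nodes)
        i, j = self.nodes.index(a), self.nodes.index(b)
        step = 1 if (j - i) % n <= (i - j) % n else -1
        seg = []
        while i != j:
            i = (i + step) % n
            seg.append(self.nodes[i])
        return seg

    def planned_path(self, src: str, stops: Sequence[str]) -> List[str]:
        """User-planned route: visit each chosen stop in order."""
        path = [src]
        for stop in stops:
            path.extend(self._segment(path[-1], stop))
        return path

    def random_walk(self, src: str, steps: int, rng: random.Random) -> List[str]:
        """System-generated route: step to a random neighbour each time."""
        path = [src]
        for _ in range(steps):
            path.append(rng.choice(self.neighbors(path[-1])))
        return path

    def add_conversation_node(self, beside: str) -> str:
        """Insert a node for an ongoing conversation so others can stumble in later."""
        self._conversations += 1
        name = f"conversation-{self._conversations}"
        self.nodes.insert(self.nodes.index(beside) + 1, name)
        return name
```

The ring makes the wraparound explicit: a walk never hits a dead end. The sketch says nothing about how long each step takes, which is one of the open questions raised above.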

They spend quite a while discussing interruption protocols in the real world. They suggest that there is an "availability" factor that we discern from the situation, the people present, and the snippets of audio we might overhear. They also mention that people use explicit signals (such as closing doors) to make their availability clear. They carry this into Cruiser with the notion of blinds. If the blinds are open the cruiser can see in; if they are closed he cannot. They also talk about half-open or partially open blinds giving partial information. They (again) enforce reciprocity... if you are not allowing access to your video, you can't get access to others'.

They spend the rest of the paper talking about integrating Cruiser into the workplace, and it sounds very "Intermezzoish." They talk about things like Cruiser automatically generating random walks to the printer node if you print a document. They also talk about automatically generating cruises based on your current task. This would be something like the system restricting your "hallway" to only those nodes that include your coauthors on a paper, setting your availability to high for your coauthors and low for everyone else, etc. They also talk about the reverse: if you cruise to someone, the system invokes the shared editor on the document you are working on. In general, all of this stuff is doable; it would just take an immense amount of work.

19. Article: The Affordances of Media Spaces for Collaboration

Author: William W. Gaver

Reference: CSCW `92, pp. 17-24

This paper is basically a list of the affordances of various aspects of mediaspaces and suggestions for how they could influence the design of such systems. He starts with a brief introduction to "Gibsonian" affordances and then begins his laundry list:

20. Article: The Workaday World As A Paradigm for CSCW Design

Author: Thomas P. Moran and R.J. Anderson

Reference: CSCW `90, pp. 381-393

This is a theory paper about paradigms for the design of CSCW systems. They bring out the three major paradigms in CSCW design:

They mention that although both of the last two seek to make the technology transparent, in reality most of the work is technology-centered. They spend the whole second section of the paper talking about the relationships of philosophical psychologies to the scientific disciplines used in design.

They use phenomenology (philosophical psychology) to create a paradigm called the workaday world. This is basically the idea that people's everyday, mundane relationships and resources (technology included) should be the basis for design. They highlight three aspects of the workaday world: technology, sociality, and work practice. These three are tightly linked and can't be separated.

They make a wonderful argument that not only are the formal and the informal distinct, but they flow fluidly back and forth. Although this idea is not new to them, they do highlight the important point that almost all CSCW systems built so far have addressed only one of the two. They then try to highlight the workaday world paradigm as de-emphasizing technology (making it "mundane"). They also make sure to beat on the fact that it should not require attention. ("Invisibility of ubiquity and invisibility of non-attendance.")

They define four axes of social interaction (see RAVE paper also):

They discuss the RAVE system briefly: different types of connection, environmental audio, and the tailorable interface (buttons). They conclude by asking, "Is it possible to re-orient CSCW design research so that the technologies do not occupy center stage?"