The Sonopticon Project Human-Computer Interaction
CS6751 Fall 1997

Part I: Defining the Project
Part II: Initial Prototype and Evaluation Plan
Part III: Evaluation of Shopping! Prototype
  Follow Me's Evaluation Presentation
  Final Project Presentation

"It sounds like 20/20!"
Follow Me Evaluation Team
Jason Elliott
jlelliot@cc.gatech.edu 

Kevin Scott
kcscott@cc.gatech.edu

Alexandre Stoychev
saho@cc.gatech.edu 

Rawesak Tanawongsuwan
tee@cc.gatech.edu


 
Evaluation Results

Heuristic Evaluation

During the heuristic evaluation, the evaluation team critiqued the Sonopticon system based on three general usability principles or guidelines that were provided for the exercise. Aspects of the system which violated or successfully implemented those guidelines were noted by each evaluator. Afterwards, all of the evaluators discussed the issues found individually and ranked them by severity and impact on the usability and effectiveness of the prototype.

The three guidelines that were used are:

  • Simple and natural dialogue
  • Recognition rather than recall
  • Robustness

The following is a ranked summary of the comments that were made for each of the usability guidelines listed above:

Simple and Natural Dialogue

  • Audio cues are not as informative as they need to be. They should be directional and they should indicate the distance to the object in question.
  • The text is difficult to read.
  • Complex visuals don't give the user any extra information, but distract them from driving tasks.
  • The graphical information tends to give the user too much information, distracting his attention from the task of driving.
  • Simplified voice cues may work better than beeps and visual cues.
  • Repeated warnings may become annoying and distracting.

Recognition rather than Recall

  • Need more distinguishing characteristics on the vehicles.
  • The icon for noise cancellation is not appropriate.
  • The reticle or crosshairs symbol is more indicative of a target, or something the user is aiming for and trying to hit, rather than something the user is trying to avoid.

Robustness

  • The user should be able to control the warnings so that slow passing vehicles don't become annoying.
  • Users should have control over the noise cancellation system.


Think Aloud

On Sunday, November 23, all four members of the Follow Me team, using themselves as subjects, participated in a Think Aloud exercise. The object of the exercise was to gain additional exposure to the Sonopticon project prototype. The evaluators each rode in a Sonopticon-equipped vehicle and were given a short driving tour of its features. Initial responses to the experience were recorded on tape, and further responses were gathered in a short interview after each evaluator had completed the tour. Those responses are available in the Appendix.

Overall, the evaluators found the experience positive. The Sonopticon concept proved to be engaging and useful. While the evaluators acknowledged the "rough around the edges" nature of the prototype, they were generally pleased.

The Think Aloud process brought about a number of questions and observations regarding both the prototype implementation of the Sonopticon product and its conceptual design. One of the most important issues raised was the appropriateness, customizability, and form of the aural and visual signals. In other words, the evaluators discussed at length whether the signals that they experienced were appropriate: were they loud enough, soft enough, meaningful, and easy to read? It seemed as if there was some variability in how the prototype was operated, which might account for the differing opinions among the evaluators.


Questionnaire

The questionnaire evaluation technique aims to give a measurement of how well the Sonopticon system has been designed. Since the prototype is intended to enhance the visual and audio cues to a driver, the questionnaire focuses on three high-level concepts: the visual enhancement, the audio enhancement, and the usefulness of the overall system from a user's perspective.

The questions in the questionnaire relate to those three key concepts:

  1. Visual enhancement
    • The location of the visual information displayed was distracting.
    • The visually displayed information was annoying.
    • The visual information displayed was easily legible.
  2. Audio enhancement
    • The level of the auditory signals was adequate. (Could you hear the signals clearly?)
    • The tone of the auditory signals was adequate. (Was the tone of the signal easy for you to detect?)
    • The auditory signal announcing that an object was in my blind spot was distracting.
    • The information presented through audio was annoying.
  3. Usefulness of the system
    • The system was helpful in adding to your awareness of your driving environment.
    • If reasonably affordable, I would purchase this system for my car.

Summary of the results from the questionnaire:

  1. The visual display concept is good. However, the way that the information is presented (for example, the location and the resolution of text) was found to be distracting. One evaluator was an exception and did not find the location of the visual information distracting; however, we learned from his background that he has no driving experience.
  2. The audio signals seem to have more problems. The results show that some of the audio signals are annoying. Preferences for audio features vary from person to person, so the system should offer more flexibility for each user to change them.
  3. The results indicate that this system is helpful in increasing the user's awareness of the driving environment.

 
Overall Evaluation and Recommendations

The design team generated three essential questions as the basis for their evaluation plan:

  1. Do the audio and visual cues distract the driver?
  2. Will users find the signals (audio and visual) annoying?
  3. Driver Response. Does the system provide feedback quickly enough to allow the user to process the information and act within a given time interval? (How long does it take for users to process the information provided?)

The findings of the evaluation team should therefore address these questions. While none of the evaluation exercises provided strictly quantitative answers, there were a number of themes that became apparent during the evaluation process.

Two of the most pronounced deficiencies that the evaluation uncovered involved the presentation of the visual and aural cues. While most of these problems were directly related to the prototype implementation, there are a few that might have been addressed more effectively.

The aural cues presented online bore little resemblance to those experienced during the Think Aloud exercise. The evaluators greatly preferred the former while the latter were, at times, annoying and seemed to denote emergency situations at inappropriate times (entering a highway, for example). In addition, the aural cues given during the think aloud were indistinguishable - that is, they all sounded exactly alike.

The visual presentation might have been improved through the use of simpler characters (e.g. a sans serif font) and potentially through the use of more thematically appropriate icons. Our driving experience includes a rich collection of "highway signage" that could have been put to better use in the Sonopticon system.

Finally, none of the evaluation techniques adequately addressed the designers' concern for reaction time. It may be possible to glean some information from the video recording of the think aloud exercise, but that information is somewhat tainted by latency problems present in the prototype.

In short, to answer the design team's questions:

  1. Yes, the visual and aural cues were distracting at times - and often startling.
  2. Yes, the visual and aural cues were sometimes annoying.
  3. There is insufficient data to answer question three due to the nature of the prototype. (Signal latency problems)


The evaluation team has developed a few specific recommendations that might be used to improve the prototype (especially the version that was used during the think aloud exercise). Note that each of these suggestions should be within the technological capability of the prototype instruments.

  1. Turn on the "Active Noise Cancellation" feature and leave it on for the duration of the driving experience. An alternative would be to have a user-operable switch.
  2. Utilize more commonly recognizable icons where applicable (e.g. highway signs).
  3. Provide a wider variety of aural cues and be sure that each cue is used appropriately.
  4. Ensure that each visual is as clear and succinct as possible. Avoid textual redundancies.
  5. When appropriate, utilize motion within the graphical presentation. It is important to be sure that the motion is simple and non-intrusive, however - just enough to quickly draw attention to the visual display and communicate the desired information.

 
Critique of Evaluation Plan

In general, the evaluation plan was well defined. However, there are some things that we would like to mention.

Heuristic Evaluation:

The heuristic evaluation was very clearly defined. We had no problems doing this part of the evaluation. The web videos were very well done and certainly helped the evaluation process a lot. As a result, we came up with a number of usability problems that could have been difficult to find otherwise.

Think Aloud:

The 'think aloud' did not go as smoothly as intended because there were some limiting factors. The test subjects were very isolated from the outside world because of the HUD and the earplug speakers. To further complicate things, the "radio" was on all of the time and it was very hard for the evaluators to speak when they could hardly hear their own voices. The design team was sitting in the back seat and it was hard for them to hear what the evaluators were saying as well.

Under these conditions, doing a 'think aloud' did not make much sense. We think that a 'cooperative evaluation' would be the right evaluation technique to use in this case. The difference between the two is minor, but in cooperative evaluations, users are asked to critique the system. In essence, this is what some of the evaluators did after they tested the prototype (after the car was stopped).

There is yet another reason why we suggest a cooperative evaluation. When users are performing a think aloud session, they are asked to elaborate on their actions by describing what they believe is happening, why they take an action, or what they are trying to do. Sonopticon, however, was not designed to be interactive. The users have no control over the system output, thus no interaction or dialogue is possible. On the other hand, describing the output of the system (i.e. describing what the user believes is happening) is trivial since all the messages that appear on the windshield are accompanied with text explaining exactly what is going on and thus little confusion is possible. Sound signals can be an exception, but the sound signals that were demonstrated in the prototype are always accompanied by visual information.

Questionnaire:

The questionnaire is a good way of testing the obtrusiveness of the interface and user satisfaction. However, we think that there should have been more than 9 questions to allow the evaluation team to draw more detailed conclusions. We also think that there should have been some open-ended (essay) questions at the end of the questionnaire. Allowing the users to explain exactly what was distracting about the interface can be much more helpful than just asking for an answer from 1 to 5.
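As a rough illustration of how 1-to-5 answers could be summarized, the sketch below reverse-scores the negatively worded items ("distracting", "annoying") so that a higher number always means a more favorable rating. It is only a sketch under assumptions: we assume that 1 means "strongly disagree" and 5 means "strongly agree", the item keys are hypothetical names invented for the example, and the answers are placeholder values, not the data collected in this evaluation.

```python
from statistics import mean

# Hypothetical item keys; True marks negatively worded items
# ("distracting", "annoying") that are reverse-scored.
ITEMS = {
    "visual_location_distracting": True,
    "visual_annoying": True,
    "visual_legible": False,
}

SCALE_MIN, SCALE_MAX = 1, 5  # assumed: 1 = strongly disagree, 5 = strongly agree


def favorable_score(item, answer):
    """Map a raw answer so that higher always means more favorable."""
    if not SCALE_MIN <= answer <= SCALE_MAX:
        raise ValueError(f"answer {answer} is outside the 1-5 scale")
    return SCALE_MAX + SCALE_MIN - answer if ITEMS[item] else answer


def summarize(responses):
    """Return the mean favorable score per item across all respondents."""
    return {
        item: mean(favorable_score(item, r[item]) for r in responses)
        for item in ITEMS
    }


# Placeholder answers from two imaginary respondents (not real data).
example = [
    {"visual_location_distracting": 4, "visual_annoying": 2, "visual_legible": 3},
    {"visual_location_distracting": 2, "visual_annoying": 2, "visual_legible": 5},
]
print(summarize(example))
```

A per-item mean like this hides why an item scored poorly, which is the gap that open-ended questions would fill.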


There were three design criteria listed by the design team:

  • Robustness
  • Unobtrusiveness
  • User reaction time

Robustness:

This criterion was not addressed well by the evaluation exercises. In general it is hard to define robustness for non-interactive systems. For a system like Sonopticon, for example, one way to measure the robustness is to see whether the system will always give you a blind spot warning. However, this cannot be done with this prototype.

Unobtrusiveness:

This was extensively covered by the questionnaire. In fact, this was the only thing that the questionnaire covered.

Reaction times:

The Sonopticon design team mentions a couple of times that they would like to measure the users' response time to the system output:

  • "The ability of the user to react properly to these warnings will be observed"
  • "How long does it take for the users to process the information provided?"
  • "The speed and effectiveness of the driver's response to these warnings will be analyzed"

However, none of the evaluation exercises is designed in such a way as to measure these times. Videotaping the think aloud session does not directly help find these delays. The data that is collected is purely qualitative. Thus, this design criterion was not properly addressed.
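For comparison, a quantitative exercise could log two timestamps per warning and derive the delay from them. The following is a minimal sketch of that idea, not something the design team or the prototype provides: the log format, the function names, the fixed latency constant, and all of the numbers are assumptions made up for illustration.

```python
from statistics import median

# Hypothetical log: (time the cue was presented, time the driver responded),
# both in seconds from the start of the session. Placeholder values only.
events = [(12.0, 13.1), (47.5, 48.3), (90.2, 92.0)]

# Assumed fixed delay between the real-world event and the cue appearing.
# It would have to be measured separately for the prototype.
PROTOTYPE_LATENCY_S = 0.5


def reaction_times(log):
    """Time from cue presentation to driver response, per event."""
    return [response - cue for cue, response in log]


def time_since_hazard(log, latency=PROTOTYPE_LATENCY_S):
    """Reaction time plus system latency: the delay as seen from the hazard."""
    return [rt + latency for rt in reaction_times(log)]


print("median reaction time (s):", median(reaction_times(events)))
print("worst delay from hazard (s):", max(time_since_hazard(events)))
```

Keeping the system latency as a separate term would let the driver's own reaction time be reported independently of the latency problems in the prototype.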

Suggestions:

Observability and detectability should also be included. It is true that in this version of the prototype it is hard to miss any of the visual cues, but the text that accompanies them is written in a very small font, so observability is definitely an issue.

Learnability should also be addressed. Although it might be hard to define and measure learnability for a non-interactive system, it might be worth looking into whether the signals presented to the user make sense. For example, during the Think Aloud the evaluation team found that the external noise cancellation sign looked very much the same as the mute sign used on most TV sets. Naturally, this led to confusion.

 
 
Appendix
Heuristic Evaluation Raw Data

Think Aloud Raw Data

Questionnaire Raw Data

 
Last Modified: 12.01.97
Jason Elliott