In general, the evaluation plan was well defined. However, there are a few
points we would like to raise.
Heuristic Evaluation:
The heuristic evaluation was clearly defined, and we had no problems
carrying out this part of the evaluation.
The web videos were well produced and helped the evaluation process
considerably. As a result, we identified a number of usability problems
that might otherwise have been difficult to find.
Think Aloud:
The 'think aloud' did not go as smoothly as intended because of several
limiting factors. The test subjects were largely isolated from the
outside world by the HUD and the earplug speakers. To complicate things
further, the "radio" was on the whole time, and it was very hard for the
evaluators to speak when they could hardly hear their own voices. The
design team, sitting in the back seat, also had trouble hearing what the
evaluators were saying.
Under these conditions, a 'think aloud' did not make much sense.
We think that a 'cooperative evaluation' would be the right evaluation
technique to use in this case. The difference between the two is minor:
in a cooperative evaluation, users are also asked to critique the
system. In essence, this is what some of the evaluators did after
they tested the prototype (once the car had stopped).
There is another reason why we suggest a cooperative evaluation.
In a think aloud session, users are asked to elaborate on their actions by
describing what they believe is happening, why they take an action, or what
they are trying to do. Sonopticon, however, was not designed to be
interactive. The users have no control over the system output, so no
interaction or dialogue is possible. Moreover, describing the output of the
system (i.e. describing what the user believes is happening) is trivial,
since every message that appears on the windshield is accompanied by text
explaining exactly what is going on, leaving little room for confusion.
Sound signals could be an exception, but the sound signals demonstrated in
the prototype were always accompanied by visual information.
Questionnaire:
The questionnaire is a good way of measuring the obtrusiveness of the
interface and user satisfaction.
However, we think it should have contained more than 9 questions so that
the evaluation team could draw more detailed conclusions. We also think
that there should have been some open-ended (essay) questions at the end of
the questionnaire. Letting users explain exactly what was distracting about
the interface can be much more helpful than simply asking for a rating
from 1 to 5.
The design team listed three design criteria:
- Robustness
- Unobtrusiveness
- User reaction time
Robustness:
This criterion was not well addressed by the evaluation exercises.
In general, robustness is hard to define for non-interactive systems.
For a system like Sonopticon, for example, one way to measure robustness
is to check whether the system always gives a blind spot warning when it
should. However, this cannot be tested with the current prototype.
Unobtrusiveness:
This was extensively covered by the questionnaire. In fact, this was the only
thing that the questionnaire covered.
User reaction time:
The Sonopticon design team mentions several times that they would like
to measure the users' response time to the system output:
"The ability of the user to react properly to these warnings will be observed"
"How long does it take for the users to process the information provided?"
"The speed and effectiveness of the driver's response to these warnings will be analyzed"
However, none of the evaluation exercises is designed to measure these
times. Videotaping the think aloud session does not directly reveal these
delays, since the data it yields is purely qualitative. Thus, this design
criterion was not properly addressed.
Suggestions:
Observability and detectability should also be included. It is true
that in this version of the prototype it is hard to miss any of the visual
cues, but the accompanying text is written in a very small font, so
observability is definitely an issue.
Learnability should also be addressed. Although learnability may be hard to
define and measure for a non-interactive system, it may be worth examining
whether the signals presented to the user make sense.
For example, during the think aloud the evaluation team found that the
external noise cancellation sign looked very similar to the mute sign used
on most TV sets. Naturally, this led to confusion.