
Part 3: Evaluation of prototype
Fall Quarter 1997
Evaluation Team: ms-squared
Minaxi Gupta
Siddharth Bajaj
Michael Koetter
Sameer Merchant
Follow-me team
Kevin Scott
Jason Elliott
Alexandre Stoytchev
Rawesak Tanawongsuwan
Results of the Evaluation
The evaluation plan required the evaluators to carry out three evaluation
exercises, viz.
-
Cognitive Walkthrough
-
Think Aloud Exercise
-
Questionnaire
Evaluation Exercise 1 - Cognitive Walkthrough
Description
The cognitive walkthrough provides feedback about the learnability of the
system. It was used to evaluate the touch screen interface of the Follow
Me system. To carry out the cognitive walkthrough, the intended user population
and a set of tasks that a user would perform were identified. For each
task, the actions that the user would need to take, to perform that task,
were specified. These are described in the evaluation plan. Next each member
of the evaluation team "walked through" the specified actions and determined
if the interface would allow a user to easily identify and carry out each
action. For each action the evaluation team tried to answer the following
questions about the interface -
-
Did the action have the desired effect?
-
Will the user be able to notice that the correct action is available?
-
Once the correct action was found at the interface, would the user know
that it is the right one for the effect he is trying to produce?
-
After the action is taken, would the user understand the feedback he gets?
Once each evaluator had performed the above exercise the findings of the
evaluators were combined and discussed to come up with the final results.
Results
The following describes the results of the cognitive walkthrough exercise.
The results give the desirable features and shortcomings of the interface
that were discovered while performing each action. The results also provide
other flaws that were discovered while performing the walk through, but
not necessarily related to the task on hand. Note that these flaws would
be discovered if tasks in addition to those specified in the plan were
carried out.
Task I - Setting the volume in a room
Action 1 - Click on the Audio button
-
The first screen had two buttons - "Audio" and "Preferences". The user would
be confused about which button should be pressed to change the audio volume
setting. Thus the user would not be able to decide which interface element
should be used to achieve the desired effect.
-
The buttons should be given a 3-dimensional look to afford pressing. This
is true of all the buttons in the interface.
Action 2 - Click on the Preferences button
-
It was not clear that the user was required to select preferences to adjust
the volume. That button should have been labeled "Room Preferences", so
that the user does not think that it is for setting the system preferences.
-
The volume knob afforded an easy way to adjust the volume. However, that
was not the correct action to take, since the knob adjusts the volume in
the current room and cannot be used to adjust the volume in a specific
room.
-
There was no way to go back to the previous screen. A mechanism to do this
should be provided.
-
The volume section of the interface should be specifically labeled to indicate
that those are the settings for the current room.
-
Feedback should be provided about which room the user is in currently.
-
The volume knob should be scaled to indicate what the current volume is.
Action 3 - Click on the Living Room in the floor plan
-
It is not clear if the highlighted room is the current room or if it is
the room whose preferences are the ones displayed on the left.
-
The strip showing the current volume should be scaled.
Action 4 - Turn on Follow Me by clicking on the Follow On radio button
-
The interface provided a clear mechanism to carry out this action and provided
good feedback.
Action 5 - Click Done
-
Selecting Done should also take the user back to the audio control panel.
-
Canceling takes the user back to the main screen. It should take the user
back to the previous screen which is the Audio control panel.
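The "back to the previous screen" behavior recommended above can be sketched
as a simple history stack. This is only an illustration, not the design team's
implementation: the class name ScreenNavigator and the screen name "Room
preferences" are hypothetical.

```python
class ScreenNavigator:
    """Hypothetical helper: remembers visited screens so that Cancel or Done
    can return to the previous screen rather than to the main screen."""

    def __init__(self, start_screen):
        self._history = [start_screen]

    @property
    def current(self):
        return self._history[-1]

    def open(self, screen):
        # Moving forward pushes the new screen onto the history.
        self._history.append(screen)

    def back(self):
        # Pop one level only; the first screen is never removed.
        if len(self._history) > 1:
            self._history.pop()
        return self.current


nav = ScreenNavigator("Main")
nav.open("Audio control panel")
nav.open("Room preferences")   # assumed screen name
print(nav.back())              # returns to the audio control panel
```

With a stack like this, Cancel and Done would both call back(), so the user
lands on the audio control panel instead of the main screen.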
Task II - Play a Song Now
Action 1 - Click on the Audio button
-
The same results as those obtained for action 1 in the previous task were
obtained.
Action 2 - Click on the button to Pick a Song
-
To select a song to be played, the user will be unsure whether to click
the "Add" button or the "Pick a new song" button.
Action 3 - Click on the Bookmarks button
-
"Bookmarks" may not be the right label for the button, since it assumes
that the user is familiar with web browsers.
-
It is not clear what input - song name or artist name - needs to be provided
in the text field corresponding to the song to be played.
Action 4 - Click on the song "Too Much" by Dave Matthews Band
-
This part of the interface was very clean and intuitive. The user would
have no difficulty in deciding the action to take.
-
After the song is selected from the bookmarks list, control jumps to the
audio control panel. This is not the expected action. Another problem with
this is that the user does not have a second chance to select the song.
This is annoying in case of a slip on the part of the user, in which the user
selects the wrong song. After the user has selected a song from the list
of bookmarked songs, control should go back to the select song dialog box
with the selected song appearing in the text field. The user should then
be required to press the "Play it" button to confirm that he wishes to
play the song.
-
On successful selection, the song starts playing, which is a good form
of feedback
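The two-step "select, then confirm" flow suggested above could look roughly
like the following. It is a minimal sketch: the SongPicker class and its
method names are hypothetical, and "Some Other Song" stands in for a slip.

```python
class SongPicker:
    """Hypothetical select song dialog: selecting fills the text field,
    and only the "Play it" button starts playback."""

    def __init__(self):
        self.text_field = ""
        self.now_playing = None

    def select_from_bookmarks(self, song):
        # Selection is reversible: nothing plays until it is confirmed.
        self.text_field = song

    def play_it(self):
        # Confirmation step; does nothing if no song has been chosen.
        if self.text_field:
            self.now_playing = self.text_field
        return self.now_playing


picker = SongPicker()
picker.select_from_bookmarks("Some Other Song")  # a slip
picker.select_from_bookmarks("Too Much")         # corrected before playing
picker.play_it()
```

Because playback starts only in play_it(), a wrong selection costs the user
one extra click rather than an unwanted song.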
Evaluation Exercise 2 - Think Aloud
Description
Think aloud is a way to gather information by observing users' interaction
with a system. A set of available functionalities/commands is provided to the
subjects. As the subjects work with the system, they are asked to provide
stream-of-consciousness feedback. In other words, they are asked to talk
out loud about what they are thinking while using the FollowMe system.
The subjects elaborate by describing what they believe is happening with
the system, why they take action, and what they are trying to do. It is
best if their comments not be inhibited in any way and that they be frank
and honest.
The target of this exercise was the voice interface only. The evaluation
team members acted as subjects as well. There were two design team members
present to assist with the evaluation process, and the exercise was carried
out in wizard of oz mode. The process was recorded on videotape for
review by the evaluation team.
Results
In the prototype "Follow me", volume control has been implemented in terms
of percentages. This made the volume control commands quite ambiguous.
For example, if the user is in a room and she asks the system (wizard of
oz in this case) to reduce the volume by 50%, the system response becomes
ambiguous because it is not clear whether the system would make it 50%
of the present volume of the system or that of the maximum audible limit
of the system. This is more of a problem because there is no feedback available
to the user apart from guessing from the volume of the system after the
command.
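The ambiguity can be made concrete with a small calculation. The sketch below
assumes a current volume of 80 on a scale whose maximum is 100; both numbers
and both function names are invented for illustration and do not come from the
prototype.

```python
def fraction_of_current(current, percent):
    """Reading 1: the new volume is `percent` of the present volume."""
    return current * percent / 100.0


def fraction_of_maximum(maximum, percent):
    """Reading 2: the new volume is `percent` of the maximum volume."""
    return maximum * percent / 100.0


current, maximum = 80.0, 100.0                           # assumed example values
reading_current = fraction_of_current(current, 50)       # 40.0
reading_maximum = fraction_of_maximum(maximum, 50)       # 50.0
print(reading_current, reading_maximum)
```

The same spoken command yields two different volumes, and without explicit
feedback the user cannot tell which reading the system applied.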
One other problem with the volume control was that all three rooms had
a different notion of what the maximum or minimum volume was. We understand
that this was due to the difference in the machines, but it should have
been possible to calibrate the machines so that all of them had the same notion
of volume.
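One way to give every machine the same notion of volume is to keep a single
normalized level and map it onto each machine's own range. The sketch below
assumes a simple linear mapping, and the per-room ranges are made up; real
calibration would need measured output levels for each machine.

```python
def to_device_level(normalized, device_min, device_max):
    """Map a shared volume in [0.0, 1.0] onto one machine's native range.
    A linear mapping is an assumption made for this sketch."""
    if not 0.0 <= normalized <= 1.0:
        raise ValueError("normalized volume must be between 0.0 and 1.0")
    return device_min + normalized * (device_max - device_min)


# Hypothetical native volume ranges for the three machines.
ROOM_RANGES = {"Room A": (0, 255), "Room B": (10, 100), "Room C": (0, 15)}

levels = {
    room: to_device_level(0.5, low, high)
    for room, (low, high) in ROOM_RANGES.items()
}
print(levels)
```

Under this scheme a command such as "half volume" is stored once as 0.5 and
translated per room, so moving between rooms does not change what the setting
means.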
In our exercise, the wizard of oz was responding to us. This raises
another issue about the definition of the prototype presented to us vis
a vis the way "think aloud" was carried out: the prototype description
does not mention any way of providing feedback to the user other than
the perceived effect, for example a change in the volume.
There was some confusion about whether the user wanted to have "follow
me" turned off in the room that she is in or whether she wanted to have
the system turned off completely (let's say when she is going to bed).
It was not clear to us whether the voice recognition in the prototype
was strictly defined or loose. The questionnaire asks about the user's
preferences on this point, but during our experience of the system, the
wizard of oz seemed to be deciphering loose voice commands as well.
The touch screen interface allows a user to set preferences for Room
B while in Room A. This functionality was not provided for the voice interface.
We understand that time constraints limited what could be implemented,
but we want to point out that similar functionality would be a nice
feature.
The prototype was demonstrated in the CPL lab. Although, because of space
limitations, the rooms (of the envisioned house) were not clearly demarcated,
the way the voice changed as we moved from one room to another was really
remarkable.
The tracking was also excellent, given the constraints.
Evaluation Exercise 3 - Questionnaire
Description
This exercise involves designing a questionnaire with the aim of finding
answers to specific questions. The questionnaire is administered to a set
of test subjects who have either used the system or represent the future
users of the system. This is a very flexible technique: the questionnaire
can be designed to obtain quantitative or qualitative data. It can be used
in formative stages to get the user's view of the proposed system, or
it can be used to get summative feedback about the system.
In this case, since we did not have a working model of the system, the
questionnaire will be helpful in getting user opinion on the proposed interface
or the early prototype. This will help in making the system more satisfactory
to the user and provide user-desired accessibility.
Since the original questionnaire was not designed with this aim, we
have redesigned the questionnaire, which is available here.
Results
We redesigned the questionnaire and administered it to 6 of our friends.
Because of the long Thanksgiving weekend, we could not find more test subjects.
The results that we got for each question on the questionnaire were the
following (the points are numbered corresponding to the numbers in the
questionnaire):
Question 1 :
-
The test subjects were in the age group 18-25.
Question 2 :
-
4 subjects were male and 2 were female
Question 3 :
-
All the test subjects were Masters/PhD students at Georgia Tech.
Question 4 :
-
Since this system is just a prototype and apart from the design team, we
are the only ones who have used the system, all our test subjects had just
seen a demonstration of the prototype system.
Question 5 :
-
5 of our test subjects said that they preferred "a combination of both
voice based interface and screen interface", only one person preferred
the "voice based" interface.
Question 6 :
-
For feedback, 3 people selected "same mode in which command was executed",
2 selected "voice" interface and 1 selected "screen" interface.
Question 7 :
-
All the subjects said that they would prefer "loose commands".
Question 8 :
-
Everybody said they did not understand the questions.
Question 9 :
-
One person said "yes" and the rest said "no" but did not give any reason.
Question 10 :
-
2 people replied in the affirmative. 1 person said that she would prefer to
have the option of having different music in different rooms. One other
person was of the opinion that it should be possible to set preferences
for other rooms as well (using the voice interface). 1 person said that the
access to the system and the feedback should be implemented in a way that
involves both the interfaces, and the last person did not give an exact
reason why he was not satisfied. They all did realize that there were limitations
to what was possible to implement.
Question 11 :
Question 12 :
-
2 people pointed out that selecting a song for playing starts playing it
without waiting for the user to "play" it. One other person pointed out
that he got confused by the preferences button on the main screen. The others
did not say anything.
Question 13 :
-
3 people said that the idea was neat; the rest did not write anything.
Overall Evaluation and Recommendations
Overall evaluation
System is fairly intuitive
The results of the cognitive walkthrough showed that the GUI was
fairly intuitive. However, there were a couple of places where the system
operation was unpredictable and non-intuitive. The details for these were
presented earlier in the results for the cognitive walkthrough.
System robustness could not be adequately evaluated. Where it was measured
it was fair.
The two sub goals under robustness were-
a) Did the system provide proper escapes and could slips be corrected
easily, and
b) Did the system respond appropriately and in time?
The cognitive walkthrough uncovered problems directly related to the first
sub goal in the GUI. At places the 'cancel' button took the
user back to the first screen instead of the previous screen. Also, the system
did not ask the user for confirmation and hence slips could not be corrected
easily. This design sub goal was not evaluated for the voice interface.
The second sub goal was satisfied for the voice interface but it could
only be measured partly for the touch-screen interface. The response times
were excellent for the voice interface. The response times could not,
however, be measured for the touch-screen interface, although it gave
good feedback.
Accessibility was good
The results of both cognitive walkthrough and think aloud showed that the
available operations were familiar, easily perceived and understood. The
consistency in the functionality of both the interfaces was good, though
there were some discrepancies. The designers could have provided interaction
using both the interfaces instead of making them mutually exclusive. For
example while playing the previews of the songs the list could have been
simultaneously displayed on the touch screen interface. This would result
in better accessibility and satisfaction.
Satisfaction could not be evaluated
The design criteria for satisfaction could not be evaluated adequately
because of the limited functionality of the prototype. However, some measures
for satisfaction could be gained from the prototype. The prototype had
excellent response times for tracking the user movements. Also, the cognitive
walkthrough results show that the interface is fairly intuitive and predictable
and hence would be fairly satisfactory.
Recommendations
-
The touch-screen interface can be made more intuitive and satisfactory
to use by removing the various usability bugs uncovered during cognitive
evaluation.
-
The robustness of the system could be improved by providing predictable
escape sequences and by asking the user for confirmation, so that the
user can easily correct slips.
-
The functionality exported by the two interfaces could be made consistent
by removing discrepancies. One of the discrepancies is that the user could
not set any preferences for another room using the voice interface, though
he could do so using the touch-screen interface.
-
Accessibility of the system could be improved by having the two interfaces
interact rather than being mutually exclusive. We believe that the design
team might not have explored this area due to lack of time.
-
The questionnaire has identified some places where the system functionality
could be enhanced. However, it needs to be administered on a wider scale
to provide more definitive suggestions.
Critique of Original Evaluation Plan
Evaluation tasks well chosen with respect to design criteria
Most of the design criteria were evaluated by one of the three chosen evaluation
methods. Thus, the choice of the three evaluation techniques, viz. think
aloud, cognitive walkthrough, and questionnaire, was appropriate.
-
Think aloud was an appropriate technique used in conjunction with the wizard
of oz technique, especially justified by the early stage of the prototype.
-
Cognitive walkthrough was also an appropriate technique because it helped
in exposing many usability bugs related to intuitive operation. It helped
evaluate the prototype for robustness and satisfaction.
-
The questionnaire is also justified as a way of getting the preferences of the
targeted user population, which would have helped produce a more satisfactory
interface.
Application of chosen evaluation techniques was inadequate
Each of the three techniques evaluated the prototype against a different set
of design criteria. However, not all three techniques were applied to both
interfaces. Hence, neither the touch-screen interface nor the voice interface
could be evaluated for all the specified design criteria, as is apparent from
the evaluation results in the previous section. This was the weakest area
of the evaluation plan.
Questionnaire was poorly designed
The questionnaire was poorly designed. It was more of a post-deployment
marketing questionnaire than a questionnaire to evaluate the prototype
against specified design criteria. It also assumed in many places that
the user had extensively used the system. A better-targeted questionnaire
would have been very useful for gaining input from the targeted user population
on making the system satisfactory to use and providing the accessibility desired
by the user. We redesigned the questionnaire and the results for
the new questionnaire were stated above.
Cognitive walkthrough did not have sufficient scenarios
Cognitive walkthrough did not have sufficient tasks to evaluate some important
interface elements. However, this could have been due to time limitations.
Think aloud was poorly specified
The original evaluation plan specified the task lists similar to cognitive
walkthrough. Instead the commands supported by the voice interface should
have been specified and the test subject should have been given freedom
to play around with the system. This would have led to a more effective
evaluation.