Part 3: Evaluation of prototype

Fall Quarter 1997
 

Evaluation Team: ms-squared

Minaxi Gupta
Siddharth Bajaj
Michael Koetter
Sameer Merchant

Follow-me team

Kevin Scott
Jason Elliott
Alexandre Stoytchev
Rawesak Tanawongsuwan


Results of the Evaluation

The evaluation plan required the evaluators to carry out three evaluation exercises, viz.
  1. Cognitive Walkthrough
  2. Think Aloud Exercise
  3. Questionnaire

Evaluation Exercise 1 - Cognitive Walkthrough

Description

The cognitive walkthrough provides feedback about the learnability of the system. It was used to evaluate the touch screen interface of the Follow Me system. To carry out the cognitive walkthrough, the intended user population and a set of tasks that a user would perform were identified. For each task, the actions that the user would need to take to perform that task were specified. These are described in the evaluation plan. Next, each member of the evaluation team "walked through" the specified actions and determined whether the interface would allow a user to easily identify and carry out each action. For each action, the evaluation team tried to answer the following questions about the interface:
  1. Did the action have the desired effect?
  2. Will the user be able to notice that the correct action is available?
  3. Once the correct action is found at the interface, would the user know that it is the right one for the effect he is trying to produce?
  4. After the action is taken, would the user understand the feedback he gets?
Once each evaluator had performed the above exercise, the evaluators' findings were combined and discussed to arrive at the final results.

Results

The following describes the results of the cognitive walkthrough exercise. The results give the desirable features and shortcomings of the interface that were discovered while performing each action. They also include other flaws that were discovered during the walkthrough but are not necessarily related to the task at hand. Note that these flaws would be discovered if tasks in addition to those specified in the plan were carried out.

Task I - Setting the volume in a room

Action 1 - Click on the Audio button
  1. The first screen had two buttons - "Audio" and "Preferences". The user would be confused about which button should be pressed to change the audio volume setting. Thus the user would not be able to decide which interface element should be used to carry out the desired effect.
  2. The buttons should be given a 3-dimensional look to afford pressing. This is true of all the buttons in the interface.
Action 2 - Click on the Preferences button
  1. It was not clear that the user was required to select preferences to adjust the volume. That button should have been labeled "Room Preferences", so that the user does not think that it is for setting the system preferences.
  2. The volume knob afforded an easy way to adjust the volume. However, that was not the correct action to take, since the knob adjusts the volume in the current room and cannot be used to adjust the volume in a specific room.
  3. There was no way to go back to the previous screen. A mechanism to do this should be provided.
  4. The volume section of the interface should be specifically labeled to indicate that those are the settings for the current room.
  5. Feedback should be provided about which room the user is in currently.
  6. The volume knob should be scaled to indicate what the current volume is.
Action 3 - Click on the Living Room in the floor plan
  1. It is not clear if the highlighted room is the current room or if it is the room whose preferences are the ones displayed on the left.
  2. The strip showing the current volume should be scaled.
Action 4 - Turn on Follow Me by clicking on the Follow On radio button
  1. The interface provided a clear mechanism to carry out this action and provided good feedback.
Action 5 - Click Done
  1. Selecting Done should also take the user back to the audio control panel.
  2. Canceling takes the user back to the main screen. It should take the user back to the previous screen, which is the Audio control panel.
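
The navigation behavior recommended for Done and Cancel can be sketched as a simple stack of visited screens. This is only an illustration under our own assumptions, not the design team's implementation: the class name `ScreenStack` and its methods are hypothetical, and the screen names are taken from the walkthrough above.

```python
class ScreenStack:
    """Tracks visited screens so Done/Cancel can return to the previous one."""

    def __init__(self, home):
        # The home screen always stays at the bottom of the stack.
        self._stack = [home]

    @property
    def current(self):
        return self._stack[-1]

    def open(self, screen):
        """Move forward to a new screen."""
        self._stack.append(screen)
        return self.current

    def back(self):
        """Return to the previous screen; never leave the home screen."""
        if len(self._stack) > 1:
            self._stack.pop()
        return self.current


nav = ScreenStack("Main")
nav.open("Audio control panel")
nav.open("Preferences")
print(nav.back())  # Audio control panel
```

With this arrangement, both Done and Cancel on the Preferences screen would call `back()` and land on the Audio control panel rather than on the main screen.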

Task II - Play a Song Now

Action 1 - Click on the Audio button
  1. The same results as those obtained for action 1 in the previous task were obtained.
Action 2 - Click on the button to Pick a Song
  1. To select a song to be played, the user would be unsure whether to click the "Add" button or the "Pick a new song" button.
Action 3 - Click on the Bookmarks button
  1. "Bookmarks" may not be the right label for the button. It assumes that the user is familiar with web browsers.
  2. It is not clear what input - song name or artist name - needs to be provided in the text field corresponding to the song to be played.
Action 4 - Click on the song "Too Much" by Dave Matthews Band
  1. This part of the interface was very clean and intuitive. The user would have no difficulty in deciding the action to take.
  2. After the song is selected from the bookmarks list, control jumps to the audio control panel. This is not the expected action. Another problem is that the user does not get a second chance to select the song. This is annoying in the case of a slip, in which the user selects the wrong song. After the user has selected a song from the list of bookmarked songs, control should go back to the select song dialog box with the selected song appearing in the text field. The user should then be required to press the "Play it" button to confirm that he wishes to play the song.
  3. On successful selection, the song starts playing, which is a good form of feedback.
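
The confirmation step suggested in point 2 can be sketched as follows. This is a minimal illustration under our own assumptions: the class `SongPicker` and its method names are hypothetical, and "Some Other Song" is a placeholder title standing in for a mistaken selection.

```python
class SongPicker:
    """Select song dialog in which a bookmark only fills the text field."""

    def __init__(self):
        self.text_field = ""
        self.now_playing = None

    def choose_bookmark(self, title):
        # Choosing a bookmark does not start playback; it only fills the
        # text field, so a slip can be corrected by choosing again.
        self.text_field = title

    def play_it(self):
        # Playback starts only when the user confirms with "Play it".
        if self.text_field:
            self.now_playing = self.text_field
        return self.now_playing


picker = SongPicker()
picker.choose_bookmark("Some Other Song")  # slip: wrong song chosen
picker.choose_bookmark("Too Much")         # corrected before confirming
print(picker.play_it())  # Too Much
```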

Evaluation Exercise 2 - Think Aloud

Description

Think aloud is a way to gather information by observing users' interaction with a system. The subjects are given a set of available functions and commands. As the subjects work with the system, they are asked to provide stream-of-consciousness feedback. In other words, they are asked to talk out loud about what they are thinking while using the FollowMe system. The subjects elaborate by describing what they believe is happening with the system, why they take each action, and what they are trying to do. It is best if their comments are not inhibited in any way and if they are frank and honest.

The target of this exercise was the voice interface only. The evaluation team members also acted as subjects. Two design team members were present to assist with the evaluation process, and the exercise was carried out in wizard of oz mode. The process was recorded on videotape for review by the evaluation team.

Results

In the prototype "Follow me", volume control has been implemented in terms of percentages. This made the volume control commands quite ambiguous. For example, if the user is in a room and she asks the system (the wizard of oz in this case) to reduce the volume by 50%, the system response is ambiguous because it is not clear whether the 50% refers to the present volume or to the maximum audible limit of the system. This is more of a problem because no feedback is available to the user; she can only guess from the volume of the system after the command.
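
The ambiguity can be made concrete with a small sketch. The two functions below are our own illustration of two plausible readings of "reduce the volume by 50%"; the function names and the 0 to 100 volume scale are assumptions, not part of the prototype.

```python
def reduce_relative_to_current(current, percent):
    """Reading 1: remove a share of the present volume."""
    return current * (1 - percent / 100)


def reduce_relative_to_maximum(current, percent, maximum=100.0):
    """Reading 2: remove a share of the maximum volume, clamped at zero."""
    return max(0.0, current - maximum * percent / 100)


current_volume = 80.0  # assumed starting volume on a 0-100 scale
print(reduce_relative_to_current(current_volume, 50))  # 40.0
print(reduce_relative_to_maximum(current_volume, 50))  # 30.0
```

The same spoken command leads to two different volumes, and without explicit feedback the user cannot tell which reading the system applied.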

Another problem with the volume control was that each of the three rooms had a different notion of what the maximum or minimum volume was. We understand that this was due to differences between the machines, but it should have been possible to calibrate the machines so that all of them had the same notion of volume.
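
One way such a calibration could work is to map a single shared volume scale onto each machine's own output range. The sketch below is hypothetical: the class `RoomSpeaker`, the room names, and the calibration numbers are made up for illustration and were not measured on the prototype.

```python
from dataclasses import dataclass


@dataclass
class RoomSpeaker:
    name: str
    quietest: float  # device setting that sounds like the shared minimum
    loudest: float   # device setting that sounds like the shared maximum

    def device_level(self, shared_volume):
        """Map a shared 0-100 volume onto this machine's own range."""
        shared_volume = min(100.0, max(0.0, shared_volume))
        span = self.loudest - self.quietest
        return self.quietest + span * shared_volume / 100.0


# Hypothetical calibration values for two machines.
living_room = RoomSpeaker("Living Room", quietest=10.0, loudest=90.0)
kitchen = RoomSpeaker("Kitchen", quietest=0.0, loudest=60.0)

print(living_room.device_level(50))  # 50.0
print(kitchen.device_level(50))      # 30.0
```

Each machine receives a different device setting, but a shared volume of 50 is intended to sound the same in every room.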

In our exercise, the wizard of oz was responding to us. This raises another issue about how the prototype presented to us was defined versus how the think aloud was carried out. The prototype description does not mention that the system will provide any feedback to the user other than the perceived effect, for example a change in the volume.

There was some confusion about whether the user wanted to have "follow me" turned off in the room that she is in or whether she wanted to have the system turned off completely (for example, when she is going to bed).

It was not clear to us whether the voice recognition in the prototype expected well-defined commands or accepted loose ones. The questionnaire asks about the user's preferences on this point, but during our experience of the system the wizard of oz seemed to be deciphering loose voice commands as well.

The touch screen interface allows a user to set preferences for Room B while in Room A. This functionality was not provided for the voice interface. We understand that time constraints limited what could be implemented, but we decided to point out that similar functionality would be a nice feature.

The prototype was demonstrated in the CPL lab. Although the rooms of the envisioned house were not clearly demarcated because of space limitations, the way the voice changed as we moved from one room to another was really remarkable.

The tracking was also excellent, given the constraints.

Evaluation Exercise 3 - Questionnaire

Description

This exercise involves designing a questionnaire with the aim of finding answers to specific questions. The questionnaire is administered to a set of test subjects who have either used the system or represent the future users of the system. This is a very flexible technique: the questionnaire can be designed to obtain quantitative or qualitative data. It can be used in formative stages to get the users' view of the proposed system, or it can be used to get summative feedback about the system.

In this case, since we did not have a working model of the system, the questionnaire was helpful in getting user opinion on the proposed interface or the early prototype. This helps in making the system more satisfactory to the user and in providing user-desired accessibility.

Since the original questionnaire was not designed with this aim, we have redesigned the questionnaire, which is available here.

Results

We redesigned the questionnaire and administered it to 6 of our friends. Because of the long Thanksgiving weekend, we could not find more test subjects. The results that we got for each question on the questionnaire were the following (the points are numbered corresponding to the numbers in the questionnaire):
[Results for Question 1 through Question 13 are not reproduced here.]

Overall Evaluation and Recommendations

Overall evaluation

System is fairly intuitive

The results of the cognitive walkthrough showed that the GUI was fairly intuitive. However, there were a couple of places where the system operation was unpredictable and non-intuitive. The details were presented earlier in the results for the cognitive walkthrough.

System robustness could not be adequately evaluated. Where it was measured it was fair.

Robustness had two sub goals. The cognitive walkthrough uncovered problems with the GUI that relate directly to the first sub goal. In places, the 'cancel' button took the user back to the first screen instead of the previous screen. Also, the system did not ask the user for confirmation, and hence slips could not be corrected easily. This design sub goal was not evaluated for the voice interface. The second sub goal was satisfied for the voice interface, but it could only be measured partly for the touch-screen interface. The response times were excellent for the voice interface. The response times could not be measured for the touch-screen interface, but its feedback was good.

Accessibility was good

The results of both the cognitive walkthrough and the think aloud showed that the available operations were familiar, easily perceived and understood. The consistency in the functionality of the two interfaces was good, though there were some discrepancies. The designers could have allowed interaction through both interfaces at once instead of making them mutually exclusive. For example, while the previews of the songs are playing, the list could be displayed simultaneously on the touch screen interface. This would result in better accessibility and satisfaction.

Satisfaction could not be evaluated

The design criteria for satisfaction could not be evaluated adequately because of the limited functionality of the prototype. However, some measures of satisfaction could be gained from the prototype. The prototype had excellent response times for tracking the user's movements. Also, the cognitive walkthrough results show that the interface is fairly intuitive and predictable and hence would be fairly satisfactory.
 

Recommendations


Critique of Original Evaluation Plan

Evaluation tasks well chosen with respect to design criteria

Most of the design criteria were evaluated by one of the three chosen evaluation methods. Thus, the choice of the three evaluation techniques, viz. think aloud, cognitive walkthrough, and questionnaire, was appropriate.

Application of chosen evaluation techniques was inadequate

Each of the three techniques evaluated the prototype against a different set of design criteria. However, not all three techniques were applied to both interfaces. Hence, neither the touch-screen interface nor the voice interface could be evaluated for all the specified design criteria, as is apparent from the evaluation results in the previous section. This was the weakest area of the evaluation plan.

Questionnaire was poorly designed

The questionnaire was poorly designed. It was more of a post-deployment marketing questionnaire than a questionnaire to evaluate the prototype against the specified design criteria. It also assumed in many places that the user had used the system extensively. A questionnaire would have been very valuable for gathering input from the targeted user population on making the system satisfactory to use and on providing the accessibility desired by the user. We redesigned the questionnaire, and the results for the new questionnaire were stated above.

Cognitive walkthrough did not have sufficient scenarios

The cognitive walkthrough did not have sufficient tasks to evaluate some important interface elements. However, this could have been due to time limitations.

Think aloud was poorly specified

The original evaluation plan specified task lists similar to those for the cognitive walkthrough. Instead, the commands supported by the voice interface should have been specified, and the test subject should have been given the freedom to play around with the system. This would have led to a more effective evaluation.