Quantitative and Qualitative Modeling and Evaluation

Introduction

Two activities go hand-in-hand in a majority of HCI research: modeling and evaluation. Modeling addresses what you know about the user, and often their surrounding social and physical environment. A variety of existing models, such as the Model-Human Processor, and modeling techniques, such as Contextual Inquiry, address differing domains and levels of specificity. Models may be used to predict performance, organize field data, and describe potential interactions with a computer interface. As you read, examine the various models and modeling techniques that provide the foundation for the research. When will these models be useful in other research settings? What do you need to know to complete a model? How can you gather that information?

One use of models is to inform the evaluation of an interface. The two activities are linked because the specificity and domain of a model constrain the questions that can be addressed in an evaluation. You will notice that specific, quantitative models are used to inform specific, quantitative evaluations. Likewise, more general, qualitative models are often the basis for various qualitative studies. The feasibility of combining various evaluation techniques is influenced by the compatibility of the underlying models. If the models make conflicting assumptions about the user, perhaps even disagreeing on what can or cannot be known, then the validity of combining evaluation techniques is in question.

One of the distinguishing characteristics of the HCI area in Computer Science is the importance of evaluating how any computer-assisted system impacts its intended user population. Evaluation in HCI (and other human-centered disciplines) is quite different from evaluation in other areas of Computer Science, mainly because it is sometimes hard to construct experiments or observations that give definitive quantitative answers regarding the merit of one system over another. Instead, evaluation in HCI consists of demonstrating a scientific approach to answering questions about a system's relative merit in its context of use. This approach can draw on a myriad of techniques. Sometimes a very reliable quantitative result is derivable, as is the case in narrowly focused human motor observations such as a Fitts' Law experiment or a Keystroke-Level Model analysis. Other times, when the impact on work practices is sought, it is nearly impossible to control all influences in a natural setting. A student of HCI should become familiar with the variety of evaluation techniques and develop a sense of when each technique is suitable.

One of the best ways to learn to critique evaluation approaches is to read examples of evaluation work in the literature. As you read, critique the research based on the repeatability of the experimentation (Could a competent researcher reproduce the findings following the procedures described by the authors?) and the strength of the analysis and conclusions (Did the authors do enough to convince you of their evaluation results?).

One way to organize the information that you gather is to fill in the simple, 2x2 matrix:

                Modeling        Evaluation
Quantitative
Qualitative

You should pay attention to the horizontal connections between modeling and evaluation techniques. Likewise, notice the connections and disconnections between quantitative and qualitative techniques.
 

General Resources

Many modeling and evaluation techniques are surveyed and covered in detail in CS 6750: Introduction to HCI as well as the follow-on course, CS 6455: User Interface Design and Evaluation.

Many papers in the SIGCHI conference series on Human Factors in Computing Systems, also known as the CHI conference, include significant modeling and evaluation work, both quantitative and qualitative. This is also true of the CSCW conference series (both the ACM CSCW conference and the European ECSCW series), though CSCW research tends to include more qualitative modeling and evaluation. The ACM UIST conference usually does not emphasize modeling and evaluation as much, but there are occasional stellar papers that provide a judicious balance between technology development and evaluation.

Modeling

Fitts' Law, Model-Human Processor and GOMS

Many quantitative models arise from the Human Factors literature. Some models are best suited for describing expert (decision-free), simple motor and cognitive activities. The best-known examples are Fitts' Law and GOMS.

There have been numerous examinations of Fitts' Law in the context of graphical user interface design. Bill Buxton has published several papers on applications and extensions of Fitts' Law. A good example is:

GOMS is based on a well-known model of human cognition and behavior, the Model-Human Processor. The opening chapter describes this model as well as the Keystroke Level Model, first defined by Card, Moran and Newell:

This work is the foundation for the GOMS family of evaluation techniques. GOMS has been one of the few widely known theoretical concepts in human-computer interaction. Two recent and good survey articles on the history and applications of GOMS are:
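As a rough illustration of how the quantitative models named in this section produce numeric predictions, the sketch below computes a Fitts' Law movement time and a Keystroke Level Model task estimate. It makes several assumptions that do not come from this page: it uses the Shannon formulation of Fitts' Law, MT = a + b * log2(D/W + 1), which is only one of several published forms; the regression constants a and b are hypothetical placeholders that would normally be fitted to experimental data; and the operator times are commonly cited approximate values that should be checked against the original source before use.

```python
import math


def fitts_index_of_difficulty(distance, width):
    """Index of difficulty in bits, Shannon formulation: log2(D/W + 1)."""
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    return math.log2(distance / width + 1)


def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Predicted movement time in seconds: MT = a + b * ID.

    a (seconds) and b (seconds per bit) are HYPOTHETICAL defaults;
    real values come from a regression over measured pointing trials.
    """
    return a + b * fitts_index_of_difficulty(distance, width)


# Approximate Keystroke Level Model operator times in seconds.
# ASSUMPTION: commonly cited textbook values, not taken from this page.
KLM_OPERATOR_SECONDS = {
    "K": 0.20,  # press a key (skilled typist)
    "P": 1.10,  # point at a target with a mouse
    "H": 0.40,  # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}


def klm_estimate(operators):
    """Sum operator times for a sequence such as "MHPK"."""
    return sum(KLM_OPERATOR_SECONDS[op] for op in operators)


if __name__ == "__main__":
    # A target 10 units wide at a distance of 70 units.
    print("ID (bits):", fitts_index_of_difficulty(70, 10))
    print("MT (s):", fitts_movement_time(70, 10))
    # Think, reach for the mouse, point, then press one key.
    print("KLM (s):", klm_estimate("MHPK"))
```

Both predictions assume practiced, error-free performance, which is why models of this kind suit the expert, decision-free activities described above and say little about novices or about work in a natural setting.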

Other Theories of Human Cognition

Much of HCI has been influenced by the Model-Human Processor. However, as HCI moves into new domains, other models of human cognition are relevant. The three major theories listed below (situated action, activity theory and distributed cognition) examine the relationship between information in the head, such as a plan, and information in the world, such as a written to-do list. These theories may be the basis for both qualitative and quantitative models and evaluation techniques. As an HCI graduate student, you should have a general understanding of these theories and of when they may be useful guides.

Interaction Models

Up to this point, these models of human cognition and behavior have not explicitly incorporated computer interfaces. The following two papers present interaction models that describe how a person interacts with a computational interface. These models can be used to compare different interface designs, such as direct manipulation, speech, gesture and tangible interfaces.  See:

Contextual Inquiry and Design

Contextual Inquiry is a set of methods for gathering qualitative information about human activity in a complex, social setting. A variety of models are used to represent these multivariate environments. Contextual Design is a methodology for using these models to inform an interface design.

Gathering Qualitative Data

A common method for gathering qualitative data is interviewing. This short book is an indispensable guide:

Many researchers now promote using ethnographic techniques to gather data about complex, social settings. As an example, see:

For more information about ethnographic investigations, students may want to consult:

Evaluation

Quantitative vs. Qualitative

The most basic distinction is between quantitative and qualitative evaluation. In a quantitative evaluation, the purpose is to come up with some objective metric of human performance that can be used to compare interaction phenomena. This can be contrasted with a qualitative evaluation, in which the purpose is to derive a deeper understanding of the human interaction experience. A typical example of a quantitative evaluation is the empirical user study, a controlled experiment in which some hypothesis about interaction is tested through direct measurement. A typical example of a qualitative evaluation is an open-ended interview with relevant users.
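To make the idea of testing a hypothesis through direct measurement concrete, here is a minimal sketch of one possible analysis for a two-condition user study: Welch's t statistic for comparing mean task completion times. The function name, the choice of Welch's test, and the sample data are all assumptions made for illustration; an actual study would pick its test from its experimental design and would also report a p-value or confidence interval.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for Welch's unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")

    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # Sample variance (n - 1 denominator) divided by n gives the
    # squared standard error of each mean.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")

    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


if __name__ == "__main__":
    # HYPOTHETICAL completion times in seconds for two interface designs.
    design_a = [12.1, 10.4, 11.8, 13.0, 12.5, 11.2]
    design_b = [14.3, 13.1, 15.0, 12.9, 14.8, 13.6]
    t, df = welch_t(design_a, design_b)
    print(f"t = {t:.3f}, df = {df:.1f}")
```

The statistic only supports a conclusion about the measured metric under the controlled conditions of the experiment; a qualitative evaluation such as an open-ended interview addresses questions that this kind of analysis cannot.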

Evaluation Techniques

There are a number of established evaluation techniques that are useful in different situations. Students should be familiar with a number of techniques that are discussed in the HCI I (6750) course. When reading about these techniques, focus on understanding when a technique is valid and which model of human behavior underlies it. Some additional resources are included.

Cognitive Walkthrough

Laboratory Evaluation

This topic is covered briefly in CS 6750 and in more depth in CS 6455. Some additional references are books on experimental design and hypothesis testing:

Think-Aloud Method

Usability Engineering and Heuristic Evaluation

Surveys, Questionnaires and Interviews

Field Observation

Summative vs. Formative

An important question to ask when performing evaluation is when to perform the evaluation with respect to the overall life cycle of a system. Formative evaluation occurs prior to much investment in implementation of a design, whereas summative evaluation occurs after a full system has been deployed. Many evaluation techniques can be employed in either a formative or summative mode, but it is important to understand how a technique differs when it is applied before an artifact has been implemented rather than after. You must also take into account the co-evolutionary influence of human tasks and interaction technology.

What is enough evaluation?

It is also important to understand that within the HCI research community, there are different expectations for evaluation. We should not expect the same amount of evaluation effort in a paper that talks about a toolkit supporting multimodal gesture recognition as we would in a paper concerned with the impact of some existing technology in a domestic environment. When reading a research paper in the HCI area, you need to determine what the appropriate expectations should be for user-centered evaluation and judge accordingly. Remember that all systems have users (a programmer uses a toolkit) and proper consideration for the needs of that user should always be apparent in HCI research.
Gregory Abowd
Elizabeth Mynatt
Last modified: Thu Aug 24 00:49:25 EDT 2000