There are basically four ways to evaluate a user interface: formally by some analysis technique, automatically by a computerized procedure, empirically by experiments with test users, and heuristically by simply looking at the interface and passing judgement according to one's own opinion. Formal analysis models have not yet reached the stage where they can be generally applied in real software development projects. Automatic evaluation is completely infeasible except for a few primitive checks. Therefore, current practice is to do empirical evaluation if one wants a good and thorough evaluation of a user interface. Unfortunately, in most practical situations, people do not actually conduct empirical evaluations because they lack the time, expertise, inclination, or simply the tradition to do so.
As its name suggests, heuristic evaluation is not a pure analysis technique. It might be thought of as "analysis by a team of analysts using a variety of informed models". A more descriptive term applied to heuristic evaluation is Usability Inspection: an inspection is carried out and a list of problems affecting usability is drawn up. For its effect, the heuristic evaluation method relies on two techniques in combination. First, it employs a team of evaluators rather than relying on one person to carry out the evaluation. Second, a set of design heuristics is used to guide the evaluation.
Molich and Nielsen [1990a] have listed nine heuristics which can be used to generate ideas while critiquing the system. These nine principles seem to be well suited as the basis for practical heuristic evaluation. They correspond more or less to principles that are generally recognized in the user interface community, and almost all usability problems fit well into one of these categories.
In order to test the practical applicability of heuristic evaluation, Nielsen and Molich [1990b] conducted four experiments where a number of evaluators were presented with an interface design and asked to comment on it. They found that individual evaluators were mostly bad at doing heuristic evaluations and that they only found between 20% and 51% of the usability problems in the interfaces they evaluated. On the other hand, they found that the overall result can be dramatically improved by forming aggregates of evaluators, since the "collected wisdom" of several evaluators is not just equal to that of the best evaluator in the group. Aggregates of evaluators are formed by having several evaluators conduct a heuristic evaluation and then collecting the usability problems found by each of them to form a larger set.
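The aggregation step described above amounts to taking the union of the problem sets reported by the individual evaluators. The following Python fragment is a minimal sketch of that idea only. The function names, the evaluator names, and the problem identifiers P1 to P6 are hypothetical and are not taken from the experiments.

```python
def aggregate_problems(reports):
    """Return the union of the problems reported by all evaluators."""
    found = set()
    for problems in reports.values():
        found |= set(problems)
    return found


def proportion_found(found, all_problems):
    """Fraction of the known usability problems present in `found`."""
    return len(set(found) & set(all_problems)) / len(all_problems)


# Hypothetical example data, invented for illustration only.
all_problems = {"P1", "P2", "P3", "P4", "P5", "P6"}
reports = {
    "evaluator_a": {"P1", "P2"},
    "evaluator_b": {"P2", "P3", "P4"},
    "evaluator_c": {"P1", "P5"},
}

aggregate = aggregate_problems(reports)
for name, problems in reports.items():
    print(name, round(proportion_found(problems, all_problems), 2))
print("aggregate", round(proportion_found(aggregate, all_problems), 2))
```

In this invented data set no single evaluator finds more than half of the problems, while the aggregate of all three finds five of the six.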
They also found that three to five evaluators in an aggregate would be
able to detect about two thirds of the usability problems. The result
can be seen in figure 1.
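One way to examine how the proportion of problems found depends on the size of the aggregate is to average it over every possible group of a given size. The Python sketch below does this for a small, entirely hypothetical data set. The function name `mean_coverage`, the evaluator names, and the problem identifiers are assumptions made for illustration, and the numbers it prints do not reproduce figure 1 or the published results.

```python
from itertools import combinations


def mean_coverage(reports, all_problems, size):
    """Average proportion of problems found over all aggregates of `size` evaluators."""
    total = len(all_problems)
    coverages = []
    for group in combinations(reports, size):
        # Pool the problems found by every evaluator in this aggregate.
        found = set().union(*(reports[name] for name in group))
        coverages.append(len(found & all_problems) / total)
    return sum(coverages) / len(coverages)


# Hypothetical example data, invented for illustration only.
all_problems = {"P1", "P2", "P3", "P4", "P5", "P6"}
reports = {
    "evaluator_a": {"P1", "P2"},
    "evaluator_b": {"P2", "P3", "P4"},
    "evaluator_c": {"P1", "P5"},
    "evaluator_d": {"P4", "P6"},
}

for size in range(1, len(reports) + 1):
    print(size, round(mean_coverage(reports, all_problems, size), 2))
```

Averaging over all groups, rather than picking one group per size, keeps the result from depending on which evaluators happen to be chosen.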
They also found that, for an aggregate to produce better results than the individual evaluations, the evaluators should do their evaluations independently of each other and only compare results after each of them has looked at the design and written his or her evaluation report. Otherwise the evaluators may bias each other towards one way of approaching the analysis and, as a result, discover only certain usability problems.
In another study, Nielsen [1992] found that usability specialists
were much better than those without usability expertise at finding
usability problems by heuristic evaluation. Furthermore, usability
specialists with expertise in the specific kind of interface being
evaluated did much better than regular usability specialists without
such expertise, especially with regard to certain usability
problems that were unique to that kind of interface.
In figure 2, "novice evaluators" refers to evaluators with no usability expertise, "regular specialists" refers to usability specialists, and "double specialists" refers to usability specialists who also have experience with the particular kind of interface being evaluated. As can be seen from figure 2, the double specialists found significantly more usability problems than did the regular usability specialists. If double specialists are used, a smaller group can be used to evaluate the interface. A group size of two to three has been recommended.
Nielsen also found that, in heuristic evaluation, major usability problems have a higher probability of being found than minor problems. Problems involving a lack of clearly marked exits are harder to find than problems violating the other heuristics, so additional effort should be made to identify them.
A number of advantages are claimed for heuristic evaluation. They include
A disadvantage of the method is that it sometimes identifies usability
problems without providing direct suggestions for solving them. The method is
also biased by the current mindset of the evaluators and
normally does not generate breakthroughs in the evaluated design.