The intent is to keep everything constant except for the use of the experimental material. The post-test results are analyzed to see if there are significant differences between the groups, and the results are interpreted to see how they relate to the hypothesis.
There are variations in the approach described above. A pre-test may
be given before use of the material. Several experimental groups may be
formed, each using different material.
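The post-test comparison described above can be sketched in code. The example below is a minimal sketch, not the analysis used in any study discussed here: it assumes a two-sided permutation test on the difference in group means, and the scores, the function name `permutation_test`, and its parameters are all hypothetical.

```python
import random
from statistics import mean


def permutation_test(control, experimental, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = mean(experimental) - mean(control)
    pooled = list(control) + list(experimental)
    n_control = len(control)
    extreme = 0
    for _ in range(n_resamples):
        # Reassign participants to groups at random and recompute the difference
        rng.shuffle(pooled)
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical post-test scores (invented for illustration only)
control_scores = [62, 70, 58, 66, 73, 61, 68, 64]
experimental_scores = [71, 78, 69, 74, 80, 67, 76, 72]

observed_diff, p_value = permutation_test(control_scores, experimental_scores)
print(f"mean difference = {observed_diff:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the observed difference would be unlikely if the material had no effect. It says nothing about why the groups differ, which is the job of the interpretation step.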
Index
Researchers try to provide a 'natural' environment for the study, i.e.
a setting similar to the environment in which a user would normally use
the material. In some cases this is impractical. A control group is not
required, though comparative groups are frequently used. The observational
method is one of several naturalistic methods.
The text-only group had 45 minutes to read. The animation group also had 45 minutes, but could spend at most 30 minutes reading the material and used the remaining time (15 minutes or more) on the animation. Students using animation and text did marginally better than the text-only group, and the animation group finished the post-test in slightly less time.
A survey of the animation group found that students wanted to be able to step back through the animation and replay it. This capability was not available in XTango. They also wanted text descriptions of what was happening. Individual students commented that they liked the smooth transitions and speed control, but found it difficult to remember the animation afterwards.
The authors note that "Most of the test items require the ability to
accurately carry out the main procedures of the algorithm, and neither
presentation seemed likely to give participants that ability." Given
this limitation, it seems difficult to draw firm conclusions from this
study about the effectiveness of algorithm animation.
Two alternative lab formats were used in part (B), active and passive. Passive lab participants used data sets prepared by the instructor, while those in active labs constructed their own data sets interactively.
This study implemented some of the suggestions from the first study.
The animation was annotated with a brief text description which in effect
was very high-level pseudocode. At each step of the animation, the relevant
text was highlighted. Use of the animation in the lecture provided instructor
explanations. The design of the animation had more focused instructional
objectives, which were specifically evaluated.
Evaluation and Results
Evaluation of the four experiment combinations (two no-lab, two lab) used fixed-response questions to test understanding of specific steps in the algorithm and free-response questions to test general conceptual understanding.
No significant difference was found between the two no-lab groups in part (A), but animation students did slightly worse on the free response section.
In part (B), students who did the active lab performed significantly better than those who did the passive lab or no lab at all. The difference was larger on the free response section, which the authors suggest indicates an improvement in concept formation.
The second study is one of the most widely referenced algorithm animation
studies, and seems to be a good example of experimental design.
"Much of the empirical work ... suffers from the problem of trying to find global generalisations that are not there.... Many of the studies derive performance results without deriving the information necessary to explain them. Observations are usually not made of how the subjects approached the task and what features they found confusing."
That author combines empirical and observational approaches to design
a study which captures user data in conjunction with empirical measurement.
In addition, the material used was designed to test for understanding of
specific cognitive tasks.
Mulholland compares four Prolog SV systems to see if and how they support the learning of specific tasks. The tasks were chosen based on previous cognitive studies of students learning Prolog.
The subjects were four groups of Cognitive Psychology students taking an Artificial Intelligence module of a course. Each group used a different Prolog SV as a learning and debugging environment for a week.
The students knew Prolog previously, which raises some concern about
possible prior knowledge. No pre-test is mentioned which would help account
for this.
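To illustrate why a pre-test helps here, the sketch below compares raw post-test means with pre-to-post gains. It is a minimal sketch under stated assumptions: the group labels, the scores, and the helper `mean_gain` are all hypothetical and are not taken from Mulholland's study.

```python
from statistics import mean

# Hypothetical (pre-test, post-test) score pairs per participant, keyed by group.
# Group labels and all numbers are invented for illustration.
scores = {
    "group_a": [(40, 65), (55, 70), (30, 60)],
    "group_b": [(60, 72), (65, 75), (58, 70)],
}


def mean_gain(pairs):
    """Average improvement from pre-test to post-test."""
    return mean(post - pre for pre, post in pairs)


for group, pairs in scores.items():
    post_mean = mean(post for _, post in pairs)
    print(f"{group}: mean post-test = {post_mean:.1f}, mean gain = {mean_gain(pairs):.1f}")
```

In this invented data, the group with the higher post-test mean started from a higher baseline and improved less. Post-test scores alone cannot reveal that.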
Test Task
The test task was to identify differences between a given program with source code, and a modified program for which only the output was provided. The differences between the programs were of four types:
| Spy | Traces execution using the Unification model |
| PTP (Prolog Trace Package) | Gives more execution details than Spy |
| TPM (Transparent Prolog Machine) | Shows execution graphically using depth-first AND/OR trees; provides overview and detailed views |
| TTT (Textual Tree Tracer) | Traces using a format close to the source code |
A common execution environment, the Prolog Program Visualization Laboratory
(PPVL), was used for all SVs. It provided a common interface and recorded
user activity.
| Least | PTP |
| | Spy/TTT |
| Most | TPM |

| Most | PTP |
| | TTT |
| | Spy |
| Least | TPM |
Approaches to evaluation are discussed. They find that qualitative evaluation
does not usually address effectiveness directly and tends to rely on student
perceptions which are also mixed with usability issues. Quantitative evaluation
often has no pedagogical substitute for animation in the control group,
so any improvement may be attributed to the extra or alternative methodology
used. They suggest using a human tutor for the control group. They
also mention the need for a large number of subjects to produce significant
findings.
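The point about subject numbers can be made concrete with a standard sample-size approximation. The sketch below is not taken from the paper under discussion. It assumes a two-group comparison of means, a two-sided test, and the normal approximation n = 2((z_{1-alpha/2} + z_{power}) / d)^2 per group, where d is the standardized effect size. The function name and the default values are hypothetical choices.

```python
from math import ceil
from statistics import NormalDist


def subjects_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to detect a standardized
    mean difference `effect_size` with a two-sided test (normal approximation)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = std_normal.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Smaller expected effects require many more subjects per group
for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: about {subjects_per_group(d)} subjects per group")
```

Because the required number grows with the inverse square of the effect size, a study expecting only a marginal benefit from animation needs far more participants than a typical single class provides.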
The authors believe that improved design will yield better results from
studies. Some of the proffered suggestions have been used in the other
studies discussed, i.e. accompanying text and user interaction in Stasko
and Lawrence, and pedagogical design in Mulholland. Both studies produced
positive results.
| [Mul98] | Paul Mulholland, "A Principled Approach to the Evaluation of SV: A Case Study in Prolog", in Software Visualization, John Stasko et al., eds. MIT Press, 1998. |
| [SL98] | John Stasko and Andrea Lawrence, "Empirically Assessing Algorithm Animations as Learning Aids", in Software Visualization, John Stasko et al., eds. MIT Press, 1998. |
| [GC96] | Judith S. Gurka and Wayne Citrin, "Testing effectiveness of algorithm animation", Proceedings of the 1996 IEEE Symposium on Visual Languages, pages 182-189, Boulder, CO, September 1996. |