Statistical Measures

Many times the outcome of a situation is difficult, maybe impossible, to predict. All we can say is that there is some range or distribution of possible outcomes. This is true for situations involving only inanimate objects, but is especially true for situations (like those you might find in an HCI environment) involving human beings.

In such situations we would at least like to know what behavior or event is typical. This notion lies behind the statistical quantity of the mean.

It would also be nice to know something about the variety of events that are likely to occur. This corresponds to a quantity like the variance or the standard deviation.

Another problem is that we can only make a finite number of measurements on the real world, called a sample, as we seek to gain information about it. It's possible that the sample we get may not actually represent what is typical in the real world. We would like to get an idea of what level of confidence we can place in the ability of our measurements to represent what is really going on in the world. When we need to compare two samples, or a sample to some required value, we can use the t-test to determine how significant our result is. If we have sample data that falls into categories, the chi-square analysis helps us determine the significance.

Two very good discussions of these topics, with different levels of depth of coverage, are listed below in the Links section. There are also other links to statistics-related information in that section. I have also provided a section on the Formulas relating to the above statistical measures.

Links

For a very basic overview of the field of statistics, the Introduction To Statistics link from Arizona State is your best bet. An intermediate-level discussion is in the UCLA Statistics Textbook.
A Short History of Probability
The field of statistical analysis grew out of probability theory, and here is a brief history of that field. Notice how many famous mathematicians made contributions.

Introduction To Statistics
From a course at Arizona State. It's for Education majors, so you know it can't be too complicated! It gives a brief, clear overview of the basic topics. It's not very good at showing how to make calculations, but it is good at describing the quantities and their properties. Some strong points are:

UCLA Statistics Textbook
The discussion here is at a higher level than at the above site. This seems to be a work in progress (some links point nowhere), but I like the Introduction section, especially the sub-topics on

SurfStat australia
From the Department of Mathematics at The University of Newcastle, Australia. An on-line statistics textbook, at least in theory. Some pages are missing and you get the dreaded 404 HTTP error code (which means it couldn't find the file). There are Java applets dealing with the Normal distribution, linear regression, and discrete probability distributions. There is also a choice of Java applet or text for computing or viewing tables of values for the Normal distribution, t-distribution, or chi-squared distribution.

Interesting Java Applets
A listing from the Institute of Statistics & Decision Sciences at Duke.

Formulas

Our textbook also gives a good description of the basic elements of statistics. It is particularly strong in working out examples to show you how to use the formulas, except for the following error: on the top of page 242, on the first line, it mentions 10 degrees of freedom. In problems where you are comparing two samples with N1 and N2 values, there are N1 + N2 - 2 degrees of freedom, which equals 9 for this problem. They actually used 9 as the number of degrees of freedom when they read from Table 10.1 to get the t-value, so they did the problem correctly; it is only their explanation that is in error.

One thing that I thought would be nice here is a formula list as well as a brief synopsis of the t-test and the chi-square test.

The mean is the average value of a distribution.

The standard deviation of a distribution is a measure of how spread out it is. About two-thirds of a normal distribution (which approximates most large populations) lies within one standard deviation of the mean. Here is Figure 10.8 from the text, which shows the percentages of a normally distributed population that lie some number of standard deviations from the mean:
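As a numerical cross-check on percentages like these, here is a minimal Python sketch. It is not taken from the text; it relies on the fact that, for a normal distribution, the fraction lying within k standard deviations of the mean is erf(k / sqrt(2)). The function name fraction_within is made up for this illustration.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    # For a normal distribution, P(|X - mean| <= k * sd) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.1%}")
```

For k = 1 this gives roughly 68%, which is the "about two-thirds" quoted above.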

The standard deviation can be calculated by first taking the sum of the squares of the differences between the sample values and the mean, which is labeled SS below. From this the variance is calculated, and then the standard deviation.
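The steps just described can be sketched in Python. This is an illustration under an assumption: it divides SS by N - 1, the usual choice for the variance of a sample; if the text divides by N instead, change the divisor. The data values and the function name mean_variance_sd are invented for the example.

```python
import math

def mean_variance_sd(values):
    """Return (mean, SS, variance, standard deviation) for a list of sample values."""
    n = len(values)
    mean = sum(values) / n
    # SS: sum of the squares of the differences between each value and the mean.
    ss = sum((x - mean) ** 2 for x in values)
    # Assumption: sample variance with N - 1 in the denominator.
    variance = ss / (n - 1)
    sd = math.sqrt(variance)
    return mean, ss, variance, sd

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up data
mean, ss, variance, sd = mean_variance_sd(sample)
print(f"mean={mean}, SS={ss}, variance={variance:.3f}, sd={sd:.3f}")
```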

t-test

We want to set an upper bound on the probability of rejecting the null hypothesis when it is actually true. This significance value is represented by the Greek letter alpha, and a typical value to shoot for is 0.05. First we must compute a value for t.

Comparing Two Samples

First we calculate the variance of the two samples, then from this the standard error of difference, and finally t.

Table 10.1 of the text can be consulted to find the critical value of t given the number of degrees of freedom (N1 + N2 - 2) and a two-tailed significance value. If the magnitude of the computed t lies above this critical value, we can reject the null hypothesis at the chosen significance value (given by alpha).
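Here is a rough Python sketch of the two-sample calculation. It assumes the pooled-variance form of the t-test, which is the form that goes with N1 + N2 - 2 degrees of freedom; the text's own formulas may be arranged differently. The data and the function name two_sample_t are invented, and the critical value 2.262 is the standard two-tailed value for alpha = 0.05 with 9 degrees of freedom.

```python
import math

def two_sample_t(sample1, sample2):
    """Return (t, degrees of freedom) for a pooled-variance two-sample t-test."""
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    # Sum of squared differences from the mean, for each sample.
    ss1 = sum((x - mean1) ** 2 for x in sample1)
    ss2 = sum((x - mean2) ** 2 for x in sample2)
    df = n1 + n2 - 2
    # Assumption: both samples share one variance, estimated by pooling.
    pooled_variance = (ss1 + ss2) / df
    # Standard error of the difference between the two means.
    se_diff = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se_diff
    return t, df

group_a = [12.0, 15.0, 11.0, 14.0, 13.0]        # made-up data, N1 = 5
group_b = [16.0, 18.0, 17.0, 15.0, 19.0, 17.0]  # made-up data, N2 = 6
t, df = two_sample_t(group_a, group_b)

critical_t = 2.262  # two-tailed, alpha = 0.05, 9 degrees of freedom
print(f"t = {t:.3f}, df = {df}")
print("reject null hypothesis" if abs(t) > critical_t else "cannot reject null hypothesis")
```

The comparison uses the magnitude of t because the test is two-tailed: a difference in either direction counts.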

Comparing A Sample And Required Value

First we calculate the variance of the sample, then from this the standard error of the mean, and finally t. The value R in the calculation for t is the required value to which we are comparing the sample.

We use Table 10.1 of the text again, only this time with the one-tailed significance value and N - 1 degrees of freedom.
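The one-sample case can be sketched the same way. This is an illustration, not the text's worked example: it assumes t = (mean - R) / (standard error of the mean), with the standard error taken as the square root of variance / N, and the text may use the opposite sign convention. The data, the required value of 12, and the function name one_sample_t are invented; 2.132 is the standard one-tailed critical value for alpha = 0.05 with 4 degrees of freedom.

```python
import math

def one_sample_t(sample, required):
    """Return (t, degrees of freedom) comparing a sample mean to a required value R."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)
    variance = ss / (n - 1)
    # Standard error of the mean.
    se_mean = math.sqrt(variance / n)
    # Assumed sign convention: negative t means the sample mean is below R.
    t = (mean - required) / se_mean
    return t, n - 1

times = [9.0, 11.0, 10.0, 8.0, 12.0]  # made-up task times, N = 5
t, df = one_sample_t(times, required=12.0)

critical_t = 2.132  # one-tailed, alpha = 0.05, 4 degrees of freedom
print(f"t = {t:.3f}, df = {df}")
# One-tailed question: is the mean significantly below the required value?
print("reject null hypothesis" if t < -critical_t else "cannot reject null hypothesis")
```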

chi-square test

Given N categories of data, each with observed and expected frequencies f_oi and f_ei (for category i), chi-squared is the sum over all N categories of (f_oi - f_ei)^2 / f_ei.

We compare the calculated value of chi-squared with values in a table of critical values such as Table 10.2 in the text. There are N - 1 degrees of freedom. If the value in the table is less than our calculated value, we can reject the null hypothesis.
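A short Python sketch of the chi-squared calculation follows. It uses the standard definition, the sum over categories of (observed - expected)^2 / expected. The frequencies and the function name chi_squared are invented for illustration, and 5.991 is the standard critical value for alpha = 0.05 with 2 degrees of freedom.

```python
def chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

observed = [30, 14, 16]  # made-up observed frequencies for N = 3 categories
expected = [20, 20, 20]  # expected frequencies if all categories were equally likely
chi2 = chi_squared(observed, expected)
df = len(observed) - 1   # N - 1 degrees of freedom

critical_value = 5.991   # alpha = 0.05, 2 degrees of freedom
print(f"chi-squared = {chi2:.2f}, df = {df}")
print("reject null hypothesis" if chi2 > critical_value else "cannot reject null hypothesis")
```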

Johnny Nicholas Humphrey III