CLEANROOM TESTING CALCULATIONS I. INTRODUCTION The calculations described below begin with the testing matrix (P). The element in row i, column j denotes the frequency of transition from state i to state j. In state machines of this sort, there are two kinds of states: transient and absorbing. An absorbing state is one that has the property that once you get into it, you are stuck there. These states will correspond to rows of P that are all zeros except for a one on the main diagonal. Note that the corresponding columns are not all zeros. States that are not absorbing are called transient. II. CALCULATION OF THE STATIONARY DISTRIBUTION The stationary distribution (denoted by the Greek letter pi) is a vector of values, one for each state, indicating the probability that, in the long run, the system will be in the corresponding state. That is, pi( i ) is the proportion of time that the machine will be in state i. (Assuming that all states take equally long to execute.) There are two ways to compute pi. The first involves taking P and multiplying it be itself until the values stabilize. The number of iterations will be a tradeoff between convergence due to the law of large numbers and divergence due to loss of precision. A more accurate way is to solve the following matrix equation: pi = pi * P. To solve this, note the following series of identities: pi = pi * P -- To be solved pi * I = pi * P -- I is the identity matrix (pi * I)' = (pi * P)' -- ' denotes matrix transpose I' * pi' = P' * pi' -- The transpose of a product is the product -- of the transposes in the opposite order I * pi' = P' * pi' -- I is its own transpose 0 = P' * pi' - I *pi' -- 0 is the all zero matrix 0 = (P' - I) * pi' -- By the distributive law This results in a homogeneous series of linear equation. To obtain a non-trivial solution to such a series of equations, one of the equations must be replaced by an independent equation. For that equation, we can use the fact that the sum of all pi's elements must be one. That is, the long term probabilities must be exhaustive. So to actually compute pi, the following steps can be taken. 1. Transpose P. 2. Subtract one from each element on the main diagonal. 3. Replace the last row with a row of all ones except. Call the resulting matrix A. 4. Construct a vector b of compatible size containing all zeros except for a one in the last position. 5. Solve the matrix equation Ax = b using your favorite equation solver. 6. x will be pi. III. EXAMPLE OF COMPUTING PI 1. Construct P from state transition probabilities. Each row corresponds to one node. Each column denotes a node to which the first can be reached. The value in the array at position [i, j] is the likelihood that if you are at node i, that you will next be at node j. Let N be the number of nodes. --- --- | 0 .3 .7 0 | | 0 0 0 1 | P = | 0 0 0 1 | N = 4 | 1 0 0 0 | --- --- 2. Transpose P to P'. --- --- | 0 0 0 1 | | .3 0 0 0 | P' = | .7 0 0 0 | N = 4 | 0 1 1 0 | --- --- 3. Subtract the result from the identity matrix. --- --- | -1 0 0 1 | | .3 -1 0 0 | I - P' = | .7 0 -1 0 | N = 4 | 0 1 1 -1 | --- --- 4. These rows are not linearly independent, so one row must be replaced by a row containing all 1's. (This is equivalent to saying that the sum of the pi vector must be 1.0.) We will chose the last row. Let the resulting matrix be called A. --- --- | -1 0 0 1 | | .3 -1 0 0 | A = | .7 0 -1 0 | N = 4 | 1 1 1 1 | --- --- 5. Construct vector b of length N containing all 0's except a 1 in the last position. --- --- | -1 0 0 1 | | .3 -1 0 0 | --- --- A = | .7 0 -1 0 | N = 4 b = | 0 0 0 1 | | 1 1 1 1 | --- --- --- --- 6. Solve Ax = b for vector x of length N. x is the PI matrix. --- --- x = | 1/3 1/10 7/30 1/3 | --- --- IV. MATLAB CODE TO COMPUTE THE STATIONARY DISTRIBUTION % stationary % % This function takes a matrix P as input, which corresponds to the normalized % testing matrix. The output is a vector that represents pi, and is the % stationary distribution for the input Markov chain. function x = stationary( P ) % R is the number of rows/states in the machine. R = size( P, 1 ); % Take the transpose of P. P1 = P'; % Subtract one from each element on the main diagonal. A = P1 - eye( size( P1 ) ); % Replace the last row with a row of all ones. A( R, : ) = ones( 1, R ); % Construct a vector B of compatible size containing all zeros except % for a one in the last position. b = zeros( R, 1 ); b( R, 1 ) = 1; % Solve Ax=b for x. x is pi. x = A \ b; V. CALCULATION OF N, THE EXPECTED NUMBER OF STATES VISITED BEFORE ENTERING A GIVEN STATE As described in the paper, this is just 1 / pi. VI. CALCULATION OF THE EXPECTED NUMBER OF SEQUENCES UNTIL A STATE IS ENTERED As described in the paper, this is just pi( term ) / p( i ), for state i, where term is the index of the termination state. VII. CALCULATION OF RELIABILITY To compute the reliability of a software system, you begin by designing a usage chain and generating random tests. When the tests are run, a testing chain is constructed that includes the states from the usage chain and new states denoting detected failures. The values on the arcs are then normalized (by dividing by the total number of outgoing transitions) so that all arcs are labeled with a number between zero and one. The testing chain is then used to construct a probability matrix, P. From P, the system's actual reliability can be computed. Reliability is computed by solving a system of linear equations: Ax = b where A is a matrix and x and b are vectors. To set up the equations, you must take the following steps. 1. Make all failure states and the system termination states absorbing. This means that the corresponding rows in P must be zeroed out except that a one is placed on the main diagonal. Call the resulting matrix M0. 2. Now remove all rows from M0 that correspond to absorbing states. Call the resulting matrix M1. If M0 has n rows, one of which corresponds to the termination state, and l of which correspond to failing states, then you will be removing l + 1 rows, leaving m (= n - l) rows. M1 will have m rows and n columns. 3. Select the column from M1 that corresponds to the system termination state. Call this column vector b. It will have length m. 4. Now remove all columns from M1 that correspond to absorbing states. This removes l + 1 columns, leaving an m by m array. Call this array M2. It will be of size m by m. 5. Subtract M2 from the identity matrix. This has the effect of negating M2 and then adding 1 to each element on the main diagonal. Call the resulting matrix A. A is still m by m. 6. Solve the matrix equation Ax = b for x. x is a vector of length m. 7. x will have one entry for each non-absorbing state in the testing chain. One of these states corresponds to the system's starting state (also called the initiation state). Look at the element of x that corresponds to this state. If this element is in position i, then x[i] will be the overall system reliability. It should be a number between zero and one. It indicates the probability of starting the system in the initial state and having it end in the termination state. VIII. CALCULATION OF MEAN TIME TO ABSORPTION Calculating MTBF requires first calculating Mean Time to Absorption. To do this, take the following steps. 1. Make all failure states (but not the system termination state) absorbing. This means that the corresponding rows in P must be zeroed out except that a one is placed on the main diagonal. 2. Adjust the system termination state by making the corresponding row in P all zeros except that a one is placed in the column that corresponds to the initiation state of the system. This can be interpreted to indicate that after the system reaches successful termination, it is restarted in the initial state for the next test. Call the resulting matrix M0. 3. Now remove all rows and columns from M0 that correspond to absorbing states. Note that the termination state is not absorbing. Call the resulting matrix M1. If P has n rows, of which l correspond to failing states, then you will be removing l rows and columns, leaving m (= n - l + 1) rows and columns. 4. Construct a vector of length m consisting only of negative ones. Call this column vector b. 5. Subtract 1 from each element on the main diagonal of M1. Call the resulting matrix A. A is still m by m. 6. Solve the matrix equation Ax = b for x. x is a vector of length m. 7. x will have one entry for each non-absorbing state in the testing chain. It can be interpreted as containing values corresponding to the mean time to absorption. That is, if x[3] is 12.02, then the testing machine will on average run 12.02 steps once it enters state three before it fails. IX. CALCULATION OF MEAN TIME BETWEEN FAILURES MTBF is then calculated from equation 6 of the Whittaker and Poore paper. To compute it, take the following steps. 1. Add one to each element of the vector x computed in line 7 of the previous calculation. (Note that x is actually m in equation 6 of the paper.) Call the resulting vector r. r is used in step 4 below. 2. MTBF is computed using a doubly nested loop. The outer loop moves over the set of failing states in P: outer_sum = 0; for (i = f1; i <= fl; i++) { /* f1, f2, ..., fl are failing states */ outer_sum = outer_sum + nu[ i ] * inner_sum; } 3. Nu is computed from the P matrix exactly like pi was computed from the usage matrix in equation 1 of the paper. 4. inner_sum is also computed by a loop. It sums over all usage states. inner_sum = 0; for (j = s1; j <= sn; j++) { /* s1, s2, ..., sn are usage states */ inner_sum = inner_sum + P[ i; j ] * r[ j ]; } 5. The Mean Time to Failure is equal to the final value of outer_sum.