Visualize SkyServer Web Log with VISTA (2)

      DISL Group at Georgia Tech and Jim Gray at Microsoft Research

 

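We have already introduced the operations of changing alpha values, zooming, selecting subsets, and marking clusters in the VISTA tool. Here we introduce some new operations, including moving axes, shifting the focus of the visualization, and viewing cluster summaries and details.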

With these operations, we will see how to analyze the log of "500" errors in detail and how to observe client access patterns. We walk through the two demo examples step by step and explain the rationale behind each operation.

Because we will move axes in this demo, the dataset must include the axis names so that the moved axes can be identified (download the new dataset). To see an axis name, move the mouse pointer over the corresponding blue alpha widget; the name is shown in the detail-on-demand box at the bottom of the screen.

From the previous demos it may not be clear how the points are arranged along the axes, so we briefly discuss the mapping mechanism here. The position of a point is determined by the values of all dimensions of the corresponding row through Gama-mapping, which we will not explain here. Basically, before the visualization is created, the values in the dataset are normalized to [0, 1] with max-min normalization. For numeric data, the normalization is applied directly. For categorical data, the n categories of a column are first mapped to the n numbers (0, 1, 2, ..., n-1), and these numbers are then scaled to [0, 1]. As a result, points are distributed continuously (numeric data) or at equal distances (categorical data) along an axis.

For example, set the alpha value of "ClientIP" to 0 and observe the point distribution along the "ErrorCode" axis: you can see five "lines", which correspond to five different error codes that are treated as numeric data. To observe the normalization of categorical data, for example the "ClientIP" column, set the alpha value of "ErrorCode" to 0; you will see two categories along the "ClientIP" axis.

When the axes are not perpendicular, for example with 3 or 5 dimensions, the separation along one axis may be less clear because the distribution is also affected by the other dimensions. However, by dynamically changing the alpha value of the axis, we can still see the groups that have different values in this dimension, especially for categorical columns.
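The normalization described above can be sketched in a few lines of Python. This is only an illustration of max-min scaling and of the category-to-number mapping, not VISTA's actual code. The function names, the sample values, and the choice to number categories in sorted order are all assumptions made for the example.

```python
def normalize_numeric(values):
    """Max-min normalization: scale numeric values to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal; place them at 0 (an arbitrary choice).
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def normalize_categorical(values):
    """Map n categories to 0..n-1, then scale those numbers to [0, 1]."""
    # Assumption: categories are numbered in sorted order.
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    n = len(categories)
    if n == 1:
        return [0.0] * len(values)
    return [index[v] / (n - 1) for v in values]


# Hypothetical sample values, not taken from the SkyServer log.
error_codes = [200, 404, 500, 404]
client_ips = ["client1", "client2", "client1", "client2"]

print(normalize_numeric(error_codes))
print(normalize_categorical(client_ips))  # [0.0, 1.0, 0.0, 1.0]
```

With two categories the normalized values are 0 and 1, which matches the two groups seen along the "ClientIP" axis. With n categories the values are spaced 1/(n-1) apart, which gives the equally distanced distribution.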

1. Analyze the log of "500" errors.

In this example, we focus on the commands that cause the error code "500".

1)     To find the commands that cause the error code "500", first set the alpha of "ClientIP" to 0. To locate the right widget, move the mouse pointer over the blue alpha widget of "ClientIP"; the bottom line shows "ClientIP". (Figure 1)

Figure 1.       

2)     Switch the operation to "select subset" (choose the option on the right-hand option panel). Select the group that has error code "500" by drawing a freehand boundary around it. Then click the "select" button to show only this group. (Figure 2)

Figure 2a

 

Figure 2b

3)     Zoom in on the visualization. The group moves away from the center of the visualization. (Figure 3)

Figure 3.

4)     To focus on the center of the group, shift the focus of the visualization: hold the "shift" key and click on the point you want to become the center. (Figure 4)

 

Figure 4.

5)     By zooming and adjusting the focus, we get a visualization that is detailed enough. Switch the operation to "change alpha". Set the alpha of "seq" to 0 and maximize the alpha of "command" to observe which commands cause the "500" error. You can see that 7 commands cause the error. (Figure 5)

Figure 5.

 

6)     Increase the alpha of "ClientIP" to separate the group and find the commands issued by each client. You can see that client1 issued two kinds of commands and client2 issued five. (Figure 6)

Figure 6.

7)     Switch the operation to "select subset". Enclose the subset that belongs to client1 by freehand drawing and click the "mark" button to mark the cluster. Mark the other cluster in the same way. (Figure 7)

 

Figure 7.

8)     Observe the details of the clusters by clicking the "Summ" button. A dialog shows some statistical properties of the marked clusters, including the percentage of the entire dataset that each cluster's points represent. The colored buttons at the top of the dialog are the "cluster detail" buttons. (Figure 8)

Figure 8.

9)     Click one of the "cluster detail" buttons. Another dialog shows the rows in the corresponding cluster. (Figure 9)

Figure 9.
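For readers who want to cross-check the visual result against the raw rows, the same question (which commands cause "500", per client, and how large the group is) can be sketched in Python. The rows below are hypothetical and only reuse the column names from this demo. The function names are assumptions, and VISTA itself does not work this way.

```python
from collections import defaultdict

# Hypothetical rows (seq, ClientIP, command, ErrorCode); not the real log.
rows = [
    (1, "client1", "cmdA", 500),
    (2, "client1", "cmdB", 500),
    (3, "client2", "cmdC", 500),
    (4, "client2", "cmdC", 200),
    (5, "client1", "cmdA", 404),
]


def commands_by_client(rows, error_code=500):
    """Collect the distinct commands that produced error_code, per client."""
    result = defaultdict(set)
    for _seq, client, command, code in rows:
        if code == error_code:
            result[client].add(command)
    return dict(result)


def cluster_share(rows, predicate):
    """Percentage of all rows that satisfy predicate (like a marked cluster)."""
    if not rows:
        return 0.0
    matched = sum(1 for row in rows if predicate(row))
    return 100.0 * matched / len(rows)


print(commands_by_client(rows))
print(cluster_share(rows, lambda row: row[3] == 500))  # 60.0
```

The second function mirrors the percentage reported in the "Summ" dialog: the share of the whole dataset that falls into one marked cluster.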

2. Observe client access patterns.

In this demo, we show how to observe a client's access pattern in terms of "seq" and "command". The access pattern reveals the sequence of user actions. We can also compare the access patterns of two clients by comparing their overall visualizations.

Observing one pattern

1)     After loading the data, set the alpha of "ErrorCode" to 0. (Figure 10)

Figure 10.

2)     Switch the operation to "move axis". Move the "ErrorCode" axis aside and move the "seq" axis to the original position of the "ErrorCode" axis. (Figure 11)

Figure 11.

3)     Switch the operation to "change alpha". Increase the alpha of "ClientIP" to separate the two clients. We want to observe the access pattern of client1, so switch the operation to "select subset", select the subset by freehand drawing, and click "select". (Figure 12)

Figure 12.

 

4)     Maximize the alphas of "command" and "seq". Zoom and adjust the focus to get an enlarged, clear visualization. (Figure 13)

 

Figure 13.

5)     Along the "seq" axis, we find 3 command sequences: seq numbers 492-535, 563-611, and 643-668. Mark each of them as a cluster. (Figure 14)

6)     Observe the details of each cluster by clicking the "Summ" button.

Figure 14.
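The three command sequences show up in the visualization as groups separated by gaps along the "seq" axis. A rough non-visual counterpart is to split the sorted seq numbers wherever the gap exceeds a threshold. The sketch below assumes every seq number inside each range is present, which the log may not guarantee. The function name and the `max_gap` value are hypothetical.

```python
def split_into_runs(seq_numbers, max_gap=10):
    """Group sorted seq numbers into (first, last) runs.

    A new run starts whenever the distance to the previous seq number
    is larger than max_gap (a hypothetical threshold).
    """
    runs = []
    for s in sorted(seq_numbers):
        if runs and s - runs[-1][1] <= max_gap:
            runs[-1][1] = s          # extend the current run
        else:
            runs.append([s, s])      # start a new run
    return [tuple(run) for run in runs]


# Assumed data: every seq number in the three ranges from step 5 is present.
seqs = list(range(492, 536)) + list(range(563, 612)) + list(range(643, 669))
print(split_into_runs(seqs))  # [(492, 535), (563, 611), (643, 668)]
```

The gaps between the ranges above are 28 and 32, so any threshold below 28 separates them. In the visualization the same judgment is made by eye when marking the clusters.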

 

 

Compare two patterns

To compare similar access patterns visually, we created access logs for two additional clients that have the same access patterns as the original two clients (download the dataset). The initial visualization is shown in Figure 15.

Figure 15.

 

1)     Reduce the alpha of "ErrorCode" and change the alpha of "ClientIP". We observe 4 groups along the "ClientIP" axis. The detail information shows that they belong to 4 different clients. (Figure 16)

Figure 16.

2)     Switch the operation to "select subset". Select two similar groups for further observation. (Figure 17)

Figure 17a

Figure 17b.

3)     As before, set the alpha of "ErrorCode" to 0 and move the "seq" axis to the original position of the "ErrorCode" axis. (Figure 18)

Figure 18.

4)     Maximize the alphas of "seq" and "command". Zoom and adjust the focus to get a detailed visualization. Adjust the alpha of "ClientIP" so that the two patterns are placed side by side for comparison. (Figure 19)

Figure 19.
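The visual comparison can also be approximated numerically. One simple idea is to shift each client's seq numbers so that they start at 0 and then compare the resulting (offset, command) lists. This is only a sketch: it assumes that two clients with "the same access pattern" differ only by a constant seq offset, which this demo does not state. The function names and the sample records are hypothetical.

```python
def relative_pattern(records):
    """Shift a client's (seq, command) records so the first seq becomes 0."""
    if not records:
        return []
    ordered = sorted(records)
    start = ordered[0][0]
    return [(seq - start, command) for seq, command in ordered]


def same_pattern(records_a, records_b):
    """True if both clients issue the same commands at the same offsets."""
    return relative_pattern(records_a) == relative_pattern(records_b)


# Hypothetical records for two clients; not taken from the dataset.
client_a = [(492, "cmdA"), (493, "cmdB"), (495, "cmdA")]
client_b = [(1492, "cmdA"), (1493, "cmdB"), (1495, "cmdA")]
client_c = [(10, "cmdA"), (11, "cmdA"), (13, "cmdB")]

print(same_pattern(client_a, client_b))  # True
print(same_pattern(client_a, client_c))  # False
```

An exact comparison like this is stricter than the visual one. The visualization also reveals patterns that are merely similar, which is the point of placing the two groups next to each other.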