Quick Summaries
Graphical Summaries: Past number of days, Which factor to graph
Extended Summaries: Past number of days, Primary sorting field, Length of output

Explanation

Note: Several options were removed on May 5, 1997 due to space constraints.
The Quick Summaries provide a view of only HTML files, sorted by all requests. The external and internal request tabulations are included to provide deeper insight into who's viewing what. These summaries span a predefined set of ranges, and are updated daily.

Next, the Graphical Summaries area lets you choose among a few time periods and graphing criteria for all accesses to our Website. The criteria are "Requests", which plots the total number of requests; "% Change", which graphs the daily change in use; and "Weighted", which graphs the daily weighted change, computed as (frequency+1) * z-normal(daily change). All graphs are split by internal and external accesses and are computed on a daily basis.
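As a rough illustration of the three graphing criteria, the sketch below computes them from a list of daily request totals. It assumes that "% Change" is the percentage change from the previous day, that "z-normal" is the standard score (z-score) of the daily change over the plotted period, and that "frequency" is the day's request count. None of these readings is confirmed by the text above, and the function names are invented for the example.

```python
from statistics import mean, pstdev

def daily_change(requests):
    """Percentage change from each day to the next (assumed meaning of "% Change")."""
    changes = []
    for prev, cur in zip(requests, requests[1:]):
        # Guard against division by zero on days with no requests.
        changes.append(0.0 if prev == 0 else 100.0 * (cur - prev) / prev)
    return changes

def z_normal(values):
    """Standard scores of the values (assumed meaning of "z-normal")."""
    if not values:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # No variation at all: every value sits exactly at the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def weighted_change(requests):
    """(frequency + 1) * z-normal(daily change), one value per day after the first."""
    z = z_normal(daily_change(requests))
    # requests[1:] lines up each day's count with its change from the previous day.
    return [(freq + 1) * score for freq, score in zip(requests[1:], z)]

if __name__ == "__main__":
    requests = [100, 110, 99, 120]  # made-up daily totals
    print("Requests:", requests)
    print("% Change:", daily_change(requests))
    print("Weighted:", weighted_change(requests))
```

Under this reading, the (frequency+1) factor scales an unusual daily change by how heavily the file was requested that day, so a large swing on a busy file stands out more than the same swing on a rarely requested one.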

Finally, the Extended Summaries table provides the means to look at various ranges of time and have the output sorted by various criteria. Since all requests correlate closely with external requests, sorting by all requests is not offered. Additional control is provided by the choice of displaying only the "Top 1000" files or all requested files for the query. These files are recomputed daily. Since there are multiple fields, it may be best to reduce the size of your fixed font (check your 'Options' menu) to 8 point or so.

Details

On a nightly basis, the original HTTPd access logfiles are summarized into the number of external and internal accesses for successfully returned requests, i.e., status < 400. Daily changes and weighted changes are then computed, where weighted change is of the form: (frequency+1) * z-normal(daily change).

The filenames have been canonicalized as per our current HTTPd server conventions, so *directory/ is expanded to *directory/index.html and any symbolic link is followed before the access is attributed to a particular file. Thus, a request for http://www.cc.gatech.edu is counted as CoC.html. The filesystem is used to follow both symbolic links and paths. If this fails, an in-memory path resolution method is used. Since canonicalization is an expensive process (daily checking of 30,000+ requested URLs), a cache is kept that maps requested files to their canonical counterparts. This cache is removed every month and rebuilt, which only causes problems for symbolic links that are redirected between rebuilds. If this really upsets you, let me know and I'll fix it; otherwise, feel free to view this as noise.

The graphs are created with gnuplot and converted to GIF with pbmplus.
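The nightly pass described above might look roughly like the following sketch. It is not the code behind this service: it assumes the logs are in Common Log Format, that an "internal" host is one whose name ends in .gatech.edu, and that the cache is a plain dictionary. The names LOG_LINE, canonicalize, summarize, and internal_suffix are invented for the example.

```python
import os
import posixpath
import re
from collections import Counter

# Assumed Common Log Format: host ident user [time] "METHOD path ..." status bytes
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:\S+) (\S+)[^"]*" (\d{3}) \S+')

def canonicalize(path, doc_root, cache):
    """Map a requested path to its canonical file, consulting the cache first."""
    if path in cache:
        return cache[path]
    # A directory request is expanded to that directory's index.html.
    expanded = path + "index.html" if path.endswith("/") else path
    full = os.path.join(doc_root, expanded.lstrip("/"))
    if os.path.exists(full):
        # Let the filesystem follow symbolic links and resolve the path.
        root = os.path.realpath(doc_root)
        canonical = "/" + os.path.relpath(os.path.realpath(full), root)
    else:
        # Fallback: resolve "." and ".." purely in memory.
        canonical = posixpath.normpath(expanded)
    cache[path] = canonical
    return canonical

def summarize(lines, doc_root, cache, internal_suffix=".gatech.edu"):
    """Count internal and external accesses per canonical file for status < 400."""
    internal, external = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not parse
        host, path, status = match.group(1), match.group(2), int(match.group(3))
        if status >= 400:
            continue  # only successfully returned requests are counted
        name = canonicalize(path, doc_root, cache)
        if host.endswith(internal_suffix):
            internal[name] += 1
        else:
            external[name] += 1
    return internal, external

if __name__ == "__main__":
    sample = [
        'lab1.cc.gatech.edu - - [05/May/1997:10:00:00 -0400] "GET /a/ HTTP/1.0" 200 1234',
        'host.example.com - - [05/May/1997:10:00:01 -0400] "GET /a/index.html HTTP/1.0" 200 1234',
        'host.example.com - - [05/May/1997:10:00:02 -0400] "GET /missing.html HTTP/1.0" 404 0',
    ]
    cache = {}  # a monthly rebuild would amount to cache.clear()
    print(summarize(sample, "/nonexistent-doc-root", cache))
```

Because the cache is keyed on the requested path, a symbolic link that is repointed after its entry is cached keeps its old canonical name until the cache is next cleared, which is the source of the noise mentioned above.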

You will no doubt notice a hole in the dataset occurring from November 18, 1995 through December 5, 1995. This resulted from a glitch in the routine rotation of the access logs. The data is gone, so you might as well get over it.

Customization

This service still may not answer your needs regarding use of Web resources. Below are a few possible solutions to your problem, though information about which machines made which requests, and summary statistics of that sort, are not provided.

Closing

Thanks to CNS and especially Bryan Rank for providing the resources for this service. If you have some comments about this service, are interested in the code, or would like to extend it, please feel free to contact me at pitkow@cc.gatech.edu.