Explanation
Note: Several options were removed on May 5, 1997 due to space constraints.
The Quick Summaries provide a view of only HTML files, sorted by all
requests. The external and internal request tabulations are included
to provide deeper insight into who's viewing what. These summaries
span a predefined set of ranges, and are updated daily.
Next, the Graphical Summaries area enables one to select a few
time periods and graphing criteria for all accesses to our Website.
The different criteria consist of "Requests", which plots the total
number of requests, "% Change", which graphs the daily change in use, and
"Weighted", which graphs the daily weighted change as (frequency+1) *
z-normal(daily change). All graphs are split by internal and external
accesses and are computed on a daily basis.
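
As a rough illustration of the "% Change" and "Weighted" criteria, here is a minimal Python sketch. The page does not define "frequency" or "z-normal" precisely, so the sketch makes two assumptions: "frequency" is taken to be the day's request count, and "z-normal" is taken to be the standard score (z-score) of a daily change relative to the other daily changes in the plotted period. The function names are made up for the example and are not part of the service.

```python
from statistics import mean, pstdev


def daily_changes(requests):
    """Percent change from each day to the next (assumed meaning of "% Change")."""
    changes = []
    for prev, cur in zip(requests, requests[1:]):
        # Assumption: with no requests the day before, report no change.
        changes.append(0.0 if prev == 0 else 100.0 * (cur - prev) / prev)
    return changes


def weighted_changes(requests):
    """(frequency + 1) * z-normal(daily change) for each day after the first.

    Assumptions: frequency is that day's request count, and z-normal is the
    z-score of the day's change among all changes in the period.
    """
    changes = daily_changes(requests)
    if not changes:
        return []
    mu = mean(changes)
    sigma = pstdev(changes)
    weighted = []
    for freq, change in zip(requests[1:], changes):
        z = 0.0 if sigma == 0 else (change - mu) / sigma
        weighted.append((freq + 1) * z)
    return weighted
```

Under this reading, the (frequency+1) factor makes a given relative swing count for more on a busy day than on a quiet one, and the +1 keeps days with zero requests from dropping out entirely.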
Finally, the Extended Summaries table provides the means for one to
look at various ranges of time and have the output sorted by various
criteria. Since all requests closely correlate with external requests,
this sorting criterion is not offered. Additional control is provided by
allowing for only the "Top 1000" files to be displayed as well as all
requested files for the query. These files are recomputed daily.
Since there are multiple fields, it may be best to reduce the size of
your fixed font (check your 'Options' menu) to 8 point or so.
Details
On a nightly basis, the original HTTPd access logfiles are summarized into
the number of external and internal accesses for successfully returned requests,
i.e., status < 400. Daily changes and weighted changes are then computed,
where weighted change is of the form: (frequency+1) * z-normal(daily change).
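
The nightly summarization step might look roughly like the following Python sketch. It assumes the access logs are in Common Log Format and that "internal" means the requesting host's name ends in a local domain suffix; the `INTERNAL_SUFFIX` value and the function name are assumptions for the example, not the service's actual code.

```python
import re
from collections import defaultdict

# Common Log Format: host ident user [date] "METHOD path PROTOCOL" status bytes
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]*\] "\S+ ([^\s"]+)[^"]*" (\d{3}) \S+'
)

# Assumption: hosts under this suffix count as internal.
INTERNAL_SUFFIX = ".gatech.edu"


def summarize(lines):
    """Count successful requests per file, split into internal and external."""
    counts = defaultdict(lambda: {"internal": 0, "external": 0})
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        host, path, status = match.group(1), match.group(2), int(match.group(3))
        if status >= 400:
            continue  # only successfully returned requests (status < 400)
        is_internal = host.lower().endswith(INTERNAL_SUFFIX)
        counts[path]["internal" if is_internal else "external"] += 1
    return dict(counts)
```

Note that redirects and "not modified" responses (3xx) pass the status < 400 test and are therefore counted in this sketch.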
The filenames have been canonicalized as per our current HTTPd server
conventions, so *directory/ is expanded to *directory/index.html
and any symbolic link is followed before the access is attributed to
a particular file. Thus, a request for http://www.cc.gatech.edu is counted
as CoC.html. The filesystem is used to follow both symbolic links and paths.
If this fails, an in-memory path resolution method
is used. Since canonicalization is an expensive process (daily checking of 30,000+
requested URLs), a cache is kept that maps requested files to their canonical
counterparts. This cache is removed every month and rebuilt, which only causes
problems for symbolic links that are redirected during this time frame. If this
really upsets you, let me know and I'll fix it; otherwise, feel free to
view this as noise. The graphs are created using gnuplot and pbmplus for
conversion into GIF.
You will no doubt notice a hole in the dataset occurring from November 18, 1995
through December 5, 1995. This resulted from a glitch in the routine rotation
of the access logs. The data is gone, so you might as well get over it.
Customization
This service still may not answer your needs regarding use of Web resources.
Below are a few possible solutions to your problem, though information about which
machines made which requests, and summary statistics of that sort, are not provided.
- If you'd like customized graphs of the accesses to either your set of
pages or a specific page, please go to /net/www/db1/usage_stats via our local filesystem
and read the documentation therein. What you will find is that there is a program called
'graph.pl' which generates graphs for any regular expression over a specified
time range.
- Access to the daily summary statistics is also provided from
the same directory suite. Thus, those wishing to develop specialized analyses
or other interesting derivations can access the summary datafiles and go to town.
Let us know if you come up with anything useful.
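
For those who want to go to town on the summary data, here is a minimal Python sketch of the general idea of selecting files by regular expression over a time range. It is not graph.pl and does not reflect its interface or the format of the summary datafiles, neither of which is described here; the record layout (date, filename, internal count, external count) and the function name are purely hypothetical.

```python
import re
from datetime import date


def select_counts(records, pattern, start, end):
    """Total internal and external accesses per day for matching files.

    records: iterable of (day, filename, internal, external) tuples,
             a hypothetical layout chosen for this example.
    pattern: regular expression matched against each filename.
    start, end: inclusive date range.
    """
    regex = re.compile(pattern)
    totals = {}
    for day, filename, internal, external in records:
        if start <= day <= end and regex.search(filename):
            prev_internal, prev_external = totals.get(day, (0, 0))
            totals[day] = (prev_internal + internal, prev_external + external)
    return sorted(totals.items())
```

The sorted list of (day, (internal, external)) pairs could then be written out as columns for a plotting program such as gnuplot.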
Closing
Thanks to CNS and especially Bryan Rank for providing the resources for this service.
If you have some comments about this service, are interested in the code, or would like
to extend it, please feel free to contact me at
pitkow@cc.gatech.edu.