Caches help with WWW requests just as they do with memory: whenever a WWW browser requests a web page, it can look first in the cache before retrieving the page from the Internet at large. Some caches are better than others; a poor cache will rarely have the requested page in it, while a good cache will frequently have the requested page.
In this project, you will implement different replacement policies in the Squid caching HTTP proxy, and then you will measure how well the different policies work on your own web browsing load.
The goals of this project are:
You may work in groups of two.
Submit the following items:
squid -z.
NOTE: There will be about 5 points allocated for extra credit. Don't be sad if you get a 95, but if you want a full 100, you might do things like:
Download Squid 2.5.STABLE5 from http://www.squid-cache.org. Extract it somewhere, and then make a separate copy of the directory for you to work in. (You will need the original version later, in order to produce diffs when you submit your work.)
Compile and install Squid with the following sequence of commands:
./configure --prefix ~/squid
make
make install
To later get rid of Squid, either do make uninstall, or
simply rm -rf ~/squid. Feel free to specify some other
directory than ~/squid; the rest of this page will
be written as if you have installed Squid into ~/squid,
but you can simply adjust the instructions.
You will need to configure Squid a little.
Edit ~/squid/etc/squid.conf and make the following changes:
cache_dir ufs /home/lex/squid/var/cache 10 16 256
Replace /home/lex/squid with the directory where you
installed Squid. Note that a 10MB cache is specified here. This is
smaller than the default, both because you are the only user of this
cache, and because you want to see some significant cache contention
during this project.
Add the line "http_access allow localhost" (without
the quotes) somewhere.
http_access allow our_networks. If this still does not
work then please contact the TA and we will sort out what is
necessary.
A cache of any kind must have a replacement strategy. This is the strategy that is used whenever a new item is added to the cache, to decide which old item to remove. That is, the new item replaces some other object in the cache. Some common strategies are:
For this project, you are to implement three policies:
You do not need to be mathematically exact in the last model; so long as some preference is given to less-recently-used items, that is fine.
Do not panic over the number of strategies! Once you get one of them to work, the others should be variations.
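Before touching the Squid sources, it can help to see how little separates one policy from another. The following is a minimal Python sketch, not Squid code: a toy cache whose eviction decision is a single pluggable function. The names SimCache, lru, uniform_random, biased_random and simulate are made up for illustration, and the linear weighting in biased_random is only one assumed way to give "some preference" to less-recently-used items.

```python
import random
from collections import OrderedDict


class SimCache:
    """Toy fixed-capacity cache; the eviction choice is a pluggable function."""

    def __init__(self, capacity, choose_victim):
        self.capacity = capacity
        self.choose_victim = choose_victim
        self.items = OrderedDict()  # keys ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def request(self, key):
        """Look up key, inserting it on a miss. Returns True on a hit."""
        if key in self.items:
            self.hits += 1
            self.items.move_to_end(key)  # now the most recently used
            return True
        self.misses += 1
        if len(self.items) >= self.capacity:
            victim = self.choose_victim(list(self.items))
            del self.items[victim]
        self.items[key] = True
        return False


def lru(keys):
    """Evict the least recently used key (front of the recency order)."""
    return keys[0]


def uniform_random(keys, rng=random):
    """Evict any key with equal probability."""
    return rng.choice(keys)


def biased_random(keys, rng=random):
    """Random eviction weighted toward older keys (assumed linear weights)."""
    n = len(keys)
    weights = [n - i for i in range(n)]  # oldest gets weight n, newest gets 1
    return rng.choices(keys, weights=weights, k=1)[0]


def simulate(trace, capacity, policy):
    """Run a request trace through a cache and return (hits, misses)."""
    cache = SimCache(capacity, policy)
    for key in trace:
        cache.request(key)
    return cache.hits, cache.misses
```

For example, simulate(["a", "b", "c", "a", "d", "a", "b"], 3, lru) returns (2, 5). Swapping lru for one of the other functions changes the policy without changing anything else, which is the same kind of variation you should expect between your Squid policies.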
To gain an overall understanding of the Squid code structure, first read the Squid Programmers Guide. Squid uses its own, customized file systems for performance purposes. The source code for each type of storage is in "src/fs/". You need to understand the following data structures and functions to begin work.
Feel free to implement your alternative replacement policies by modifying the code in src/repl/lru. Be sure, however, that you have enough #ifdef guards that Squid can be compiled with each of the three replacement strategies that you need to support. For extra credit, implement your replacement strategies so that they may be selected in squid.conf without needing a recompile.
To record a web load, you should start up Squid and then configure
your browser to use it. To do this, go to the "proxy" settings of
your web browser and, for HTTP, tell it to use host "localhost" and
port "3128". If your tool wants a URL for the proxy, then specify
http://localhost:3128.
Additionally, you should look through your web browser's settings, disable its own disk cache, and give it a small memory cache. That way, the browser will hit your Squid server hard.
Once you are set up like this, all of your web browsing should go through Squid. If you run "du -s ~/squid/var/cache" then you will see the cache growing larger over time as you request more web pages. If you run "ls -l ~/squid/var/logs" then you will see the log files growing.
After a day or so, you should have accumulated a thousand or more hits. (If not, then do some random web browsing until you do reach at least a few thousand hits!) Look through ~/squid/var/logs/access.log to see the URLs you have requested.
Part of this project is to convert the list of requests in access.log into a script that will repeat that series of requests. You may limit yourself to GET requests if you wish.
The details of your script are up to you. However, do look into
the wget utility, which should be useful. Also, if you
use wget, be sure to set the http_proxy
environment variable to point to your Squid proxy.
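If you would rather not shell out to wget, the same replay can be sketched in Python with only the standard library. This is one possible approach under stated assumptions, not a required design: the function names get_urls, replay and main are made up for illustration, the proxy address is the localhost:3128 setting used above, and the log path assumes Squid was installed into ~/squid.

```python
import http.client
import os
import urllib.request


def get_urls(log_lines):
    """Yield the URL of every GET request found in access.log lines."""
    for line in log_lines:
        fields = line.split()
        # Field 6 is the request method and field 7 is the URL.
        if len(fields) >= 7 and fields[5] == "GET":
            yield fields[6]


def replay(urls, proxy="http://localhost:3128", timeout=30):
    """Fetch each URL through the proxy and discard the response bodies."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy})
    )
    for url in urls:
        try:
            with opener.open(url, timeout=timeout) as response:
                response.read()
        except (OSError, http.client.HTTPException) as err:
            # HTTP errors and timeouts are reported, and the replay continues.
            print(f"failed: {url}: {err}")


def main():
    """Replay a recorded log; call this by hand once Squid is running."""
    log_path = os.path.expanduser("~/squid/var/logs/access.log")
    with open(log_path) as log:
        urls = list(get_urls(log))  # read everything first; Squid keeps appending
    replay(urls)
```

Reading the whole log into a list before replaying matters, because the replayed requests are themselves appended to access.log. Replaying from a saved copy of the log is safer still.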
The access.log file includes both what requests have
been made to Squid and how those requests were handled. Each line
includes one request. A typical line looks like this:
1079897254.896 569 127.0.0.1 TCP_MISS/200 33254 GET http://www.gatech.edu/ - DIRECT/130.207.165.120 text/html
This is a GET-style request for the page
http://www.gatech.edu/. It was handled as a TCP_MISS,
which means that the file was not in the cache and had to be downloaded.
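To compare policies you will want a hit rate, which means pulling the result code out of each line. Here is a minimal Python sketch that splits a line like the one above into named fields and computes the fraction of requests that were hits. The names LogEntry, parse_line and hit_ratio are made up for illustration, and treating any result code containing "HIT" as a hit is a simplifying assumption that you may want to refine.

```python
from collections import namedtuple

LogEntry = namedtuple(
    "LogEntry", "time elapsed_ms client result status size method url"
)


def parse_line(line):
    """Split one access.log line into a LogEntry, or return None if malformed."""
    fields = line.split()
    if len(fields) < 7:
        return None
    # The fourth field looks like "TCP_MISS/200": result code, slash, HTTP status.
    result, _, status = fields[3].partition("/")
    try:
        return LogEntry(
            time=float(fields[0]),
            elapsed_ms=int(fields[1]),
            client=fields[2],
            result=result,
            status=status,  # kept as a string
            size=int(fields[4]),
            method=fields[5],
            url=fields[6],
        )
    except ValueError:
        return None


def hit_ratio(lines):
    """Fraction of parsed requests whose result code contains "HIT"."""
    total = 0
    hits = 0
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        total += 1
        if "HIT" in entry.result:  # assumption: TCP_HIT, TCP_MEM_HIT, etc.
            hits += 1
    return hits / total if total else 0.0
```

Calling hit_ratio(open(path)) on a copy of access.log gives a single number per replacement policy. The size field could be summed in the same loop if you also want a byte hit rate.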
There is software around to interpret Squid log files. Feel free to use it.
Please do not submit an entire tarball of your modified Squid. Instead, keep a copy of the original Squid archive separate from your hacked version, and generate a diff file with a command like this:
diff -ru squid-2.5.STABLE5 squid-hacked
Browse through your diff file before turning it in, to make sure
that it includes all of the changes you have made.