CS 6210 Advanced Operating Systems
Spring 2003
Project III: Web Proxy Caches

Due: 11:59 p.m., March 26, 2003
(One minute prior to midnight on Wednesday, March 26th.)
This project is to be completed in groups of 3-4.


Goal
Design and develop a web proxy server cache using the server and client components already developed in Project 1. You will build the functionality in this project, then develop experiments and evaluate your system in subsequent projects. The client is a web browser or another custom program that allows testing of your system. You will document the individual contributions of each team member in the project directory. Grading will be done on a team basis. One-person teams are not encouraged. Choose an appropriate web server concurrency model for your evaluation.


Details
 

A web server proxy is used by networks at the Internet's edge for a multitude of reasons, including security and performance. In this project, you will concern yourself with performance, using web proxy server page caches.

Consider a network architecture with one parent PRIMARY web server cache and two child SECONDARY web server caches. Each web server cache is maintained on a separate machine. The system with the PRIMARY web server cache has access to the Internet. The SECONDARY web server caches simply use the PRIMARY web server cache. Clients cannot connect to the PRIMARY web server cache; they connect only to a SECONDARY web server cache. Each SECONDARY web server cache is half the size of the PRIMARY web server cache. Your system should be able to size the caches dynamically at system bootup.

A client request for a web page that is in neither the SECONDARY nor the PRIMARY web server cache brings the page from the Internet into the PRIMARY cache and into the one SECONDARY cache the client is connected to (its parent SECONDARY cache). If the source of a cached web page on the Internet is updated, the entry is invalidated in the PRIMARY cache and also in the child SECONDARY caches, so that the PRIMARY cache can re-load and refresh the page using INVALIDATE or UPDATE strategies.

The PRIMARY cache must always include all read/write pages held in the SECONDARY caches, so that an invalidate request from an Internet source invalidates both the PRIMARY cache entry and the SECONDARY cache entries. Read-only, non-updatable pages held in the SECONDARY caches may be replaced in the PRIMARY cache.
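
As a rough illustration of the request path described above, the following Python sketch models the hierarchy with in-memory dictionaries. The names Cache, make_hierarchy, fetch and origin_fetch are hypothetical and not part of the project's code; replacement, invalidation and networking are deliberately left out.

```python
class Cache:
    """Hypothetical page store; eviction is not modelled in this sketch."""

    def __init__(self, capacity):
        self.capacity = capacity   # sized once, at "bootup"
        self.pages = {}            # url -> page body


def make_hierarchy(primary_capacity, n_secondaries=2):
    """Build one PRIMARY cache and SECONDARY caches of half its size."""
    primary = Cache(primary_capacity)
    secondaries = [Cache(primary_capacity // 2) for _ in range(n_secondaries)]
    return primary, secondaries


def fetch(url, secondary, primary, origin_fetch):
    """Serve url through the client's parent SECONDARY cache.

    origin_fetch is an assumed callable standing in for an Internet fetch.
    Returns (body, source) so that a trace can record where the page came from.
    """
    if url in secondary.pages:
        return secondary.pages[url], "secondary-hit"
    if url in primary.pages:
        body, source = primary.pages[url], "primary-hit"
    else:
        body, source = origin_fetch(url), "origin"
        primary.pages[url] = body          # PRIMARY keeps a copy
    secondary.pages[url] = body            # only the requesting SECONDARY is filled
    return body, source
```

A request that misses everywhere fills the PRIMARY cache and the requesting SECONDARY cache only; the other SECONDARY cache later sees a PRIMARY hit for the same page.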

Build -
 
CACHE MANAGEMENT
a) An efficient web server page cache - primary and secondary. Allow FIFO, random and LRU page replacement policies. You will evaluate these policies in the next project. Generate a trace to convince us that this works. You will maintain bits for pages that are read-only or read/write.
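
One possible shape for such a cache, given only as a sketch: the PageCache class below is hypothetical, keeps a per-page writable bit, supports the three replacement policies, and appends HIT, MISS and EVICT events to a trace list. It does not enforce the rule that the PRIMARY cache must retain read/write pages held by the SECONDARY caches; that check would have to be added to the eviction step.

```python
import random
from collections import OrderedDict


class PageCache:
    """Hypothetical page cache with FIFO, LRU or random replacement."""

    def __init__(self, capacity, policy="lru", rng=None):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        if policy not in ("fifo", "lru", "random"):
            raise ValueError("unknown policy: " + policy)
        self.capacity = capacity
        self.policy = policy
        self.rng = rng or random.Random()
        self.entries = OrderedDict()   # url -> (body, writable bit)
        self.trace = []                # (event, url) tuples

    def get(self, url):
        if url not in self.entries:
            self.trace.append(("MISS", url))
            return None
        if self.policy == "lru":
            self.entries.move_to_end(url)   # mark as most recently used
        self.trace.append(("HIT", url))
        return self.entries[url][0]

    def put(self, url, body, writable=False):
        if url in self.entries:
            # Refresh in place; FIFO order is unchanged by an overwrite.
            self.entries[url] = (body, writable)
            if self.policy == "lru":
                self.entries.move_to_end(url)
            return
        if len(self.entries) >= self.capacity:
            self._evict()
        self.entries[url] = (body, writable)

    def _evict(self):
        if self.policy == "random":
            victim = self.rng.choice(list(self.entries))
        else:
            # Oldest insertion (FIFO) or least recently used (LRU).
            victim = next(iter(self.entries))
        del self.entries[victim]
        self.trace.append(("EVICT", victim))
```

With a capacity of two, inserting /a and /b, reading /a, and then inserting /c evicts /b under LRU but /a under FIFO, which is the kind of difference a reference trace can make visible.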
 

CACHE CONSISTENCY
b) In the event that a page in the PRIMARY web server cache changes (this could be a sports scores page, a CNN news page, or any other page that changes), the PRIMARY web server cache may INVALIDATE the page in the SECONDARY web server cache or UPDATE (i.e., provide a copy and replace) the page in the SECONDARY web server cache. Simulate a page change in the PRIMARY web server cache. You do not have to program your system to respond to external Internet-based web server page changes. If you can do this, though, please state so in your report for bonus points. You will build a lightweight protocol to allow this. This protocol can be modeled after ICP (Internet Cache Protocol; check the relevant RFC) or can be your own custom design. You may piggyback this protocol on HTTP, or it could be carried completely in the sideband. Generate a trace to convince us that this works. Make sure that you allow the clients to be connected in different regular topologies. In subsequent projects, you will connect the secondary caches in a mesh and evaluate the effects of cooperative caching.
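
To make the two strategies concrete, here is a minimal sketch of a custom sideband message format in Python. It is not ICP and not a required design: the newline-terminated JSON encoding, the field names op, url and body, and the functions encode and apply_message are all assumptions made for illustration. The sketch also assumes that UPDATE only refreshes a page the SECONDARY cache already holds.

```python
import json


def encode(op, url, body=None):
    """Build one consistency message as a newline-terminated JSON record."""
    if op not in ("INVALIDATE", "UPDATE"):
        raise ValueError("unknown op: " + op)
    msg = {"op": op, "url": url}
    if op == "UPDATE":
        msg["body"] = body     # UPDATE carries the fresh copy of the page
    return (json.dumps(msg) + "\n").encode("utf-8")


def apply_message(raw, pages):
    """Apply a message from the PRIMARY to a SECONDARY's page table (a dict)."""
    msg = json.loads(raw.decode("utf-8"))
    url = msg["url"]
    if msg["op"] == "INVALIDATE":
        pages.pop(url, None)           # drop the stale copy, if any
    elif msg["op"] == "UPDATE":
        if url in pages:               # assumption: refresh only held pages
            pages[url] = msg["body"]
    return msg["op"], url
```

The returned (op, url) pair can be written to a log on the SECONDARY side, which gives one line of trace per consistency action.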

COOPERATIVE CACHING
c) Imagine a situation where a client requests a page that is not in its parent SECONDARY web server page cache (the one it is connected to). The SECONDARY web server cache can then request the page from the PRIMARY web server cache or from a peer SECONDARY web server page cache. Generate a trace for a situation where such a request is fulfilled by a peer cache (the PRIMARY web server cache no longer has the page, probably because it was replaced). Show by measurement whether this is more efficient than requesting the page from the Internet via the PRIMARY web server page cache. If it is not, explain why not.
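
The lookup described in c) could be sketched as follows. The function cooperative_fetch, the dictionary-based caches and the lookup order (local, then PRIMARY, then peers, then the Internet) are assumptions for illustration; a real system would query the peers over the network and might choose a different order.

```python
def cooperative_fetch(url, local, primary, peers, origin_fetch, trace):
    """Resolve url for a client of the SECONDARY cache whose pages are in local.

    local and primary map url -> body; peers maps a peer name to that peer's
    page table; origin_fetch is an assumed callable for the Internet fetch.
    One event is appended to trace for every request.
    """
    if url in local:
        trace.append(("LOCAL_HIT", url))
        return local[url]
    if url in primary:
        trace.append(("PRIMARY_HIT", url))
        body = primary[url]
    else:
        for name, peer_pages in peers.items():
            if url in peer_pages:
                trace.append(("PEER_HIT", url, name))
                body = peer_pages[url]
                break
        else:
            # No cache holds the page: go to the Internet via the PRIMARY.
            trace.append(("ORIGIN_FETCH", url))
            body = origin_fetch(url)
            primary[url] = body
    local[url] = body
    return body
```

In the peer-hit case the PRIMARY cache is not refilled in this sketch, which matches the scenario in c) where the page has already been replaced there.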
 

This project will be graded on the following basis -

a) Efficient data structures for PRIMARY and SECONDARY caches. Are they the same? Can they be different? Give reasons.

b) Web page reference traces that convince us that the system actually works.

c) Basic measurements that convince us that the caching actually works. Measure a cold miss with the caching system and without the caching system. What is the overhead of hierarchical caching? Include other measurements of system primitives.

d) Schedule a DEMO. Details forthcoming.
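
For the cold-miss measurement in item c), one way to estimate the overhead of hierarchical caching is to time the same set of URLs fetched directly and through the cache hierarchy, then compare the means. The sketch below is only an outline: timed and hierarchy_overhead are hypothetical helpers, direct_fetch and cached_fetch are assumed callables you would supply, and each URL must be absent from every cache beforehand for the second timing to be a true cold miss.

```python
import time


def timed(fetch, url, clock=time.perf_counter):
    """Return the elapsed time of a single fetch(url) call, in clock units."""
    start = clock()
    fetch(url)
    return clock() - start


def hierarchy_overhead(direct_fetch, cached_fetch, urls, clock=time.perf_counter):
    """Mean cold-miss latency through the caches minus mean direct latency.

    urls must be non-empty, and no url may already be cached when
    cached_fetch is called, otherwise the result is not a cold-miss figure.
    """
    direct = [timed(direct_fetch, u, clock) for u in urls]
    cold = [timed(cached_fetch, u, clock) for u in urls]
    return sum(cold) / len(cold) - sum(direct) / len(direct)
```

The clock parameter exists so that the helper can be checked with a fake clock; real runs would leave it at time.perf_counter and repeat the experiment several times to smooth out network variance.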
 

Demonstration platform

EDHPC machines 1 through 4. Use EDHPC1 as the PRIMARY web server cache, EDHPC2 and EDHPC3 as the SECONDARY web server caches, and threads on EDHPC4 as clients.
 
 


Due Date & Turn-In Process
When: March 26, before midnight. This is one minute prior to midnight on Wednesday, March 26th. No late assignments will be accepted unless prior arrangements have been made.

Where: /net/hc280/class/cs6210/groups/<group_name>. Please create a README file in each group directory with the names of the group members. Email help@cc if you cannot create your group directory.

What: