CS 6210 Advanced Operating Systems
Spring 2006

Project I: A Tiny Web Proxy Server

 

Due: 11:59 p.m., Jan. 31st, 2006

(One minute prior to midnight on Tuesday, Jan. 31st.)

This project is to be completed individually.

 

Goal

The goal of this assignment is to introduce you to multi-threaded programming and to let you explore the interactions that result. You will be required to implement a number of communicating processes on a Linux or Unix platform. You will design and implement a multi-threaded web proxy server, and you will set up experiments to demonstrate the correctness of your program.

General Information

The Project Requirement

The project consists of three steps.

  1. Implement a simple multi-threaded web server.
  2. Write a client program to demonstrate the functionality of your server.
  3. Extend your web server to act as a proxy cache.

Step One: Implement a multi-threaded web server.

You are only required to implement a minimal subset of the HTTP protocol. For simplicity, at this step your server need only process requests in the following format: GET Path\n\n
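A worker has to recognize when a complete request has arrived and then extract the path from it. The sketch below assumes the request has been read into a NUL-terminated buffer; the helper names parse_get_line and request_complete are hypothetical, and accepting "\r\n\r\n" in addition to "\n\n" is an extra allowance for real HTTP clients, not something the assignment requires.

```c
#include <string.h>

/* Hypothetical helper: copy the path out of a request of the form
 * "GET Path\n\n" into `path` (capacity `cap`).
 * Returns 0 on success, -1 if the request is malformed or too long. */
int parse_get_line(const char *req, char *path, size_t cap)
{
    if (strncmp(req, "GET ", 4) != 0)
        return -1;                          /* only GET is supported */
    const char *start = req + 4;
    while (*start == ' ')                   /* tolerate extra spaces */
        start++;
    size_t len = strcspn(start, " \r\n");   /* path ends at space or newline */
    if (len == 0 || len >= cap)
        return -1;                          /* empty or does not fit */
    memcpy(path, start, len);
    path[len] = '\0';
    return 0;
}

/* Returns 1 once `buf` (NUL-terminated) holds a complete request, i.e. it
 * contains the terminating blank line "\n\n" (or "\r\n\r\n"). */
int request_complete(const char *buf)
{
    return strstr(buf, "\n\n") != NULL || strstr(buf, "\r\n\r\n") != NULL;
}
```

A worker would keep appending received data to its buffer until request_complete returns 1, and only then call parse_get_line.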

 

Your server should consist of a master thread that accepts incoming requests and multiple worker threads that serve them. The port on which the master thread listens should be a run-time parameter. The master thread accepts incoming connections, creates a request record for each one, and puts the records into a list. The worker threads pick up requests from the list, read the request data from the incoming sockets, and serve the requests. For simplicity, the server need not map the Path to a real file and serve it from disk; it only needs to generate a document for the request. The generated document should include the following information: the ID of the worker thread that served the request, the time of the request, the incoming request line, and the requested path.
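The master/worker arrangement described above is a classic producer-consumer queue. A minimal sketch using a POSIX mutex and condition variable follows; the struct and function names are hypothetical, and it assumes a request record needs nothing more than the accepted socket descriptor.

```c
#include <pthread.h>
#include <stdlib.h>

/* Request record created by the master thread. Assumption: the socket
 * descriptor is the only state a worker needs. */
struct request {
    int client_fd;
    struct request *next;
};

/* FIFO list shared by the master (producer) and the workers (consumers). */
struct request_queue {
    struct request *head, *tail;
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
};

void rq_init(struct request_queue *q)
{
    q->head = q->tail = NULL;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
}

/* Called by the master thread after accept(). Returns -1 if out of memory. */
int rq_push(struct request_queue *q, int client_fd)
{
    struct request *r = malloc(sizeof *r);
    if (r == NULL)
        return -1;
    r->client_fd = client_fd;
    r->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail)
        q->tail->next = r;
    else
        q->head = r;
    q->tail = r;
    pthread_cond_signal(&q->nonempty);    /* wake one idle worker */
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Called by worker threads: block until a request is queued, then
 * remove it and return its socket descriptor. */
int rq_pop(struct request_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->head == NULL)               /* loop guards against spurious wakeups */
        pthread_cond_wait(&q->nonempty, &q->lock);
    struct request *r = q->head;
    q->head = r->next;
    if (q->head == NULL)
        q->tail = NULL;
    pthread_mutex_unlock(&q->lock);
    int fd = r->client_fd;
    free(r);
    return fd;
}
```

The master thread would call rq_push right after accept(), and each worker would loop on rq_pop. An unbounded linked list means the master never blocks; a bounded array with a second "not full" condition variable is an equally valid design.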

 

Your server need not handle any exceptions; a single HTTP 200 OK response is enough at this step. Here is an example of an incoming request and its response. Your response need not follow the exact format of the example, as long as it includes all of the information shown. You are also free to pad the response with additional content if you feel that a large file would help demonstrate the correctness of your program in your experiments:

 

Incoming:

    GET foo.html

Response:

    HTTP/1.0 200 OK
    Content-Length: The_actual_bytes_of_the_generated_content

    Worker Thread ID: 001
    Request Time: Mon, 16 Jan 2006 12:00:00 GMT
    Incoming Request: GET foo.html
    Request For: foo.html
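Because Content-Length must equal the exact byte count of the generated body, one straightforward approach is to render the body first and the header second. In this sketch build_response is a hypothetical helper; it assumes "\n" line endings (matching the assignment's request format) and a zero-padded three-digit thread ID as in the example.

```c
#define _POSIX_C_SOURCE 200809L   /* for gmtime_r */
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Fill `out` with a complete response for `path`. The body is generated
 * first so that Content-Length can be set to its exact size.
 * Returns the total response length, or -1 if a buffer is too small. */
int build_response(char *out, size_t cap, int worker_id,
                   time_t request_time, const char *request_line,
                   const char *path)
{
    char date[64];
    char body[1024];
    struct tm tm_utc;

    gmtime_r(&request_time, &tm_utc);    /* thread-safe variant of gmtime */
    strftime(date, sizeof date, "%a, %d %b %Y %H:%M:%S GMT", &tm_utc);

    int blen = snprintf(body, sizeof body,
                        "Worker Thread ID: %03d\n"
                        "Request Time: %s\n"
                        "Incoming Request: %s\n"
                        "Request For: %s\n",
                        worker_id, date, request_line, path);
    if (blen < 0 || (size_t)blen >= sizeof body)
        return -1;

    int n = snprintf(out, cap,
                     "HTTP/1.0 200 OK\n"
                     "Content-Length: %d\n"
                     "\n"                /* blank line separates header and body */
                     "%s",
                     blen, body);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```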

 

Each worker thread should generate an in-memory log entry for every request it serves. A single logging thread should periodically write the in-memory log entries to a log file.
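One way to keep workers from blocking on disk I/O is to have them append to a mutex-protected buffer and let the logging thread drain it on a timer. In this sketch the buffer capacity, entry length, flush interval, and all names are assumptions; entries that arrive while the buffer is full are counted and dropped rather than waited on.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define LOG_CAP  256   /* assumption: entries buffered between flushes */
#define LOG_LINE 128   /* assumption: maximum length of one entry */

struct mem_log {
    char entries[LOG_CAP][LOG_LINE];
    int count;
    int dropped;               /* entries discarded because the buffer was full */
    pthread_mutex_t lock;
};

void log_init(struct mem_log *l)
{
    l->count = 0;
    l->dropped = 0;
    pthread_mutex_init(&l->lock, NULL);
}

/* Called by worker threads; the entry is truncated to LOG_LINE - 1 chars. */
void log_append(struct mem_log *l, const char *msg)
{
    pthread_mutex_lock(&l->lock);
    if (l->count < LOG_CAP)
        snprintf(l->entries[l->count++], LOG_LINE, "%s", msg);
    else
        l->dropped++;
    pthread_mutex_unlock(&l->lock);
}

/* Called only by the logging thread: copy entries out under the lock,
 * then do the slow file I/O without holding it. */
int log_flush(struct mem_log *l, FILE *out)
{
    static char batch[LOG_CAP][LOG_LINE];
    pthread_mutex_lock(&l->lock);
    int n = l->count;
    memcpy(batch, l->entries, (size_t)n * LOG_LINE);
    l->count = 0;
    pthread_mutex_unlock(&l->lock);
    for (int i = 0; i < n; i++)
        fprintf(out, "%s\n", batch[i]);
    fflush(out);
    return n;
}

struct logger_args {
    struct mem_log *log;
    FILE *out;
    unsigned interval_sec;     /* flush period in seconds */
    atomic_int *stop;          /* set to 1 to shut the logger down */
};

void *logging_thread(void *arg)
{
    struct logger_args *a = arg;
    while (!atomic_load(a->stop)) {
        sleep(a->interval_sec);
        log_flush(a->log, a->out);
    }
    log_flush(a->log, a->out);   /* final flush so no entry is lost */
    return NULL;
}
```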

 

You have full freedom in designing the internal data structures of your server. In summary, however, your server must fulfill the following requirements:

  1. A master thread listens on a specified port.
  2. A fixed number of worker threads is created at startup.
  3. A logging thread writes the in-memory logs to a log file.
  4. Every incoming request is handled and a response is generated.
  5. The port number and the number of worker threads are runtime parameters.
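Making the port and thread count runtime parameters means the server should validate its command-line arguments before creating any threads. This sketch assumes an invocation of the form ./server <port> <num_workers>; the helper names and the upper bound of 1024 worker threads are hypothetical choices.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a decimal integer in [lo, hi]. Returns 0 and stores the value in
 * *out on success; returns -1 on trailing junk, overflow, or range errors. */
int parse_int_arg(const char *s, long lo, long hi, long *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v < lo || v > hi)
        return -1;
    *out = v;
    return 0;
}

/* Assumed usage: ./server <port> <num_workers> */
int parse_server_args(int argc, char **argv, int *port, int *nworkers)
{
    long p, n;
    if (argc != 3)
        return -1;
    if (parse_int_arg(argv[1], 1, 65535, &p) != 0)   /* valid TCP port range */
        return -1;
    if (parse_int_arg(argv[2], 1, 1024, &n) != 0)    /* arbitrary upper bound */
        return -1;
    *port = (int)p;
    *nworkers = (int)n;
    return 0;
}
```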

 

Step Two: Implement a multi-threaded client.

You will define a workload file and implement a multi-threaded client program that reads it. Each worker thread of the client program sends an HTTP GET request to your server, reads the response, and writes the response to a file. The client must also create a log file recording its activities; most importantly, the log should record which thread processed which request from the workload file.
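To make the client log show which thread processed which request, each thread can claim the next workload entry under a lock and log the claim at the same moment. The sketch assumes a workload file with one request path per line, each shorter than 256 bytes; the file format, the limits, and the function names are all hypothetical.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define MAX_REQS 1024   /* assumption: maximum workload size */
#define MAX_PATH 256    /* assumption: maximum path length per line */

struct workload {
    char paths[MAX_REQS][MAX_PATH];
    int count;
    int next;                 /* index of the next unclaimed request */
    pthread_mutex_t lock;
};

/* Read up to MAX_REQS non-empty lines from `f`. Returns how many were read. */
int workload_load(struct workload *w, FILE *f)
{
    char line[MAX_PATH];
    w->count = 0;
    w->next = 0;
    pthread_mutex_init(&w->lock, NULL);
    while (w->count < MAX_REQS && fgets(line, sizeof line, f) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the newline */
        if (line[0] == '\0')
            continue;                          /* skip blank lines */
        strcpy(w->paths[w->count++], line);
    }
    return w->count;
}

/* Each client thread calls this in a loop. Returns the claimed index, or
 * -1 when the workload is exhausted. Logging inside the lock keeps the
 * log order identical to the claim order. */
int workload_claim(struct workload *w, int thread_id, FILE *client_log)
{
    pthread_mutex_lock(&w->lock);
    int i = (w->next < w->count) ? w->next++ : -1;
    if (i >= 0 && client_log != NULL)
        fprintf(client_log, "thread %d -> request %d: GET %s\n",
                thread_id, i, w->paths[i]);
    pthread_mutex_unlock(&w->lock);
    return i;
}
```

After claiming index i, a client thread would send "GET paths[i]\n\n", save the response to a per-request file, and loop until workload_claim returns -1.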

 

You then need to run your experiment and collect the log files from both the server and the client, together with the workload file and the files your client program saves. Write a short note that uses these files to show that your server and client programs work as expected. Note that the client and the server will run on different machines, so the server name, the server port, and the name of the workload file should all be runtime parameters.
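Since the server name and port are supplied at run time, the client should resolve them when it starts rather than hard-coding an address. A sketch using the standard POSIX getaddrinfo call follows; the helper name connect_to_server and the argument order shown in the comment are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a TCP connection to host:port, both taken from the command line
 * (e.g. ./client <server> <port> <workload>). Tries every address that
 * getaddrinfo returns. Returns a connected socket, or -1 on failure. */
int connect_to_server(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;       /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;   /* TCP */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                     /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```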

 

Step Three: Improve your server to work as a proxy cache.

A proxy cache accepts a client's request and forwards it to the real server. It then accepts the response from the server and relays it back to the client. At the same time, it may cache the response locally, so that it can later serve identical requests from its own cache without contacting the backend server. In this step, you will extend your server to work as a proxy cache.
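The relay step amounts to copying bytes from the backend socket to the client socket until the backend closes the connection, optionally keeping a copy for the cache. This sketch assumes the backend closes the connection after one response (HTTP/1.0 behavior); relay_all and write_all are hypothetical helpers.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Write all `len` bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy from `from_fd` (e.g. the backend socket) to `to_fd` (e.g. the client
 * socket) until EOF. If `copy` is non-NULL, the first `copy_cap` bytes are
 * also saved there as a candidate cache entry, and *copy_len receives how
 * many were saved. Returns the total number of bytes relayed, or -1. */
long relay_all(int from_fd, int to_fd, char *copy, size_t copy_cap,
               size_t *copy_len)
{
    char buf[4096];
    long total = 0;
    size_t saved = 0;
    for (;;) {
        ssize_t n = read(from_fd, buf, sizeof buf);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;                      /* backend closed: response complete */
        if (write_all(to_fd, buf, (size_t)n) != 0)
            return -1;
        if (copy != NULL && saved < copy_cap) {
            size_t room = copy_cap - saved;
            size_t take = (size_t)n < room ? (size_t)n : room;
            memcpy(copy + saved, buf, take);
            saved += take;
        }
        total += n;
    }
    if (copy_len != NULL)
        *copy_len = saved;
    return total;
}
```

If the whole response fits in the copy buffer, it can be inserted into the cache; otherwise the proxy can simply skip caching that response.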

 

First, you need to support two additional header lines in the HTTP GET message. The new request may look like: GET path\nHost : servername : serverport\nIf-Modified-Since: time\n\n. Two copies of your server program will run on two machines, one as the proxy and one as the backend server. Your client program sends the request to the proxy, with the Host setting naming the backend server. When a worker thread of the proxy receives the request message, it parses the Host setting and sets up a connection to the backend server. It then forwards the request to the server, reads in the response, and relays the response back to the client. Your server should determine whether it is acting as the proxy or as the real server by comparing the Host setting with its own IP address and port. The proxy should return 404 Not Found when the Host setting is not correct.
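Parsing the extended request and deciding the server's role can be sketched as follows. The parser tolerates spaces around the colons because the assignment writes the Host line as "Host : servername : serverport"; case-insensitive header names, the 256-byte field limits, and all struct and function names are assumptions. The role check compares strings literally, so a real implementation may want to resolve both sides (for example with getaddrinfo) before comparing.

```c
#define _POSIX_C_SOURCE 200809L   /* for strtok_r and strcasecmp */
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

struct proxy_request {
    char path[256];
    char host[256];
    int port;               /* 0 if no Host line was present */
    int if_modified_since;  /* 1 if the header appeared; its value is ignored */
};

/* Trim leading and trailing whitespace in place; returns the new start. */
static char *trim(char *s)
{
    while (isspace((unsigned char)*s))
        s++;
    char *e = s + strlen(s);
    while (e > s && isspace((unsigned char)e[-1]))
        e--;
    *e = '\0';
    return s;
}

/* Parse "GET path\nHost : servername : serverport\nIf-Modified-Since: time\n\n".
 * Returns 0 on success, -1 on a malformed request or Host line. */
int parse_proxy_request(const char *req, struct proxy_request *pr)
{
    char copy[1024];
    if (strlen(req) >= sizeof copy)
        return -1;
    strcpy(copy, req);
    memset(pr, 0, sizeof *pr);

    char *save = NULL;
    char *line = strtok_r(copy, "\r\n", &save);
    if (line == NULL || sscanf(line, "GET %255s", pr->path) != 1)
        return -1;
    while ((line = strtok_r(NULL, "\r\n", &save)) != NULL) {
        char *colon = strchr(line, ':');     /* first colon ends the header name */
        if (colon == NULL)
            continue;
        *colon = '\0';
        char *name = trim(line);
        char *value = trim(colon + 1);
        if (strcasecmp(name, "Host") == 0) {
            char *pcolon = strrchr(value, ':');   /* last colon precedes the port */
            if (pcolon == NULL)
                return -1;
            *pcolon = '\0';
            char *host = trim(value);
            long port = strtol(pcolon + 1, NULL, 10);
            if (*host == '\0' || strlen(host) >= sizeof pr->host ||
                port <= 0 || port > 65535)
                return -1;
            strcpy(pr->host, host);
            pr->port = (int)port;
        } else if (strcasecmp(name, "If-Modified-Since") == 0) {
            pr->if_modified_since = 1;
        }
    }
    return 0;
}

/* Returns 1 if the Host setting names this server (act as the real server),
 * 0 otherwise (act as the proxy and forward). Literal comparison only. */
int is_addressed_to_me(const struct proxy_request *pr,
                       const char *my_host, int my_port)
{
    return pr->port == my_port && strcasecmp(pr->host, my_host) == 0;
}
```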

 

When the proxy gets the response from the server, it creates an in-memory cache entry indexed by the incoming request, so that identical requests in the future can be served from its cache table. The in-memory cache table need not be large; you can maintain just a fixed number of entries using an LRU (least recently used) replacement policy. If the incoming request carries the If-Modified-Since: time setting, assume that the content needs to be refreshed (to simplify your job, ignore the actual time value). In that case, the corresponding cache entry, if there is one in the table, must be removed, and a connection to the server must be set up to retrieve the content again.
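A fixed-size LRU table can be as simple as an array of slots stamped with a logical clock: a hit refreshes the stamp, and an insertion into a full table evicts the slot with the oldest stamp. The slot count, the key format (for example host:port/path), and the function names below are assumptions; If-Modified-Since handling is modeled as an explicit invalidation.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 4      /* assumption: a deliberately tiny table */
#define KEY_MAX     512    /* assumption: keys such as "host:port/path" */

struct cache_entry {
    char key[KEY_MAX];
    char *data;               /* cached response bytes (malloc'd) */
    size_t len;
    unsigned long last_used;  /* logical timestamp; 0 marks an empty slot */
};

struct lru_cache {
    struct cache_entry slot[CACHE_SLOTS];
    unsigned long clock;      /* incremented on every hit or insert */
    pthread_mutex_t lock;     /* the table is shared by all worker threads */
};

void cache_init(struct lru_cache *c)
{
    memset(c->slot, 0, sizeof c->slot);
    c->clock = 0;
    pthread_mutex_init(&c->lock, NULL);
}

/* Caller must hold c->lock. Returns the slot index for `key`, or -1. */
static int cache_find(struct lru_cache *c, const char *key)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (c->slot[i].last_used != 0 && strcmp(c->slot[i].key, key) == 0)
            return i;
    return -1;
}

/* On a hit, return a malloc'd copy of the response (the caller frees it)
 * and mark the entry most recently used. Returns NULL on a miss. */
char *cache_get(struct lru_cache *c, const char *key, size_t *len)
{
    char *out = NULL;
    pthread_mutex_lock(&c->lock);
    int i = cache_find(c, key);
    if (i >= 0 && (out = malloc(c->slot[i].len ? c->slot[i].len : 1)) != NULL) {
        memcpy(out, c->slot[i].data, c->slot[i].len);
        *len = c->slot[i].len;
        c->slot[i].last_used = ++c->clock;
    }
    pthread_mutex_unlock(&c->lock);
    return out;
}

/* Insert or replace `key`. A new key reuses an empty slot or evicts the
 * least recently used one (empty slots have stamp 0, so they win). */
int cache_put(struct lru_cache *c, const char *key, const char *data, size_t len)
{
    if (strlen(key) >= KEY_MAX)
        return -1;
    char *copy = malloc(len ? len : 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, data, len);
    pthread_mutex_lock(&c->lock);
    int i = cache_find(c, key);
    if (i < 0) {
        i = 0;
        for (int j = 1; j < CACHE_SLOTS; j++)
            if (c->slot[j].last_used < c->slot[i].last_used)
                i = j;
    }
    free(c->slot[i].data);            /* free(NULL) is a no-op */
    strcpy(c->slot[i].key, key);
    c->slot[i].data = copy;
    c->slot[i].len = len;
    c->slot[i].last_used = ++c->clock;
    pthread_mutex_unlock(&c->lock);
    return 0;
}

/* If-Modified-Since: drop the entry so the next fetch goes to the backend. */
void cache_invalidate(struct lru_cache *c, const char *key)
{
    pthread_mutex_lock(&c->lock);
    int i = cache_find(c, key);
    if (i >= 0) {
        free(c->slot[i].data);
        memset(&c->slot[i], 0, sizeof c->slot[i]);
    }
    pthread_mutex_unlock(&c->lock);
}
```

A linear scan is adequate for a handful of entries; a hash table combined with a doubly linked list would make every operation constant-time if the table grew larger.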

 

You need to design a workload that demonstrates that the proxy relays requests and responses correctly and that the in-memory cache table works properly.

 

Resources

 

micro_httpd : a tiny HTTP server

HTTP specification

http://www.manualy.sk/sock-faq/unix-socket-faq.html

BSD Sockets: A Quick And Dirty Primer

Solaris 7 "Multithreaded Programming Guide"

LinuxThreads

make; pthread_create; etc.

 

--------------------------------------------------------------------------------

 

Due Date & Turn-In Process

 

When: Jan 31st, 2006, before midnight. This is one minute prior to midnight on Tuesday, Jan 31st. No late assignments will be accepted unless prior arrangements have been made.

 

Where: Via email to: jiantao@cc.gatech.edu, Subject: CS 6210 Project One

 

What: Submit the following in a UNIX "tar" archive attached to your email. The name of your tar file should be cs6210proj1_yourname.tar or cs6210proj1_yourname.tar.gz

 

 

Make sure that you do NOT include any binaries in your tar files. We are going to compile and run your programs from scratch; remember, they must run on a CoC Linux or Solaris machine. If you need any special commands to compile your code, mention them in your README file (or put them in your Makefile).