CS 6210 Advanced Operating Systems
Spring 2003
Project I: A Multithreaded Tiny Web Server with Performance Evaluation
Due: 11:59 p.m., Jan. 27, 2003 (one minute prior to midnight on Monday, Jan. 27th)
This project is to be completed individually.
Objective
In this assignment you will learn multi-threaded programming, the performance differences between services implemented in user space and in kernel space, and the experimental skills required to compare different systems software implementations. You will construct a web server that implements a minimal subset of the HTTP protocol and compare it with a freely available kernel-space web server such as khttpd or "tux", based on throughput, latency, or other suitable benchmarks. Most of the grade for this project will be for the performance analysis. This is a warm-up project, in case you haven't coded in a while.
You will use 4 of the EDHPC Linux machines; here are their names. Machines (1, 2 and 3) are considered Triad I; machine (4) is considered Triad II. The rest of the machines are reserved for use by the 6235 real-time class until Feb 10, 2003. You may use machine (4) for compiles and testing. Machine 1 will run the kernel-space web server (TUX), loaded by Machines 2 and 3 (for performance numbers only). Machine 4 will run a kernel-space web server (TUX) for testing purposes only.
Subscribe to edhpc-lab@cc.gatech.edu (if not already done by CNS) by sending email to rk@cc with "subscribe edhpc-lab" in the body and a blank subject. You will make your reservations for Triad I on this email list. Post each reservation at least one hour ahead and indicate its duration (not to exceed 30 minutes). The idea is that exclusive access to Triad I lets you get clean numbers. Please direct any reservation, IHPCL/EDHPC, or TUX support questions to Neil Bright, CNS (ncb@cc.gatech.edu) and CC: rk@cc.
General Information
- Read this assignment carefully, in its entirety, before you start coding - it may save you a lot of time later!
- You may use any machine where you have access to pthreads. Pthreads are installed on the OIT machines, so you do not need to use CoC resources for development (until you start getting performance numbers). Your final result MUST run on the CoC EDHPC machines. If you don't have a CoC account, apply for one by filling out an account request form (available outside Peter Wan's office, CCB 213). Make sure you ask for edhpc access on the form.
- Use the reference pointers below to find concrete technical information (e.g., Pthreads tutorials and examples, specification of the HTTP protocol, thread debugging, etc.). If you are unfamiliar with network programming, read the socket programming examples and the man pages for the socket, bind, listen, accept, and connect calls. The socket code that you will need for clients and servers is very basic.
- If you have any specific questions, ASK! Use the newsgroup for broad questions. (And if you see a question in the newsgroup and you know the answer -- post it!)
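The socket calls named above (socket, bind, listen, accept) fit together roughly as in the following sketch. It is only an illustration, not required code: the function name `open_listener` and its `backlog` parameter are assumptions made for this example, and the port is passed in so that it can come from a run-time parameter.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/* Create a TCP socket listening on `port` on all local interfaces.
 * Passing port 0 lets the kernel pick a free port.
 * Returns the listening descriptor, or -1 on error.
 * (open_listener is a hypothetical helper name for this sketch.) */
int open_listener(unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int fd;
    int on = 1;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Allow the server to be restarted quickly on the same port. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A server would then call `accept(fd, NULL, NULL)` in a loop to obtain one connected descriptor per client and close each descriptor once the response has been sent.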
Resources
- HTTP Web Servers
- micro_httpd : a tiny HTTP server
- phttpd: a multithreaded web server
- Socket Programming Resources
- Pthreads Resources
- pThreads class handouts & references
- Solaris 7 "Multithreaded Programming Guide"
- news://news/comp.programming.threads
- LinuxThreads
- GNU Pth Portable Threads package
- Miscellaneous
- Linux man pages: make, nanosleep, sched_setscheduler, sched_setschedparam, pthread_setschedparam
- Utility to watch program and report on thread priorities: watchprior.c
- Utility to check if real-time scheduling changes are permitted or not: ptest.c
Construct a simple user-space multi-threaded web server
You have a tremendous amount of freedom to complete this part of the assignment. You can take the micro_httpd source and multithread it, or write your own multithreaded server (make sure you state which functions were taken from the original micro_httpd source). You are free to look at the phttpd multithreaded server source code (this runs on Sun Solaris only). Extra credit will be given for "built from scratch" constructions. Please be sure to attribute source code or functions used from elsewhere.
(You are only required to implement a tiny subset of the HTTP protocol, so technically your server should not be called an HTTP server.) Your server should support "simple" HTTP:
- requests (GET <path> carriage-return line-feed),
- responses (requested document with connection termination afterwards), and
- minimal failure functionality (return a "404 Not Found" rather than just dropping the connection).
The port on which your server listens for connections should be a run-time parameter. You may implement more than the required functionality for extra credit, but that is not the focus of the project. You can find the specification of the HTTP protocol on the Web. An easy way to implement the tedious parts of HTTP is to get code from an existing implementation. For instance, micro_httpd implements much of HTTP in about 150 lines of C code!
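As an illustration of the "simple" request format, the sketch below extracts the path from a line of the form GET <path> CR LF. The function name `parse_simple_get` is an assumption made for this example; a real server may well want to be more tolerant of what browsers actually send.

```c
#include <stddef.h>
#include <string.h>

/* Parse a request line of the form "GET <path>\r\n".
 * On success, copies the path into `path` (NUL-terminated) and returns 0.
 * Returns -1 if the method is not GET, the path is empty or too long,
 * or the terminating CR LF is missing.
 * Anything between the path and the CR LF (for example a protocol
 * version sent by a browser) is ignored. */
int parse_simple_get(const char *line, char *path, size_t path_size)
{
    const char *start;
    size_t len;

    if (strncmp(line, "GET ", 4) != 0)
        return -1;
    start = line + 4;

    /* The path ends at the first space, CR, or LF. */
    len = strcspn(start, " \r\n");
    if (len == 0 || len >= path_size)
        return -1;

    /* The line must be terminated by CR LF. */
    if (strstr(start + len, "\r\n") == NULL)
        return -1;

    memcpy(path, start, len);
    path[len] = '\0';
    return 0;
}
```

When this function returns -1, or when the requested file cannot be opened, the server would send its "404 Not Found" (or another error) response instead of silently closing the connection.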
Your server should consist of a single "master" thread receiving requests and dispatching them to "slave" threads. The "slave" threads continue the communication until the file is completely sent and the connection is terminated. To avoid the overhead of creating a new thread for each incoming request, you should maintain a collection of "slave" threads that consume the master's work requests. The number of slaves in the collection should be a run-time parameter. Your server should look for files in a specified directory (e.g., /var/tmp/yourname) and not allow access to files in other directories. You must exercise caution with respect to system calls - make sure you use the thread-safe versions of system calls.
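One possible shape for the master/slave arrangement is a bounded queue protected by a mutex and two condition variables. The sketch below is an assumption about how this could be organized, not a prescribed design: the names `work_queue`, `queue_put`, `queue_get`, `slave_main`, and `run_pool_demo` are invented for the example, and the work items are plain integers standing in for accepted socket descriptors.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define QUEUE_CAP 16

/* Bounded FIFO shared by the master and the slaves (hypothetical layout). */
struct work_queue {
    int items[QUEUE_CAP];   /* stand-ins for accepted socket descriptors */
    int head, tail, count;
    int done;               /* set once the master will enqueue no more */
    int processed;          /* demo counter: items handled so far */
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
};

/* Master side: block while the queue is full, then append one item. */
static void queue_put(struct work_queue *q, int item)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[q->tail] = item;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Slave side: returns 1 with an item, or 0 once the queue is empty and closed. */
static int queue_get(struct work_queue *q, int *item)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->done)
        pthread_cond_wait(&q->not_empty, &q->lock);
    if (q->count == 0) {
        pthread_mutex_unlock(&q->lock);
        return 0;
    }
    *item = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return 1;
}

static void *slave_main(void *arg)
{
    struct work_queue *q = arg;
    int item;

    while (queue_get(q, &item)) {
        /* A real slave would serve the request on descriptor `item` here. */
        pthread_mutex_lock(&q->lock);
        q->processed++;
        pthread_mutex_unlock(&q->lock);
    }
    return NULL;
}

/* Start `nslaves` threads, hand them `njobs` dummy items, and return the
 * number of items processed (or -1 if no thread could be started). */
int run_pool_demo(int nslaves, int njobs)
{
    struct work_queue q;
    pthread_t *tids;
    int i, started = 0, result;

    if (nslaves <= 0 || (tids = malloc(sizeof *tids * nslaves)) == NULL)
        return -1;
    memset(&q, 0, sizeof q);
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.not_empty, NULL);
    pthread_cond_init(&q.not_full, NULL);

    for (i = 0; i < nslaves; i++)
        if (pthread_create(&tids[started], NULL, slave_main, &q) == 0)
            started++;

    if (started > 0) {
        for (i = 0; i < njobs; i++)      /* the "master" loop */
            queue_put(&q, i);
        pthread_mutex_lock(&q.lock);
        q.done = 1;                      /* tell idle slaves to exit */
        pthread_cond_broadcast(&q.not_empty);
        pthread_mutex_unlock(&q.lock);
        for (i = 0; i < started; i++)
            pthread_join(tids[i], NULL);
    }
    result = started > 0 ? q.processed : -1;

    pthread_cond_destroy(&q.not_full);
    pthread_cond_destroy(&q.not_empty);
    pthread_mutex_destroy(&q.lock);
    free(tids);
    return result;
}
```

In a server, the master loop would enqueue descriptors returned by accept() instead of counters, and the number of slaves would come from the run-time parameter. Depending on the platform, the program may need to be compiled with `-pthread`.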
Warning: Some network calls are not thread safe, but many have thread-safe variants or alternatives. gethostbyname() is one example. You are responsible for ensuring the thread safety of the calls you use.
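The requirement that the server only serve files from its own directory can be enforced in several ways. The sketch below shows one simple approach, under the assumption that rejecting every request containing a ".." path component is acceptable; the function name `build_safe_path` is invented for this example. It does not address symbolic links inside the served directory, for which a check based on realpath() would be a stronger alternative.

```c
#include <stdio.h>
#include <string.h>

/* Map a request path such as "/foo1.html" onto the directory `root`.
 * Rejects paths that do not start with '/' and paths containing a ".."
 * component, so the result cannot climb out of `root`.
 * Writes the full file name to `out` and returns 0, or returns -1. */
int build_safe_path(const char *root, const char *req_path,
                    char *out, size_t out_size)
{
    const char *p = req_path;
    int n;

    if (req_path[0] != '/')
        return -1;

    /* Inspect the path one component at a time. */
    while (*p != '\0') {
        size_t len;

        while (*p == '/')
            p++;
        len = strcspn(p, "/");
        if (len == 2 && p[0] == '.' && p[1] == '.')
            return -1;
        p += len;
    }

    n = snprintf(out, out_size, "%s%s", root, req_path);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return 0;
}
```

With root set to /var/tmp/yourname, a request for /foo1.html maps to /var/tmp/yourname/foo1.html, while a request for /../../username/personal-file is refused and would be answered with an error response.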
When you install your server on machine example.cc.gatech.edu, port 8008, your browser's URL request will be: http://example.cc.gatech.edu:8008/<filename>. Your server should then look for the file /var/tmp/yourname/<filename>. Since you don't have administrator privileges on the CoC machines, you need to implement some security method so that random browsers cannot detect your server and use it to access your personal files through a request like: http://example.cc.gatech.edu:8008/../../username/personal-file.
Construct a multi-threaded Load Generator
Write a multi-threaded client that can generate requests from a variable number of threads; the number of threads should be a run-time parameter. You should be able to run the client on any machine and "load" the web server. Make sure your code can record the request rate over a period of time (1 or 2 minutes) and can match requests with their responses. The client should also measure and record the round-trip response time for each request. Can you come up with a way to increase request generation throughput by improving concurrency? One strategy might be to decouple request and response pairs. Come up with your own strategies.
Test your load generator with the kernel-space web servers on edhpc
Request foo1.html, foo2.html, foo3.html and foo4.html from any of the edhpc machines. You can access them as http://edhpcN.cc.gatech.edu:80/fooX.html, where N is the machine number {1, 4} and X is the file number. These web pages will be served by the TUX kernel-space web server.
Experiments
You must complete the experiments for the (1) user-space web server that you have constructed and also for the (2) kernel-space web server available on each of the edhpc machines. Run the multithreaded load-generator on two machines and use four threads in each load generator. Thread 1 will request foo1.html, Thread 2 will request foo2.html, Thread 3 will request foo3.html and Thread 4 will request foo4.html.
(1) Measure web server throughput over a period of 60 seconds. You will measure the maximum number of requests generated over a period of 60 seconds. Match the responses for each of the requests, however long it may take. Record the total time for generating requests and collecting all responses. You will compute the average rate of aggregate request generation and response collection. Measure these independently for each machine and record the average. Record these values for each of 1-4 threads. Plot web server throughput versus number of threads for each machine. Can you use your concurrency strategy to improve request generation throughput and show an improvement? Plot the new results.
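The average rate described in (1) is the number of completed requests divided by the total time taken to generate the requests and collect all responses. A minimal sketch of that arithmetic follows, assuming timestamps are taken with clock_gettime(); the function names `elapsed_seconds` and `requests_per_second` are invented for this example.

```c
#include <time.h>

/* Seconds elapsed between two clock_gettime() readings. */
double elapsed_seconds(const struct timespec *start,
                       const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Average rate in requests per second over the interval [start, end].
 * Returns -1.0 if the interval is not positive. */
double requests_per_second(unsigned long completed,
                           const struct timespec *start,
                           const struct timespec *end)
{
    double secs = elapsed_seconds(start, end);

    if (secs <= 0.0)
        return -1.0;
    return (double)completed / secs;
}
```

For example, 6000 matched request/response pairs collected over 60 seconds give an average of 100 requests per second. A monotonic clock such as CLOCK_MONOTONIC is a reasonable choice for the readings, since it is not affected by adjustments to the wall-clock time.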
(2) Measure the response time, in suitable time units, for each request generated in Experiment (1) for 1-4 threads. Plot response time versus throughput and compute the average response time over all requests.
Discussion of Results
Do you see a difference in performance (throughput or latency) between the kernel version and the user-space incarnation of the web servers? Can you explain the graphs for each individual web server and also the reason for the performance disparity, if any? Study the internal architecture and look through the source code of the kernel web server (if available): how does the kernel-space version reduce latency or improve throughput? Set up a web page or PostScript file to document your results. I do not want detailed text - we are looking for short, insightful analysis. I would like to see the architectural diagram for the user-space web server and the load generator.
Due Date & Turn-In Process
When: Jan 27, 2003 before midnight. This is one minute prior to midnight on Monday, Jan 27th. No late assignments will be accepted unless prior arrangements have been made.
Where: Create a directory called "project_1" in your personal class project directory. Place all binaries and sources there, with instructions to compile and run the code. You should see a link from your home directory; the location of the class space is [/net/hc280/class/cs6210/~coc_account].
What:
- The source code - must include a Makefile (see example).
- A README file explaining the usage of the program, assumptions you made, etc. Everything that helps us in compiling and testing your application.
- A web page or PDF file with a short report and results. Provide a URL for your web page.
- Be prepared to demo your program, if needed.
We are going to compile and run your programs from scratch - remember, they must run on the EDHPC cluster. If you need any special commands to compile your code, mention them in your README file (or put them in your Makefile).