Project 3: RPC-based proxy server

Updates

See below

Introduction

Remote Procedure Calls (RPC) are powerful and commonly-used abstractions for constructing distributed applications. Sun (ONC) RPC is one of the oldest general-purpose RPC implementations and popular in Unix environments, particularly because NFS is implemented using it. In this project, you will do the following:
  1. Write a simple 'proxy server' using RPC.
  2. Investigate and implement different caching mechanisms for your service.
  3. Evaluate the performance of your service under different load conditions and using different caching mechanisms.

This project has two basic goals. The first is introducing you to programming with a real remote procedure call system. Sun (ONC) RPC may seem slightly dated, but it is widely deployed and has been used in many real distributed applications. The second goal is exploring the principles and performance of caching schemes in a distributed application.

You should work in groups of two for this project.

Specific Details

You should do the following:

  1. Implement an RPC-based 'proxy server' and a client to use it. It will be almost like a simple proxy server except it doesn't have to parse HTTP requests and you won't be implementing any sockets-based communication. You should implement at least two distinct items: an RPC proxy server where a client can request a URL via RPC, and a simple client application that can perform these requests for performance analysis. These two pieces will run on different machines and communicate using Sun RPC. The RPC service should have at least one RPC call, signifying an HTTP GET request. The proxy will execute the HTTP GET on behalf of the requesting client and return the data to it. You don't have to implement your own mechanism for talking to a web server to perform this GET request: you can simply use libcurl (see below for examples). Your client should probably read a list of URLs indexed by line number and then randomly request entries. It should also have some mechanism for timing and for calculating bytes transferred, connections per second, etc. If you want to make your applications more complicated or advanced, go nuts, but they should at least allow the previously described basic interactions.
  2. Add a caching mechanism to your proxy server. It should be a limited in-memory cache and will have several replacement policies (more on the cache replacement policies later). This cache can be fairly simple and doesn't need to follow official HTTP cache control protocols or worry about the possibility of stale content or content expiration. It will simply hold the result of a specific request until another request for the same URL is made (at which point the request should be satisfied from the cached copy). You also don't have to worry about canonicalizing URLs for uniqueness.
  3. You will design and perform some experimental evaluations of your RPC-based proxy. You should come up with a reasonable set of traffic requests and configure your client to stress-test your proxy. You should run multiple copies of your client on several different machines simultaneously to simulate a large workload (you may also want to make a multi-threaded client). Be careful not to perform a denial-of-service attack on remote web servers when testing your proxy. You will test your proxy with no caching and then with each of your cache replacement policies. Your cache should be relatively small compared to the total set of possible requested documents because you want a lot of cache contention. You should measure the hit rates of your caches as well as the effect on performance (in KB/sec, average time to fulfill a request, requests per second, etc.). Try varying the cache size. Analyze and justify your results in the write-up.
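
The client-side measurements mentioned above (average time per request, requests per second, KB/sec) could be collected with a small helper like the one below. This is only a sketch: the names run_stats, now_seconds, stats_record and the stats_* accessors are made up for illustration, and it assumes a POSIX system with clock_gettime. Hit rates are not covered here because only the proxy knows whether a request was served from the cache.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-client counters; none of these names come from the handout. */
struct run_stats {
    unsigned long requests;    /* completed requests */
    unsigned long long bytes;  /* total payload bytes received */
    double busy_seconds;       /* sum of the individual request latencies */
};

/* Monotonic wall-clock time in seconds (unaffected by clock adjustments). */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Record one finished request that started at 'start' and ended at 'end'. */
void stats_record(struct run_stats *s, double start, double end, size_t nbytes)
{
    assert(end >= start);
    s->requests++;
    s->bytes += nbytes;
    s->busy_seconds += end - start;
}

/* Average time to fulfill a request, in seconds. */
double stats_avg_latency(const struct run_stats *s)
{
    return s->requests ? s->busy_seconds / (double)s->requests : 0.0;
}

/* Throughput over the whole run; wall_seconds is the total run duration. */
double stats_requests_per_sec(const struct run_stats *s, double wall_seconds)
{
    return wall_seconds > 0.0 ? (double)s->requests / wall_seconds : 0.0;
}

double stats_kb_per_sec(const struct run_stats *s, double wall_seconds)
{
    return wall_seconds > 0.0 ? (double)s->bytes / 1024.0 / wall_seconds : 0.0;
}
```

A client would call now_seconds() immediately before and after each RPC, pass both values and the returned length to stats_record, and print the summary figures once at the end of the run so that console output does not distort the timings.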

The meat of this project will focus on cache replacement policies and experiments. Sun RPC and libcurl will allow you to implement a proxy server without the extra overhead of socket programming. You will also get some very basic experience using RPC. Again, the details of Sun RPC and libcurl should be fairly small and trivial, so if you're spending a significant portion of your time on them, you're probably overlooking something. They aren't the hard part of your project, so don't get stuck!

Replacement Policy

A cache of any kind must have a replacement policy. This is the policy that is used whenever a new item is added to a full cache (in order to decide which old item to evict). That is, the new item replaces some other object in the cache. Some common strategies are:

For this project, you are to implement at least three policies:

  1. Least Recently Used (this can be an approximation based on some fixed history or "absolute" based on time-stamps)
  2. Random
  3. A different policy of your choice
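
As a rough illustration of the first two policies, here is a minimal sketch of a tiny cache whose eviction policy is passed in as a function pointer. Everything in it is an assumption made for the example: the names (cache, cache_entry, cache_get, cache_put, pick_lru, pick_random), the fixed CACHE_SLOTS capacity counted in entries rather than bytes, and the use of a logical counter in place of real time-stamps for "absolute" LRU. A real proxy would likely bound the cache by total bytes and add locking if the service is multi-threaded.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 4                 /* hypothetical capacity, in entries */

struct cache_entry {
    char *url;                        /* key; NULL marks an empty slot */
    char *data;                       /* cached response body */
    size_t len;                       /* body length in bytes */
    unsigned long last_used;          /* logical time of last hit or insert */
};

struct cache {
    struct cache_entry slot[CACHE_SLOTS];
    unsigned long clock;              /* bumped on every hit and insert */
};

/* Return the entry for url and refresh its timestamp, or NULL on a miss. */
struct cache_entry *cache_get(struct cache *c, const char *url)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        struct cache_entry *e = &c->slot[i];
        if (e->url != NULL && strcmp(e->url, url) == 0) {
            e->last_used = ++c->clock;
            return e;
        }
    }
    return NULL;
}

/* LRU policy: an empty slot if one exists, else the oldest timestamp. */
int pick_lru(const struct cache *c)
{
    int victim = 0;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (c->slot[i].url == NULL)
            return i;
        if (c->slot[i].last_used < c->slot[victim].last_used)
            victim = i;
    }
    return victim;
}

/* Random policy: an empty slot if one exists, else any slot. */
int pick_random(const struct cache *c)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (c->slot[i].url == NULL)
            return i;
    return rand() % CACHE_SLOTS;
}

/* Copy url and data into the slot chosen by 'pick'; returns 0 on success. */
int cache_put(struct cache *c, int (*pick)(const struct cache *),
              const char *url, const char *data, size_t len)
{
    assert(url != NULL);
    char *u = malloc(strlen(url) + 1);
    char *d = malloc(len ? len : 1);
    if (u == NULL || d == NULL) {
        free(u);
        free(d);
        return -1;
    }
    strcpy(u, url);
    if (len)
        memcpy(d, data, len);

    struct cache_entry *e = &c->slot[pick(c)];
    free(e->url);                     /* free(NULL) is a no-op for empty slots */
    free(e->data);
    e->url = u;
    e->data = d;
    e->len = len;
    e->last_used = ++c->clock;
    return 0;
}
```

With this layout the proxy's GET handler would call cache_get first and, on a miss, fetch the page and call cache_put with whichever policy function is being tested. Keeping the policy behind a function pointer makes it easy to rerun the same experiment with each policy, including the third one of your choice.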

ONC RPC with rpcgen

"Open Network Computing" RPC (or Sun RPC) is a widely-used protocol for RPC in different programming languages. It relies on an available service (RPC portmap) to handle binding and service location. ONC RPC also relies on the XDR standard (eXternal Data Representation) to define a common wire-format for structured data sent between machines. It also defines an interface definition language that can be used with rpcgen to automatically generate stub code for services (as well as marshalling and unmarshalling code). You should use rpcgen in your project. Check out the resources section for links to some tutorials on using rpcgen.

Remember not to leak memory if you are sending variable-sized data across the wire (and you will need to). You are responsible for calling xdr_free in your svc code and also implementing freeresult. These will probably be as simple as calling xdr_free with the appropriate XDR procedure for the datatype (the first parameter) and a pointer to the object you are attempting to free (the second parameter). For instance, if you have a service that takes in foo_in *in, you'd probably call xdr_free(xdr_foo_in, in); before you return.

I have included a sample Makefile to show you how you will probably call rpcgen and how the rpcgen generated pieces might fit together. It assumes you named your interface definition proxy_rpc.x. The Makefile is just for illustrative purposes; you can feel free to tweak it or ignore it completely.

The "-M" flag will generate thread-safe RPC code, but the service will not be multi-threaded automatically (some platforms other than Linux have rpcgen with the -A flag, which will generate multi-threaded services). You may want to try making the RPC service use thread pools (by modifying the generated stub code). Remember to modify your Makefile if you edit the generated stubs so you don't blow away one of your custom files with an auto-generated one.

Note that you cannot make RPC calls from hosts outside of the CoC network to inside the CoC network (this includes from LAWN or outlands on cc.gt.atl.ga.us). You can always make RPC calls between machines inside the CoC network, however.

If you have trouble compiling your rpcgen-generated code with gcc-4.0, add the following to your interface definition file:

%#undef IXDR_GET_LONG
%#define IXDR_GET_LONG(buf) ((long)IXDR_GET_U_INT32(buf))
%#undef IXDR_PUT_LONG
%#define IXDR_PUT_LONG(buf, v) ((long)IXDR_PUT_INT32(buf, (long)(v)))

libcurl

libcurl is a powerful library for communicating with servers via HTTP (and FTP, LDAP, HTTPS, etc.). It supports HTTP GET/PUT/POST, form fields, cookies, etc. The purpose of using it in this assignment is to simplify your life. Instead of writing lower-level sockets code to talk HTTP to a webserver and request a page, you can simply use libcurl (which can do the same in a few function calls). In fact, you can just use the example code demonstrating how to perform a simple HTTP GET request. But if you want to implement your own socket-based code to talk to the remote webserver, go ahead (just remember that it is not the focus of this project).

example.c is sample code that uses libcurl to perform an HTTP GET of a specified URL (the Makefile shows you how to compile it, but basically all you need to remember is to add -lcurl to link with the curl libraries). The resultant data (just the data, not the headers) is captured into a dynamically allocated buffer and then written to stdout. You can use this code with few modifications directly in the proxy to perform the proxy GET requests.

If you have trouble with servers returning errors complaining about the lack of an HTTP User-Agent field, add a curl_easy_setopt(handle, CURLOPT_USERAGENT, "<agent name>"); before the curl action is performed.

Resources

The following documents have useful information on relevant network protocols. You will need to have a cursory understanding of the protocols in order to do this project, but do not bother becoming extremely knowledgeable about them just for this project. This is an OS class.

Note that the above rpcgen tutorials are for various platforms (FreeBSD, Digital Unix, etc.) and different platforms have slight differences in rpcgen and the various libraries. If you stick to the standard it should work transparently but beware of special capabilities. For instance, Solaris's rpcgen has a switch (-A) to generate a multi-threaded server which other platforms' rpcgen implementations do not typically have. When in doubt, check out the relevant man and info pages.

Suggestions

Start by developing a toy service using rpcgen. Make sure that the client and proxy are talking (send a URL string to the proxy and print it out to make sure everything is received okay). Next, try adding the page request logic. Make sure to sanity check your results. The client should receive the entire document that the proxy requested on its behalf (print the page to the screen, check sizes, and make sure nothing is corrupted). Finally, when your basic proxy works, add result caching. Worry about performance analysis and benchmarking after you are sure that everything is in working order so you don't waste time collecting incorrect results. When you run the final tests, don't have a lot of spurious console output from either the proxy or client, because a lot of console I/O will significantly degrade your performance.

Deliverables

Updates


CS 6210