CS6250: COMPUTER NETWORKS
Project description
Design and implementation of a reliable transport protocol
over an unreliable and dynamic datagram network
Revision 2 (Sept 24, 2002)
Introduction
In this project you will design and
implement your own network protocol hierarchy!
This is a challenging and exciting task, that
is necessary for understanding computer networks in depth.
In this handout, we give you a rather loose specification
for the functionality of the protocols that you will
design.
You have significant flexibility in adjusting
this specification to your own `protocol-design taste'.
Even though there are large similarities between the
following specification and the TCP/IP protocols, keep
in mind that there are also significant differences.
Emulated network topology
The topology of the emulated network,
i.e., the network nodes and the interconnecting links,
will be specified by an Initial Topology Configuration (ITC) script.
An example of an ITC script for a network with 4 nodes and 6
unidirectional links follows:
1 akron.cc.gatech.edu 17878 2 3
2 akron.cc.gatech.edu 17879 4
3 boise.cc.gatech.edu 17879 2 4
4 dollas.cc.gatech.edu 17890 1
Each line of the ITC corresponds to a network node X.
The first argument is the unique Node ID (NID) of node X,
the second is the name of the host that will be emulating X,
the third argument is the port number used for all communications to/from X,
while the following arguments are the NIDs of the nodes that X is connected to.
Note that the links are unidirectional. In the previous example for instance,
node-1 can send messages to node-2,
but it cannot receive messages directly from node-2.
The ITC script will follow the format and structure
of the previous example. It will not include any comments, but it may include
arbitrary blank spaces between successive arguments, or blank lines.
Suppose that your executable program is called net6250,
and the previous ITC script is called itc_1.
To emulate the network of the previous example you would need to run
four instances of net6250, as shown next. Note that in each command
line you have to specify the NID of the corresponding node,
as well as the filename of the ITC script.
At akron.cc.gatech.edu.edu run: net6250 1 itc_1
At akron.cc.gatech.edu.edu run: net6250 2 itc_1
At boise.cc.gatech.edu.edu run: net6250 3 itc_1
At dollas.cc.gatech.edu.edu run: net6250 4 itc_1
Note that the ITC script defines the initial network topology.
After initialization, your routing protocol should be able to detect topology
changes and react to them determining new shortest-path routes.
Two types of topology changes can occur:
- A node failure, meaning that the process that emulates
a network node, together with all its links, is aborted.
- A link failure, meaning that a particular link
is disabled.
Both failures can be permanent or temporary (i.e., an aborted node or link
can be restarted).
There will be no introduction of nodes or links
that were not present in the original topology.
Emulation of point-to-point links
The point-to-point links specified at the ITC script will
be emulated passing UDP/IP messages between two directly connected
network nodes.
For instance, if node-1 is connected to node-2,
any message that node-1 sends to node-2 will have
to be encapsulated in a UDP/IP packet and send to
node-2 through a UDP socket.
All point-to-point links will have a Maximum Transmission
Unit (MTU) size of 1100 bytes (excluding the UDP/IP headers).
Given that the local area network of CoC is quite reliable,
we would not expect to see any network errors (which
makes the project less interesting!).
To make things more challenging,
we will introduce a ``garbler'' module between your
network layer protocol and UDP/IP. The garbler will be
introducing, according to a certain probability, the
following types of errors whenever you send a packet
from one node to another:
- Packet loss (the entire packet is lost)
- Packet corruption (one or more bit errors)
- Packet duplication
The garbler implementation will be provided by us, and it will
have to be linked to your code. To use the garbler, you
should replace each instance of the socket system call
sendto with our ``garbled'' version sendto_garbled.
The two functions have exactly the same arguments.
NOTE: Besides the garbled UDP sockets,
you should not use any other communication primitives in this project.
The network and the network-control protocols
The network protocol will be using the emulated point-to-point
links to transfer packets from one network node to another,
even when the two nodes are not directly connected.
This network layer functionality will be offered by a
Datagram Network Protocol (DNP) that you will design.
DNP should have the following salient features:
- DNP is based on datagrams, rather than virtual-circuits.
In other words, transferring a packet from one node to another
does not require setting up forwarding state in the intermediate
nodes.
- DNP is unreliable, meaning that it does not check for
packet errors/losses/duplications.
- DNP should include a Time-To-Live (TTL) header field, to
protect the network from packets that circulate the network forever. The
TTL field determines the maximum number of network nodes a
packet can go through before it reaches its destination.
The TTL of a packet is decremented every time the packet is
sent from one network node to another, and the packet is
discarded when the TTL becomes zero.
- DNP should be able to carry variable-length packets,
as long as they are smaller than the MTU size of the underlying
point-to-point links.
- Even though there will be only one transport protocol using DNP
in this project, DNP should be able to demultiplex incoming
packets to four different transport protocols.
DNP should also offer the following two network error
reporting services. These two services can be thought of
as an additional protocol,
called Datagram Network Control Protocol (DNCP).
- When a packet cannot be routed to its destination
because the destination node is currently unreachable,
DNCP should reply to the original sender of the packet
with a ``node-unreachable'' message. The goal of this
message is to notify the sender that
the packet transfer was not currently possible because the
destination node is unreachable.
- When a packet's TTL reaches zero, the network node
that discards the packet should reply
to the original sender of the packet
with a second type of DNCP message. The goal of this
``TTL-expired'' message is to notify the sender that
the packet transfer was not possible because the
TTL expired. The TTL-expired message should include
the NID of the node that discarded the packet.
The routing protocol
Each network node will be maintaining a routing table.
The routing table gives the next-hop for each destination node.
For instance, node X may learn from its routing table
that a packet destined to node Y should be first forwarded to
node Z (X must be directly connected to Z for this to happen).
Note that DNP at node X does not know the entire
route that the packet will follow to its destination Y;
it simply knows what the next hop should be node Z.
The routing table will be constructed by a Link-state Routing
Protocol (LRP) that you will design.
LRP should have the following salient features:
- LRP has to identify the shortest-path route
to each destination, especially when the network topology is
stationary.
- LRP must be a link-state routing protocol (rather
than a distance vector protocol).
- LRP should be able to detect dynamic changes in
the network topology, and react to them by recomputing
shortest-path routes. The types of topology changes were
described earlier.
- LRP should use DNP, in the sense that all LRP
packets are encapsulated in DNP packets.
Keep in mind that the LRP packets are also subject to
being ``garbled'' when sent to the network. Specifically,
LRP packets have to be encapsulated in DNP packets,
and DNP is not reliable. So, it is
important to provide LRP with reliability to losses,
errors, and duplication.
You can assume that the number of
network nodes will be up to 32, the number of network
links up to 128, and the number of links per node up to 8.
The transport protocol
DNP offers an end-to-end packet delivery service,
but it is unreliable, it does not provide flow-control,
and it does not provide byte-stream communication semantics.
This is the task of the Reliable Transport Protocol (RTP)
that you will design.
RTP should have the following salient features:
- RTP must provide reliable byte-stream
communication semantics, similar to what TCP offers.
To meet this goal, RTP has to protect the transfer
from packet errors, losses, replications, and out-of-order
delivery. Note that out-of-order packet delivery is possible,
due to dynamic route changes while a transfer is in progress.
- RTP will be able to do only one-way data transfers.
A two-way data transfer would require two RTP connections.
- RTP has to provide window-based flow control, to
prevent the sender from overrunning the receiver.
- RTP must use a sliding-window algorithm for
reliability and flow control.
- RTP must use a checksum algorithm for error detection.
- The receiver should be able to select the
advertized window when an RTP connection is established.
The maximum window size should be 32,000 bytes.
The sender's window size can be equal to the receiver's
initial window size.
- RTP must use selective acknowledgments.
- RTP has to determine the timeout
of each RTP connection based on the connection's Round-Trip Time.
To do so, RTP has to be able to measure the
connection's Round-Trip Time, based on the timing
between data packets and the corresponding acknowledgments.
- RTP should be able to demultiplex received packets to
up to 256 different ``service points'' (think of them
as ports) at the receiving node.
Note: an important design decision
is whether RTP will use sequence numbers based on bytes or packets.
Packet-based sequence numbers are simpler,
but they require fixed-length packets.
The application
At the highest layer, your project
will implement a simple file transfer application.
This application, called Non-blocking File Downloads (NFD)
will be running at each node of the emulated network topology.
NFD will provide the user with a command-line interface.
This interface should be non-blocking, in the sense that each command
should return before the corresponding operation (e.g., a file
transfer) is actually completed.
Note that the non-blocking nature of the application is of
major importance for your implementation. It means, besides
other things, that your implementation will have to be
multi-threaded, with one thread for each ongoing transfer.
The NFD interface will support establishment of connections
to remote nodes, transfers of files, control of the window size
for each connection,
and a few other ``network monitoring and management'' operations.
Specifically, the following list gives the operations
that the command-line interface should support. Notice
that all commands and their arguments should be case-insensitive,
and that arbitrary blank spaces/lines are allowed.
- start_service(P): this command establishes a
service point at the local node, say X. The command returns
a Service ID (SID) S, that is unique for each service point
of node X.
Other nodes can connect to S, and download files from that
service point. The argument P is the maximum number of
open connections that S can accept; requests for more
than P connections are rejected.
Examples of resulting messages:
"Success: SID=123, MaxCons=3"
"Failure: MaxCons=0 bad argument"
- stop_service(S): this command terminates the
service point with SID S at the local node. Any open
connections at that SID are aborted.
Examples of resulting messages:
"Success: SID=123 is terminated"
"Failure: SID=123 does not exist"
- connect(Y, S, W): this command establishes a
connection from the local node, say X, to the service
point with SID S of the remote node Y. The window for
the connection should be W (a multiple of 1000 bytes).
The command returns a Connection ID (CID) C,
that is unique for each open connection at service point S.
Examples of resulting messages:
"Success: DestNode=2, SID=123, CID=3, Win=15000B"
"Failure: DestNode=2 does not exist or unreachable"
"Failure: SID=123 does not exist"
"Failure: SID=123 rejected connection request"
"Failure: Win=260 bad argument"
- close(C): this command closes the
connection with CID C.
Examples of resulting messages:
"Success: CID=3 is closed"
"Failure: CID=3 is not open"
"Failure: CID=3 cannot be closed"
- download(C, F): this command downloads
a file with name F through the connection with CID C.
The transfer is possible only if that connection is
not currently busy with another download operation.
The file F should be present at the current working
directory of the process running at the remote end
of the connection.
After the transfer is completed, a message should
report the duration, average throughput, and
Round-Trip Time (RTT) of the transfer.
The same connection can be used for multiple download operations.
Examples of resulting messages:
"Success: File=foo has been downloaded (duration=3sec, throughput=160KBps, RTT=100msec)"
"Failure: File=foo does not exist"
"Failure: CID=3 does not exist"
"Failure: CID=3 is busy with other transfer"
"Failure: CID=3 is broken"
- set_garbler(L, C, D): this command controls the
garbler characteristics at the local network node.
- L: packet loss fraction (between 0 and 100)
- C: packet corruption fraction (between 0 and 100)
- D: packet duplication fraction (between 0 and 100)
All arguments of this command are in integer format.
Examples of resulting messages:
"Success: Loss=1%, Corruption=1%, Duplication=2%"
"Failure: Loss=120% bad argument"
NOTE: We will provide you with the code for this command.
- route_table: this command prints the local
routing table, i.e., for each network node, show the corresponding
next-hop and the cost of the route in terms of number of hops.
- traceroute(N): this command prints the route,
as a sequence of nodes, from the local node to node N.
Note that the implementation of this command requires
using the TTL DNP header field,
and the functionality of DNCP.
Examples of resulting messages:
"Success: route to node-1: 2,4,6,1
"Failure: node-1 does not exist"
"Failure: node-1 is unreachable"
- link_down(N): this command disables the link
from the local node to node N.
Examples of resulting messages:
"Success: link to node-1 is down"
"Failure: link to node-1 does not exist"
"Failure: link to node-1 is already down"
- link_up(N): this command enables the link
from the local node to node N.
Examples of resulting messages:
"Success: link to node-1 is up"
"Failure: link to node-1 does not exist"
"Failure: link to node-1 is already up"
- debug: this command is optional.
You will find it useful to have such a command, however,
for debugging and monitoring your project.
It can report all important information about the
local node, such as active service points, opened
connections, varius statistics, etc.
- exit: exit the program gracefully.
Design report
The first part of this project will be to design
your overall system, and write a design report.
This report will guide your implementation later,
and coordinate the group's efforts.
The importance of starting with a thorough design
cannot be overemphasized!
A major part of the design is the specification of
all the required protocols: DNP, DNCP, LRP, RTP, and NFD.
The specification of a protocol usually involves:
- Protocol functionality and algorithms.
Protocols are usually specified formally
with state diagrams. For an example of a state diagram,
see Figure 5.7 of your textbook.
- Protocol header fields and message types.
This has to be done down to the bit level.
For an example, see Figures 4.3 or 5.4 of your textbook.
- Processing steps involved in sending and receiving
each protocol message.
Together with the specification of each protocol,
your report should also describe how do you
plan to implement each protocol.
For instance, your implementation plan should describe
things such as:
- How many threads do you need at each network node?
- What does each thread do?
- Which threads need to communicate, and how?
- Which threads need to share data, and which are these data structures?
Your report has to also include a detailed time schedule
(say weekly) that you plan to follow in implementing this project.
Keep in mind that large software projects such as this require
a significant amount of testing and debugging after all pieces
are brought together. Our suggestion is to reserve at least 20%
of the overall time-frame for testing and debugging of the entire system.
Also, you may want to introduce major implementation milestones.
For example, you can try to have the complete project running,
without the garbler, by Nov 1.
Finally, your report should show how do you plan to distribute
each implementation task among the group members. A good approach
is to have two persons being responsible for each piece of code,
even if one of them is the main programmer.
Milestones, deliverables, and grading
The project will have to adhere to the following milestones
and deadlines.
- Form 4-member groups (email instructor): Sep 10.
- Design report: Oct 1 (due in class).
- Midpoint progress report: Nov 5.
- Final code electronic submission: Dec 2, 5:00pm.
- Demos (30 min per group): Dec 5 and 6 (tentative).
The midpoint progress reports will be informal
interviews of each group with the instructors
to talk about the progress of the implementation
and about any particular problems that the group faces.
The project accounts for 40% of the overall course grade.
The grade for the project will be determined at the
end of the semester taking into account
both the design report and the resulting implementation.