|
A Reactive Implementation of the Tau Protocol Composition Mechanism
R. Clayton, K. L. Calvert
Networking and Telecommunications Group
College of Computing
Georgia Institute of Technology
Atlanta, Georgia 30332-0280
404 894 1155, 404 894 9107
clayton@cc.gatech.edu, calvert@cc.gatech.edu
Abstract
Tau is a framework for end-to-end communication services. Tau supports the
nonlayered composition of protocol functions. A key part of the Tau framework
is a generic protocol interface, which hides from protocol functions the
details of the other protocol functions with which they are composed. How to
structure this generic protocol interface is an interesting and challenging
design question. This paper presents an overview of the design goals and
pieces of the Tau framework, and argues that a reactive programming model
offers advantages for both reasoning about and implementing the generic
protocol interface. We also describe our experience composing protocol
functions using a reactive model to implement the NETBLT bulk-data transfer
protocol.
Keywords: reactive systems, protocol framework, NETBLT, transport services
1. Introduction
Recent years have seen a flurry of activity in the higher levels of network
architecture. Applications are being developed and specified at an
unprecedented rate. Many of these applications are adequately served by TCP
and UDP, however some are not. In the present Internet architecture,
end-to-end services other than reliable byte-stream (TCP) and best-effort
datagram (UDP) are often integrated with the application, because there is no
general mechanism for composing and re-using individual functions such as
congestion control. Applications thus have to either implement these functions
themselves, use functions they don't need (i.e. TCP), or forego them
altogether.
A related problem is that of flexibility with respect to the services used by a
particular application. The canonical example is security: sometimes HTTP
needs to be run over a secure encrypted connection, while at other times this
overhead is unnecessary. In the Internet protocol architecture, port numbers
are the only mechanism available to encode the presence or absence of such
services. Thus, Web servers listen on two ports: one for secured connections
(using a specific protocol, i.e. SSL), and one for normal connections. The
only way for a client to determine that a given server does not support
security is to attempt to connect to the corresponding port. In general, this
approach causes the number of well-known port numbers to grow exponentially in
the number of services (or protocols implementing services) whose presence or
absence needs to be encoded.
To summarize: the present Internet Architecture stops at TCP and UDP,
relegating all other services and combinations of services to the application.
This situation impedes the deployment of new network applications by making it
difficult to implement, configure, and use such services unless they are
integrated into the application. What is needed is general approach to
organizing these kinds of services and enabling them to be composed in various
ways. (Note: although the OSI protocol architecture extends beyond the
Transport level, various aspects of it - including its insistence that each of
seven layers be present - preclude its consideration as a solution.)
Toolkits are one standard technique for dealing with complications of the kind
involved with provisioning network services. A toolkit is essentially a
codification of a set of solutions for a specific problem. For example, the
ISIS toolkit (Birman and van Renesse, 1994) represents a particular set of
solutions for providing group management and communication services. Toolkits,
being associated with specific problems and solutions, tend to have limited
scope. In addition, the preciseness of their assumptions and attendant
solutions tend to fragment the system into components that are difficult to
re-integrate into a higher-level system. (Here we are talking about horizontal
integration, as opposed to vertical integration. Most toolkits are made to be
vertically integrated. Horizontal integration is more difficult, principally
because of the heterogeneous nature of the components, as can be appreciated by
considering how ISIS might be integrated with Kerberos, a toolkit providing
authentication services.)
Frameworks are also an attempt to codify solutions, but at a more general level
than toolkits. An exact definition of a framework is problematic, but if a
toolkit is a software system embodying both mechanism and policy, then a
framework is a software system embodying only mechanism. The x-kernel
(Hutchinson and Peterson, 1991) is an example of a framework for implementing
protocols, as is the Conduit and associated patterns (Zweig and Johnson,
1990).
This paper describes Tau (Transport and up), a proposed
solution to the problem described above. Tau has some characteristics of a
framework, but also implements some functionality of its own (including
addressing). It is more than a toolkit, because it is intended to be fully
extensible in certain dimensions. Perhaps it is best described as a
platform, one that supports dynamic configuration of efficient
end-to-end communication services composed from reusable protocol functions. A
primary objective of Tau is to support addition of new functionality after Tau
has been implemented and deployed. To that end, Tau features a generic
protocol interface (GPI), which defines the shape of all protocol
functions. The purpose of the GPI is to hide details of how protocol functions
are composed from the protocol implementor, who is then free to focus on how to
implement the desired function (Calvert 1993).
Another design goal is for Tau to provide common-path performance comparable to
existing protocol implementations, (i.e. TCP and UDP) for comparable services.
The design of the GPI is interesting and challenging for several reasons, most
especially the tradeoffs among the design goals of flexibility, modularity and
performance.
Given a platform that supports dynamic selection and configuration of protocol
services, there is still the nontrivial problem of determining which
configuration to provide to a particular application instance, and when to
change the configuration. Solutions to this problem are outside the scope of
Tau, which assumes the existence of a configuration manager that decides
which functions to configure for each session (or even each packet), and
informs Tau of its decision. The configuration manager might be implemented in
the application itself, or it might be a separate function at the same level as
Tau. Tau's relationship with this configuration manager as well as its
position in the overall architecture is shown in Figure 1.
Figure 1. Tau context
The remainder of this paper has two main parts. In the first part we give a
general overview of the Tau platform. Then we highlight issues involved with
defining a generic protocol interface, consider various models that might be
used to reason about and implement such an interface, and argue that the model
of reactive systems offers advantages over others. The second part of
the paper (Section 3) describes our experience using a reactive approach to
protocol composition in implementing a specific protocol (NETBLT) from protocol
function components.
2. Tau Overview
The goal of the Tau project is provide an architecture for common end-to-end
services (i.e. those services above the network layer that can reasonably be
expected to be used by different applications) based on insights gained from
protocol design research over the past two decades. The Tau platform has three
main components:
- a metaheader protocol, which supports once-for-all demultiplexing
and describes the processing to be applied to each Tau data unit;
- a generic protocol interface, which enables protocol functions to
be plugged into slots in the Tau framework, and frees implementors from
worrying about details of the protocol composition mechanism;
-
a set of individual protocol functions such as flow control, connection
management, encryption, etc.
Tau is intended to support fast path common cases efficiently, while also
supporting, with a minimum of extra cost either at implementation or execution
time, generality and flexibility for those network services that require it.
Tau recognizes and supports the usual software engineering goals of modularity,
code re-use, and so on with the intention of making it possible to rapidly
deploy new services and applications. The design of Tau also takes into
account various known performance-enhancing techniques including once-for-all
demultiplexing (Feldmeier, 1990; Tennenhouse, 1989), integrated layer
processing, and application layer framing (Clark and Tennenhouse, 1990).
A common feature of many of these techniques is that they are precluded by the
use of layering as a composition mechanism. Although layering makes
sense for some interfaces in a protocol architecture (viz., at natural
multiplexing and switching points such as the top and bottom of the network
service), at other points it hides information that can be useful in planning
and carrying out the processing of a protocol message - e.g. the configuration
of protocols present in a message. For this reason, Tau does not rely on
layering to compose protocol functions; rather, it provides a backplane
structure, into which protocol functions can be inserted.
Figure 2 shows the general organization of the Tau framework and the data flow
processing of an incoming message. Each protocol function is viewed as a
passive transducer, which is given inputs (state information, header from the
incoming message, and in some cases user data) and produces outputs (updated
state information, control parameters, possibly a reply header, and processed
user data). The architectural glue is provided by the demux-and-dispatch
function, which is responsible for invoking and coordinating the protocol
functions, providing them with inputs (i.e. parsing incoming headers and
retrieving relevant state information), and collecting and passing on to the
external environment (i.e., the user, the network, and auxiliary functions such
as timeout and buffer management) their outputs. Logically, the protocol
functions operate independently and concurrently on any given message, except
for data-handling functions, which in general have to be executed in some
sequential order for correctness (e.g., sender and receiver need to agree on
the order in which to perform encryption and checksumming).
Figure 2. The Tau architecture.
2.1 Related Work
A number of configurable protocol frameworks have been proposed and-or
developed, including O'Malley and Peterson's Dynamic Architecture (O'Malley and
Peterson, 1992), FCSS (Zitterbart et al., 1993), DaCaPo (Plagemann et al.,
1993), HOPS (Haas, 1991), and ADAPTIVE (Schmidt and Suda, 1993). In addition,
various researchers have proposed protocols that offer flexible functionality
for particular purposes (LaPorta and Schwartz, 1993; Feldmeier, 1993). Tau has
in common with these the idea that an application should be able to select from
a menu the protocol functions it needs. It differs in being explicitly
designed to support almost arbitrary extension of the set of services it can
provide, in its avoidance of layering as a composition mechanism, and in its
aim of providing a standard platform (which might be realized in many different
ways) rather than an environment that must be present on both ends.
The work of Bhatti and Schlichting (1995) on creating composite protocols in
the x-kernel is probably closest to the philosophy of Tau. Like their
approach, Tau features a generic protocol function interface; it differs,
however, in emphasizing the development of a platform - a protocol for bits on
the wire and in its approach to composition.
2.2 Metaheader Protocol
The metaheader protocol defines the bits on the wire needed for
interoperability among Tau implementations. It supports five functions
provided by Tau itself:
- Application addressing.
- Multiplexing and fast session state retrieval (the session identifier used
to locate relevant state for an incoming message is chosen by the message
recipient).
- Error detection over Tau protocol control information (i.e. a checksum
over everything in the Tau message that precedes user data).
- Reply advice, which tells the receiving Tau whether to send reply
messages immediately or wait for an application response.
- Configuration information describing the headers present in the message,
and indicating what processing to apply to the message.
For the purposes of this paper, the last function is the primary one of
interest. It is this mechanism that allows for dynamic configuration of
different protocol functions for each application, and for the addition of new
protocol functions to the Tau framework.
Two types of configuration description are supported. The first type is the
most general: the Tau metaheader is followed by a list of header
descriptors, one for each protocol function that needs to be applied by the
recipient to the message. Each header descriptor contains the protocol
identifier as well as the size and location of the corresponding protocol
header (if any). This information enables the receiving Tau to locate and pull
out the actual headers from the message, and determine which protocol functions
to invoke. This type of configuration description can support essentially
arbitrary combinations of protocol functions.
The second type of configuration description assigns concise identifiers to
common protocol header configurations, and omits the header descriptors to
reduce overhead. This well-known configuration number can be used to index
into a table of parse-and-dispatch routines, each of which pulls out the
headers present and invokes the appropriate functions.
2.3 Generic Protocol Interface (GPI)
The objective of the Generic Protocol Interface is to "carefully separate
aspects of protocols that are externally visible (i.e. affect headers and data)
from those that deal with `infrastructure'" (Calvert, 1993) - in other words,
to maximize the separation of functions from the glue that binds them together.
In most architectures that feature some form of general protocol interface, the
protocol implementations interact directly with each other. To maximize
flexibility, these interfaces typically impose few constraints on the way
protocols interact with each other. In practice, this leads to hidden
assumptions (e.g., about responsibilities for buffer de-allocation) that make
it difficult to re-use implementations in different configurations.
In Tau, the idea is that protocol functions interact only with the Tau
framework, which provides them with all information (i.e. headers and state)
needed for processing. The advantages of this approach include:
- Assumptions about protocol functions in the environment are minimized.
- No ordering is imposed on protocol processing, unlike with layering. This
removes one obstacle to parallel implementations, although parallelism at this
level has not yet been shown to be effective. (Note that ordering can be
imposed and declared in the metaheader where it is necessary. For example, the
order of application of functions that modify user data must be known at the
receiver.)
- Information about common functions can be exploited to provide fast path
processing.
Ideally in this model, each protocol function is logically independent, and
does not need to interact with any others. In practice, different protocol
functions sometimes need to interact; Tau provides ways for them to do that
indirectly.
For example, a protocol function responsible for sending acknowledgments of
incoming data units needs one bit of information from the checksum function,
namely whether the data unit was received correctly. This is an example of a
more general problem that does not arise with layered composition: some
protocol functions are show-stoppers, i.e. they cause processing on a data unit
to be aborted or delayed. In a layered architecture, the next protocol in the
sequence never sees the data unit. In the logically parallel framework of Tau
however, other functions may already have processed the data by the time the
show-stopper is discovered. For example, a sequencing function may determine
that an incoming data unit is not in the receive window, and therefore should
not be delivered to the user. (We consider copy-out to the user's data space a
separate protocol function.)
To solve this and other problems, Tau will segregate these show-stopper
functions, and invoke them before any other processing. If any show-stopper
raises an exception that would cause a message to be aborted, any completed
processing must be rolled back, and other functions may not be invoked. (Note
that some functions might be invoked even when a received data unit is
erroneous. For example, flow control window information for the opposite
direction of data travel would be useful even if a checksum indicates the user
data is corrupted, as long as the header itself is not corrupted. This is the
reason Tau has a separate checksum for header and metaheader information.)
Another problem caused by horizontal composition is more difficult to solve:
functions that cause data to come to rest in one way or another inside Tau are
not conducive to this form of composition. The canonical example of such a
function is fragmentation and reassembly, which clearly cannot be logically
independent of functions such as checksumming and sequencing. Tau's solution
is simply to proscribe composition of such functions with others in the same
layer, and to permit Tau to be used recursively, i.e. layered on top of
itself. Obviously there is a performance penalty for this form of composition,
but the point is that those applications that really need fragmentation and
reassembly can get it.
Conceptually, the Generic Protocol Interface regards an individual protocol
function as a transducer that maps a tuple (state, header, [user
data]) to another tuple (new state, [reply header], [user
data]). Each protocol function provides such a mapping (i.e. a method)
for each type of processing: incoming message, outgoing message, new session,
shutdown, state rollback. Some protocols may have methods for other types of
processing, e.g. timeout. Finally, the Generic Protocol Interface also
requires other information about a protocol function: whether it is a
show-stopper or whether it may raise exceptions of various kinds. In
particular, Tau must treat some types of functions, such as flow control and
session management, as special cases, because they affect the processing of Tau
itself (e.g. Tau must check with flow and congestion control before it begins
invoking functions to construct an outgoing message).
2.4 Implementing Tau
This section discusses implementation, first by pointing out important features
of Tau the implementation must provide, then by considering possible software
models for the Tau implementation, and finally by showing how the chosen
software model was used to implement a prototype Tau composition mechanism.
2.4.1 Important Tau Features
Before examining the various software models that are applicable to a
Tau implementation, it might be a good idea to review the Tau
architecture shown in Figure 2 with an eye toward picking out the
essential features that the software model would be required to
support.
Figure 2 has two prominent features: the set of protocol functions making up a
protocol, and their arrangement along bus-like control and data paths. An
immediate consequence of these features is the need for a generic interface to
mediate the contact between the protocol functions and the buses, insuring that
protocol functions can be swapped in and out of a protocol's implementation,
perhaps dynamically during execution. If this interface is properly designed,
it should also simplify the process of incorporating new functions into the Tau
framework as they are developed.
Another consequence of the Tau architecture is determining how control
moves among the protocol functions. Although exploiting available
parallelism is an important objective, assuming sequential execution
still raises the question of determining the order in which control
passes among the protocol functions. Some of these issues were noted
above (e.g. is decompression done before computing the checksum);
another is: How is exceptional control
provided, as would be necessary when the checksum computation fails,
for example?
Even though Figure 2 presents Tau's control and data-manipulation
features, the work described in this paper is focused on Tau's control
features. Previous work, oriented around data streams and data stream
computations, has dealt with data-manipulation protocols (Clayton
and Calvert, 1995). Future work will integrate the mechanisms dealing
with Tau's control and data-manipulation features into a single
system.
2.4.2 Possible Software Models
The objective in implementing Tau is to realize the architecture
in a way that minimizes the cost, both at protocol development and
execution time, of using the implementation. An important first step
in this design is choosing the proper software model to guide the
design and implementation of Tau. This section considers several
possible software models and justifies our choice of a reactive
software model as the basis for a Tau implementation.
Object-oriented models. One cannot go too far in the process
of designing software these days without having one's attention drawn
to object-oriented models (Bapat, 1994). At first glance, it seems
that object-oriented models are ideal for Tau; for example, one way to
insure interoperability of protocol functions is to have them inherit
from the same abstract class. However, our experiences using the
object-oriented model to implement protocols, along with taking a
second, closer glance, indicates that perhaps object-oriented models
were not the anodyne they initially seemed to be.
It is true that interoperability can be insured by having all
protocol functions inherit from the same abstract class. However,
such interoperability is syntactic, obtained by forcing all protocol
functions to presenting the same interface. It is not clear to us,
however, that the protocol function for establishing a connection via
a three-way handshake should have the same interface as a protocol
function providing DES encryption.
Our experiences with the x-kernel (Hutchinson and Peterson, 1991) -
which is not, strictly speaking, an object-oriented system, but which is close
enough in intent for our purposes - bear out our concerns. During our attempt
at implementing the AppleTalk protocol family, we found that the
x-kernel's uniform protocol interface, analogous to an interface
inherited from an abstract class, occasionally came up short with respect to
the control and data that needed to be passed between protocol components,
requiring us to distort the implicit semantics of the uniform protocol
interface, or subvert it entirely through back doors. If this sort of thing
happens when implementing a family of related protocols, it is not to hard to
imagine how much worse it could get when implementing a set of largely
unrelated and arbitrary protocols.
But even if the technical difficulties of syntactic conformance and the like
could be solved, there are larger issues at work too. Tau's control-flow model
appears to be considerably looser than that found in most object-oriented
models, although we must admit that we find murky the issues around parallel
control flow in object-oriented models. Control in Tau cuts across protocol
functions, independently of what those protocol functions might be; this is the
principle reason for calling Tau's architecture "bus-like".
Such cross-cutting behavior causes difficulties in an object-oriented approach
because it doesn't provide much structuring help during object-oriented design,
and it doesn't respect any object design that eventually evolve (Kiczales et
al., 1997). Such problems were recognized early on in object-oriented design,
and dealt with using such paradigms as Model-View-Control in Smalltalk and,
more recently, various types of design patterns (Vlissides et al., 1996).
However, along with this extra machinery comes a performance hit, a problem
which is also being worked on (Schmidt et al., 1995).
It should be noted that we are not saying that object-oriented models
cannot be used to implement system software or protocols; we are just
saying that, for Tau, object-oriented models are perhaps not the best
choice. Note also that we are not denying the usefulness of
object-oriented techniques; in fact, a previous incarnation of Tau was
implemented in the object-oriented style, and future ones may be also.
Functional models. Functional languages have many desirable
properties (Hughes, 1989), and modern functional-programming
environments provide powerful facilities for structuring software
systems (Tofte et al., 1996). The objective of using functional
programming to implement systems software has been a long standing
one, with perhaps the must fully realized instance of that objective
being The Fox Project (Cooper et al., 1991). Although we also
consider functional programming an attractive way to design and
implement network protocols, we do not have the resources to devote to
establishing an effective functional infrastructure.
Event-driven models. Event-driven models have had a long
history of structuring real-time systems, operating systems, and, more
recently, graphical user interfaces; event-driven systems have also
been applied to structuring protocols (Bhatti and Schlicting, 1995).
Figure 2 suggest Tau would also be a good candidate to be implemented
as an event-driven system.
There are, however, a few sticking points about event-driven models.
First, the event-based communication is usually asymmetric. In
particular, events move from the environment to the program, but
rarely in the other direction. For example, in the X Windows System
events are restricted to traveling from the run-time system to the
application; communication in the other direction is done by library
calls in the X Window System.
Second, this communication asymmetry may extend into the program
itself. If two parts of a program would like to communicate, they
generally can't use events to do so, they must rely on some other
mechanism. This introduces a dichotomy in the available
communications mechanisms, a worrisome prospect for a system like Tau
because of the independence with which protocol functions are created
and used.
To be most useful as a software model for implementing Tau,
event-driven systems should be extended to cover these two short
comings: event communication between the computation and the
environment should be symmetric, and events should be available for
intra-computation communication. Bhatti and Schlicting (1995) provide the
second extension but not the first (largely because their system
operates within the x-kernel, which does not use events
internally).
Reactive models. The closest execution model to event-driven
systems are reactive systems (Montague and McDowell, 1997). A
reactive system is one that carries an on-going dialog with the
environment in which it is executing. An operating system is the
prototypical reactive system, as is a real-time control program if you
ignore the timing issues. In addition to reacting with its
environment, a reactive system tends to be long-lived with no clear
stopping point (apart from an explicit stopping signal from its
environment).
Despite their similarities, reactive models extend event-driven
systems in several significant ways. First, events in reactive
systems are fully symmetric; they can travel from the environment to
the reactive system and vice versa. In addition, events can be used
between components of the same reactive program. The reactive model
provides a single, universal, and quite powerful communication
mechanism, while event-driven systems must rely on two or more
asymmetric communications mechanisms.
A second significant extension made to the event-driven model by
reactive systems is the altered interpretations placed on events by
reactive systems. Event-driven systems treat events both more
generally and more specifically than do reactive systems. Generality
shows up in the asynchrony with which events can arrive and influence
an event-driven system. Because of such generality, event-driven
systems have to be written to handle arbitrary events that may arrive
at arbitrary times. Specificity shows up in the priorities and
deadlines associated with events in an event-driven system. Because
of such specificity, event-driven systems have to be written to order
their computations to minimize conflicts between the priorities and
deadlines of the associated events; for example it may be that the
next event impinging on an event-driven system requires that the
system handle the event within 10 microseconds of receiving the event,
and do so in preference to any other computations it may be carrying
out at the time the event arrives. It is dealing asynchronously with
priorities and deadlines that contributes the bulk of the difficulty
when designing and implementing event-driven systems. (Note that this
paragraph applies more to hard event-driven systems such as real-time
control systems; to the extent that an event-driven system can take a
more relaxed approach to asynchrony, priority, and deadlines - as is
done by GUIs, for example - it begins to resemble a reactive
system).
Reactive systems generalize the specific event interpretations of event-driven
systems by eliminating or limiting event priority and deadline. This
generalization reduces the number of event types and the urgency with which
they must be dealt. Reactive systems make more specific the general
asynchronous event arrivals through the synchrony hypothesis: all
output events occur synchronously with the associated input events, and input
events don't interfere with each other; alternative statements of the synchrony
hypothesis are "computations triggered by events take no time" and
"computations are atomic with respect to events" (Benveniste and Berry, 1991).
The synchrony hypothesis greatly tames the difficulty associated with handling
asynchronous events; although what event may show up next is unknown, when it
will show up can be determined and planned for.
A third extension made to the event-driven model by reactive systems -
less significant then the other two because it is fairly easy to see
how it can be added to the event-driven model - is the use of
co-routines as control construct (here we are referring more to the
imperative reactive models used in such languages as Reactive C and
Signal).
A reactive construct, once it begins execution, may stop in one of two ways: it
can terminate or it can suspend; the choice between the two depends on the code
appearing in the construct. When a reactive construct terminates, control
leaves the construct and passes on to the next construct (reactive or
otherwise) to continue execution.
When a reactive component suspends, control leaves the construct, but can, at
some later time, re-enter the construct at the point of suspension and continue
execution within the construct. Suspension and resumption may occur repeatedly
until the construct terminates. A reactive component, once terminated, cannot
be resumed, but it may be reset, which puts the component in the state it had
before it was ever executed. When executing a resumed component, control
always begins at the component's start; in contrast, when executing a suspended
construct, control begins from the point of last suspension, which may be
from anywhere within the construct.
This facility provides coroutine-like behavior ("coroutine-like" because
suspension doesn't direct control to any particular place, but just causes it
to leave the current construct), which is convenient for computations that need
to stop and wait because it eliminates the need to save volatile local state
in more stable, longer lived storage. A similar facility can be provided by
having a dedicated thread execute within a procedure, as can be done in the
x-kernel, but reactive suspension is neater because it
eliminates the need for thread synchronization using, for example, semaphores.
The execution of a set of reactive components is organized into a sequence of
non-overlapping intervals. During each interval, all non-terminated
components are given a chance to execute, either by having control enter them
in normal sequential fashion or by being resumed. The interval ends when all
components stop, either by terminating or suspending; at that point the
interval ends (thus, if a component fails to terminate during an interval, the
interval never ends).
It is at the start of an interval that the environment interacts with
a reactive system. The start of the next interval is marked by the
occurrence of a signal from the environment; non-terminated components
may ignore or react to the signal as directed by the code they
contain.
During execution, a component may issue its own signals; these signals are
available to both the environment and the other reactive components. The
semantics of synchronous reactive systems require that any signals issued by
reactive components during an interval must appear as if they were issued at
the start of the interval; it is the job of the run-time system, along with the
semantics of the reactive programming language, to insure this illusion
obtains.
2.4.3 Implementing Tau as a Reactive System
The description of the protocol composition mechanism
given in this section is broken into two
parts. The first part presents the high-level, system-oriented implementation
details, and the second part presents the lower-level, programmer-oriented
details.
Figure 3 shows the general outline of the implementation. Signals
from the environment, along with any associated data, are captured by
the Interrupt Handler, which hangs the signals on the Interval Queue
(an interval is reactive-system jargon for the period of
computation beginning with a signal and lasting as long as
computations are executing in response to the signal). The Scheduler
dequeues signals and pairs them with protocol functions waiting for
the signals. For each pair found, the Scheduler starts executing the
protocol function, passing the signal data, if any. Pairing continues
until all waiting protocol functions have been examined, at which
point the current interval ends and the Scheduler goes back to the
Interval Queue for the next signal.
Figure 3. Structure of the protocol composition mechanism.
The Interrupt Handler is configured to recognize packets arriving from
the network and host-system timer expiry as signals. In the case of
packets, the associated data contains a pointer to the packet itself;
timer signals have no associated data. Adding new signals is a matter
of reconfiguring the Interrupt handler to recognize the signals in the
form provided by the host system. The implementation does not
currently allow protocol-function defined signals, but it is clear
from Figure 3 how they would be added: a protocol function would hang
a signal identifier and associated data on the Interval Queue.
The set of waiting protocol functions can be manipulated by protocol
functions, which can add and remove members from the set.
One point not made clear by Figure 3 is the implementation's location
within the host system. Currently, it resides in application space,
and, in fact, is contained entirely in the Tau library; an application
using Tau shares its process space with the Tau implementation. This
arrangement is not ideal because Tau uses resources - interrupt
handlers, for example - that may be needed by the application, but it
is simpler and more convenient to deal with than the alternatives,
such as a separate application- or kernel-space implementation.
With the high-level boxes and arrows of the implementation laid out, we
can now go into some of those boxes to examine lower-level details.
The C typedef for the generic protocol interface is:
typedef void gpi(void * evnt, void * args);
evnt is an input parameter pointing to a data structure giving
more information about the event that triggered the protocol function's
execution. In the case of messages, evnt points to the message,
its length, and the address of the message sender; evnt is zero
for timer expires because there is no extra information.
The input parameter args is a pointer uninterpreted by
the composition framework; its
interpretation and use is under control of the protocol functions. Generally,
a parent protocol function uses the area pointed to by args to
pass initial values to a child protocol function; a child protocol function
uses the area to store values across invocations.
This version of the generic protocol interface is indeed generic, but
a bit too simple. For example, the absence of any meaningful
semantics for args makes it difficult to determine when
its being misused, either intentionally or unintentionally; in
addition, it is difficult to determine a protocol function's operation
from the information presented by its generic protocol interface.
These difficulties are serious, but are acceptable at this stage of
development because we are interested in developing a prototype to
determine reactive systems' overall fitness for implementing Tau.
Any C function using gpi as its prototype can serve as a protocol
function. Protocol functions are manipulated by the following routines:
mhandler_id schedule_mhandler(unsigned short msg_list, gpi pf, void * args)
void cancel_mhandler(mhandler_id)
schedule_mhandler() hangs the protocol function pf on
the wait queue; pf will be removed from the wait queue and set to
running when the scheduler encounters a message of the type indicated by the
bit vector msg_list. schedule_mhandler() returns an
identifier that can be used in cancel_mhandler() to remove the
protocol function from the wait queue; once a protocol function is running,
calling cancel_mhandler() has no effect. A similar set of
routines exists for timer events.
As an example of how these routines are used, the following code periodically
sends a message a fixed number of times:
static void send_open(void * iap, void * ap) {
sconnect_argsp scap = (sconnect_argsp) ap;
if (scap->retries_left == 0) {
scap->done = true;
return;
}
send_pkt(scap->nbid, scap->r_host, scap->r_port,
(u1b *) scap->omp, msg_length(scap->omp));
scap->retries_left--;
scap->tid = schedule_timer(scap->response_timeout*1000, send_open, ap);
}
Because send_open() is triggered timer events, the event pointer
iap is zero. The argument pointer ap points to a data
structure containing the retransmission count, the timer interval length, and
information about the message to send. If there are retransmissions left,
send_open() retransmits the message and reschedules itself for the
next retransmission. send_open() continues to execute until it
runs out of retransmission attempts, at which point it indicates it is done, or
it is canceled by some other procedure calling
cancel_timer(scap->tid).
The implementation is roughly 2 000 lines of ANSI C running under Solaris
5.5.1, SunOS 4.1.3, and Linux; code can be found at http://www.cc.gatech.edu/computing/Networking/playground/dsa/nbc.html,
where it is coupled with the code for the NETBLT implementation described in
the next section.
3. An Example Tau Protocol
To illustrate the points made in the previous sections, we describe an
implementation of the NETBLT protocol within the reactive Tau framework. The
NETBLT protocol provides a connection-oriented, bulk-data transfer service
optimized for high bandwidth-delay environments (Clark et al., 1987). The two
principal features of NETBLT are large buffer sizes and rate-based flow
control. Large buffers exploit large memory systems and have the advantage of
amortizing transmission overhead over larger transfer units. Rate-based flow
control exploits the loose coupling between the peers in bulk-data transfer and
has the advantage of lower overhead than found in other forms of flow control,
such as sliding windows.
There are a number of other NETBLT features of interest, particularly to
implementors. Briefly, each buffer is fragmented into packets and sent at the
expected rate. At the end of a buffer transmission, the receiver sends a
request to retransmit any lost or corrupted packets. To further reduce the
transmission overhead, several buffers may be in transit at the same time.
NETBLT makes heavy use of timers to detect lost packets, lost peers, and other
error conditions. NETBLT has a number of parameters - such as buffer size,
packets size, packet transmission rate, and so on - the values of which are
under application control and negotiated between the ends of a connection; the
burst rate may is also re-negotiated after each buffer transmission.
The overall NETBLT implementation is factored into four parts:
- Endpoint management - creating and deleting NETBLT endpoints.
- Connection management - establishing and tearing down connections.
- Buffer transfer - sending and receiving data.
- Housekeeping - packet send and receive, timer management, and so on.
The first three parts implement the C interfaces that the implementation
presents to NETBLT clients: create() and delete()
from the endpoint-management code, open() and close()
from the connection-management code, and send() and
receive() from the buffer-transfer code; the housekeeping code
exports interfaces used only from within the NETBLT implementation.
Housekeeping involves all the low-level support required by NETBLT: message
send and receive, including header completion and checksum computation and
validation, intra-NETBLT storage management, and the like. The bus-like
aspects of the Tau architecture, along with their realization in a reactive
system, served us well here, allowing us to construct protocol functions for
these basic activities without particular regard to how and by whom they would
eventually be used.
As the system grew, however, we found the number of these basic protocol
functions began to get out of hand; determining what function was performed
where and how became difficult. In response, we began to combine protocol
functions into larger, more specific but still logically cohesive functions;
this is why header completion is combined with message sending, for example.
To a certain extent, these problems were brought about by Tau's architecture,
with its single, non-hierarchical bus, and its implementation with global,
non-scoped events. However, for the most part, this problem is an instance of
the common problem of indexing and discovery in system promoting reuse, the
general solutions for which involve aspects outside the scope of this paper.
It is in the buffer transfer part that the value of a reactive implementation
is clearest. To review briefly: client-to-client data transfer is done in
terms of buffers, the size of which is negotiated between the clients. Within
NETBLT, each buffer is transferred in packets, the size of which is also
negotiated by the clients. Packets are sent at a rate negotiated by the
clients; lost and corrupted packet recovery is done by selective retransmission
triggered by a timeout after the expected buffer transfer time. To further
amortize transmission costs, clients may have several buffers in transit at the
same time; the number of buffers in transit is negotiated between the clients.
The open communication patterns provided by events in reactive systems are
invaluable in untangling the complex interactions among buffer, packets, and
timers in NETBLT. Timer-based code can snoop on message events, setting and
resetting themselves according to their purpose, without direct intervention
from other parts of the protocol.
The reactive style was a great help in implementing multi-buffer buffer
transfers. (The NETBLT Request for Comments (Clark et al., 1987) seems to be
contradictory on the subject of packet order during multi-buffer transfers, so
what follows is based on our interpretation of what the RFC says, which may not
be what it actually says). In multi-buffer transfers, each buffer is sent
sequentially, one immediately after the other. In addition to cutting the
overhead for inter-buffer coordination between the peers, this allows packet
retransmission for previous buffers to be overlapped with packet transmission
of the current buffer, which helps keep the pipeline between peers filled.
Replicating the buffer-send protocol function isn't enough to implement
multi-buffer transfers because of the need to coordinate packet transmission
between the protocol functions, and the need for each protocol function to
handle its own packet retransmission.
Connection management too benefits from a reactive style of implementation,
although not as dramatically as does buffer management, largely because
connection management isn't part of the more complex, commonly executed path
through NETBLT. Never the less, the ability to tease out the components of
connection management and assign them to separate protocol functions helps
simplify and clarify the issues surrounding connection management. For
example, the receiving end of a connection, assuming its implemented as a
server, executes a cycle of accepting a single connection request and then
rejecting all following connection requests until the accepted connection
terminates.
A non-reactive language like C makes it easy to implement such functional
separations too. For example, a procedure pointer variable could point
alternately to a procedure that accepts connection requests and a procedure
that rejects them. The questions then become, however, what entity is
responsible for flipping the values of the procedure pointer variable?, and
what entity is responsible for calling the procedure pointed to?
The reactive implementation of NETBLT answers these questions quite nicely:
each protocol function is made responsible for determining how it reacts to a
connection request. The protocol functions do this by snooping on events to
determine the state of the transfer. If a transfer is in progress, the accept
protocol function is inactive and the reject protocol function is active; if
there's no transfer in progress, their roles are reversed. This solution,
based on local inference and unobtrusive snooping, is much more difficult to
provide for in a language such as C.
The endpoint management part of the NETBLT implementation is the least
interesting with respect to a reactive implementation. Endpoint management is
responsible for creating and managing local instances of a NETBLT endpoint.
Because these endpoints are local, the management code has little need to call
on the reactively implemented resources - which are largely concerned with
providing peer-to-peer services - available deeper in the NETBLT
implementation. Instead, endpoint management deals mostly with NETBLT clients
and the host system, which are assumed (but not required) to be implemented in
a conventional, non-reactive programming language.
This description of the NETBLT implementation has necessarily been sketchy;
those interested in the complete details should consult the NETBLT code at http://www.cc.gatech.edu/computing/Networking/playground/dsa/nbc.html,
where the reactive implementation of Tau can also be found.
4. Discussion
To provide some indication of the cost of implementing protocols using this
composition mechanism, we now compare two simple echo servers, one implemented
in UDP only and the other implemented in Tau over UDP.
The UDP-only implementation is the simplest possible. The server waits on a
socket read; any datagram read gets immediately echoed back to the sender and
the server loops to the read wait again. The client is essentially the same,
with the reads and writes reversed, and all within a for loop which iterates
n times.
The Tau implementation is as simple as the UDP-only implementation, but with
reactive control. The server uses a echo protocol function that reacts to an
incoming message, echoing it back to the sender. The client implementation
uses the server's echo protocol function to continue the interaction after the
first send; after the nth send, the echo protocol function is replaced
by a receiver protocol function to read the last echo.
The number of interest is the amount of time it takes the client to echo a
k-byte buffer using the echo server. This number is derived by
averaging the measurements made by echoing the buffer n times. The
client and server were running on the same UltraSparc 1 under Solaris 5.5.1;
messages were sent through loop-back.
Table 1 shows the numbers collected with a 1400-byte buffer averaged over 10
000 runs. Not surprisingly, using Tau is twice as expensive as not using it.
We are not overly concerned about these numbers because, at this early stage of
our work, our interests are focused on other, more immediate issues such as
suitable architectural features for Tau and effective implementation
strategies. When we have advanced enough in our work we can turn our attention
to performance issues. Nor are we able to discern from these numbers an omen
that we've chosen a disastrously wrong path to implementing Tau. We have just
included these numbers as a convenience to those who must see performance
numbers when reading about protocols.
| | UDP client | Tau client |
| UDP server | 280 | 420 |
| Tau server | 450 | 570 |
Table 1. Performance overhead imposed by the reactive composition
mechanism (units are microseconds).
Of more importance to us at the moment is the conceptual efficiency afforded by
the Tau architecture when designing and implementing protocols. Although
conceptual efficiency is harder to qualify and quantify than is execution
efficiency, our experiences up to this point with the current implementation
have been positive.
A previous realization of the same service was based on an explicit process and
pipe composition mechanism; each protocol function is a thread, and protocol
functions communicate by sending messages along uni-directional pipes. Using
processes and pipes to design protocols was reasonably simple, but using them
to implement protocols was considerably more difficult.
A majority of the difficulties arose from the unconstrained asynchrony implicit
in having ten to fifteen threads executing per NETBLT endpoint. With so much
asynchronous activity, it becomes difficult to determine the overall state of
the system at any particular time during execution. Normally, it shouldn't be
necessary to determine overall state, but when the system isn't working, or
isn't working as well as expected, it becomes important. As another example,
one related to the normal execution of the system, terminating threads in an
ordinary fashion turned out to be amazingly difficult. Globally killing them
outright is one possibility, but it makes orderly resource recovery difficult.
In contrast, the reactive implementation also provides threads, but in a form
more constrained then that described in the previous paragraphs. In
particular, parallelism triggered by events combines the advantages of
concurrent execution with global control, making it easier to test assertions
about the state of the system.
4.1 Further Work
The next-generation system we are designing and building is based on a version
of the synchronous programming language SL (Boussinot and de Simone, 1995).
With respect to Tau, the most interesting feature of S, the language serving as
the basis of the next-generation system, is the use of signals to implement the
Generic Protocol Interface.
In S, as in SL, protocol developers may declare their own signals, as in
proc send_msg(msg : vector unsigned byte, msg_size : unsigned int)
signal ready(unsigned int, vector unsigned byte)
// and so on...
In this example, ready is a signal with two parameters, the first
accepting an unsigned integer and the second accepting a reference to a vector
of unsigned bytes. These parameters are used to transfer values from the
entity that raised the signal to the entities waiting for the signal.
The other interesting trait of S signals, which is also true of signals in SL,
is that they are scoped with respect to the lexical constructs in which they
are defined; in this respect signal declarations behave as do variable
declarations in block-structured languages. In the previous example, lexical
scoping means the ready signal is available only within the send_msg procedure.
When raised, the ready signal only propagates within the body of the send_msg
procedure; entities outside send_msg cannot react to
ready.
In addition to developing and refining the reactive control semantics in the
next-generation system, we plan to determine how to best introduce into the
system our data-stream subsystem. The combination of both control and data
functions will give us a complete, reactive implementation of the Tau
architecture.
5. Conclusion
The demand for upper-level network services has spurred research into
techniques supporting the design and implementation of protocols at the
transport layer and up. Tau is being developed to integrate these various
techniques into a flexible and efficient framework for protocol development.
Reactive software offers an attractive model for implementing the Tau
framework. In addition to providing an execution environment close to that
found in most protocol subsystems, reactive software offers a number of
desirable properties not found in closely related software models, such as
event-driven systems; these properties include full symmetry in communication
among components, better integration with other software components, and a
firmer theoretical model.
This paper has described an initial reactive implementation of the Tau
framework, and shown how it can be used to implement the NETBLT bulk-transfer
protocol. Preliminary implementation experiences indicate that the reactive
model can be an effective and convenient way of implementing protocol software,
although some performance cost is involved.
Bibliography
Bapat, S. (1994) Object-Oriented Networks: Models for Architecture,
Operations, and Management. Prentis-Hall, Englewood Cliffs, New Jersey.
Benveniste, A. and Berry, G. (1991) The Synchronous Approach to Reactive and
Real-Time Systems. Proceedings of the IEEE, 79(9), 1270-1282.
Bhatti, N. and Schlicting, R. (1995) A System for Constructing Configurable
High-Level Protocols. Computer Communications Review (SIGCOMM '95
Proceedings), 25(4), 138-150.
Birman, K. and van Renesse, R. (1993) Reliable Distributed Computing with
the Isis Toolkit, IEEE Computer Society Press, Los Alamitos, California.
Boussinot, F. and de Simone, R. (1995) The SL Synchronous Language, INRIA
Research Report 2510, Sophia-Antipolis, France.
Calvert, K. (1993) Beyond Layering: Modularity Considerations for
Protocol Architectures, in Proceedings of the IEEE International
Conference on Network Protocols, October, San Francisco, USA.
Clark, D., Lambert, M., and Zhang, L. (1987) NETBLT: a bulk data transfer
protocol. Request for Comments 998, Network Working Group, Internet
Engineering Task Force.
Clark, D. and Tennenhouse, D. (1990) Architectural Considerations for a New
Generation of Protocols, in Proceedings of ACM SIGCOMM 90, September,
Philadelphia, Pennsylvania, 200-208.
Clayton, R. and Calvert, K. (1995) Structuring Protocols with Data
Streams, in
Proceedings of the Second International Workshop on High Performance
Protocol Architectures, December, Sydney, Australia.
Cooper, E., Harper, R. and Lee, P. (1991) The Fox Project: Advanced Development
of Systems Software, School of Computer Science Technical Report CMU-CS-91-178,
Carnegie-Mellon University, Pittsburgh, Pennsylvania, August.
Feldmeier, D. (1990) Multiplexing Issues in Communication System Design, in
Proceedings of ACM SIGCOMM 90, September, Philadelphia,
Pennsylvania, 209-219.
Feldmeier, D. (1993) An Overview of the tp++ Transport Protocol
Project, in High Performance Networks - Frontiers and Experience, Kluwer
Academic Publishers, Boston, 157-176.
Floyd, S., Jacobson, V., and McCanne, S. (1995) A Reliable Multicast
Framework for Light-Weight Sessions and Application Level Framing, in
Proceedings of ACM SIGCOMM 95, October, Cambridge,
Massachusetts, 342-356.
Haas, Z. (1991) A Protocol Structure for High-Speed Communication over
Broadband ISDN, IEEE Network 5(1), 64-70.
Hughes, J. (1989) Why Functional Programming Matters. The Computer
Journal, 32(2).
Hutchinson, N. and Peterson, L. (1991) The x-Kernel: An Architecture
for Implementing Network Protocols. Transactions on Software
Engineering, 17(1), 64-75.
Kiczales, G., Lamping, J., Mendhekar, A., Maeda, C., Lopes, C., Loingtier,
J.-M., and Irwin, J. (1997) Aspect-Oriented Programming, Xerox PARC Technical
Report SPL97-008 P9710042, Palo Alto, California, February.
LaPorta, T. and Schwartz, M. (1993) The MultiStream Protocol: A Highly Flexible
High-Speed Transport Protocol, IEEE Journal on Selected Areas in
Communications, 11(4), 519-530.
Montague, B. and McDowell, C. (1997) Synchronous/Reactive Programming of
Concurrent System Software. Software Practice and Experience,
27(3), 207-243.
O'Malley, S. and Peterson, L. (1992) A Dynamic Network Architecture.
ACM Transactions on Computer Systems, 10(2), 110-143.
Plagemann, T., Plattner, B., Vogt, M., and Walter, T. (1993) Modules as
Building Blocks for Protocol Configuration, in Proceedings of the
International Conference on Network Protocols, October, San Francisco,
106-113.
Schmidt, D., Harrison, T., and Al-Shaer, E. (1995) Object-Oriented Programming
for High-Speed Network Programming in Proceedings of the USENIX Conference
on Object-Oriented Technologies, June, Monterey, California.
Schmidt, D., and Suda, T. (1993) Transport System Architecture Services for
High-Performance Communications Systems, IEEE Journal on Selected Areas in
Communications 11(4), 489-506.
Tennenhouse, D. (1989) Layered Multiplexing Considered Harmful, in
Proceedings of the IFIP Workshop on Protocols for High-Speed Networks,
May, 143-148.
Tofte, M., Launchbury, J., Meijer, E., and Sheard, T. (1996) Essentials of
Standard ML Modules, in Tutorial Text to the Second International School in
Advanced Functional Programming, August, Olympia, Washington, 208-238.
Vlissides, J., Coplien, J., and Kerth, N. (1996) Pattern Languages of Program
Design 2. Addison-Wesley: Reading Mass.
Zitterbart, M., Stiller, B. and Tantawy, A. (1993) A Model for Flexible
High-Performance Communication Subsystems, IEEE Journal on Selected Areas in
Communications 11(4), 507-518.
Zweig, J. and Johnson, R. (1990) The Conduit: A communication abstraction in
C++, in Proceedings of the USENIX C++ Conference, April, San Francisco,
California, 191-204.
About the authors
R. Clayton is a Ph.D student in the Networking and Telecommunications group in
the College of Computing at Georgia Tech.
Kenneth L. Calvert is an Assistant Professor in the College of
Computing at the Georgia Institute of Technology. He received the
Ph.D. degree in 1991 from the University of Texas at Austin. From
1979 to 1984 he was a Member of Technical Staff at Bell Telephone
Laboratories in Holmdel, New Jersey. Dr. Calvert's current research
interests include active networks, development of an end-to-end
protocol framework for the Internet, information infrastructure
(middleware) for the home environment, and topological models of the
Internet.
This work was supported in part by the U.S. National Science Foundation under
grant NCR-9612855.
|