Adaptive Self-Configuration Applied to Virtual
Topology Reconfiguration
Advisor Professor Ashok Goel
Project Report
Joshua Jones
CS 7001
2002-10-22
With
the advent of all optical wavelength-based switching devices, an interesting
view of optical switching fabrics supporting higher-level packet routed
networks has arisen. A set of
provisioned lightpaths through the switching fabric interconnects a set of edge
packet routing devices. This set of
lightpaths is viewed as a virtual topology to be used by the routers. From the perspective of the routers, the
lightpaths are no different from a set of hardwired links. Currently, due to the lack of appropriate
control methods, provisioning of lightpaths is a manual operation. The operator attempts to achieve a good
configuration for some average or worst-case traffic situation, and this
configuration is somewhat static. If traffic patterns change significantly over
time, or as new customers place new demands on the network, reorganization and
reprovisioning is possible, but is quite slow as planning and configuration
tasks must be done by hand.
There
is a good deal of interest in automating these planning and reconfiguration
tasks to make these virtual topologies truly dynamic, so that they can adapt to
changing situations on a much shorter time scale. This capability would allow more efficient
and economical use of network resources as traffic patterns change. In addition, a network that is able to
dynamically reconfigure itself may be able to minimize the impact of equipment
failure at some number of nodes. An
active area of research frequently referred to as 'Virtual Topology
Reconfiguration' is concerned with tackling this problem. The challenge is often viewed as consisting
of two stages. First, develop a new
optimal topology given some set of data about the network, such as a recent
snapshot of traffic flow between edge devices.
Next, implement the newly devised topology through a sequence of
reconfiguration commands with the intent of causing the least amount of
disruption during the process. Of
course, where several optimal (or nearly optimal) topologies are possible, the
ease of smooth implementation may inform the selection of a new topology.
There
are some problems with this two stage approach.
The new virtual topology is designed with a fixed traffic pattern. This input to the topology generation
function may be based on traffic flows at some given point in time, or upon
some projection of what the typical traffic pattern will be until the next
reconfiguration cycle. Because of the
complexity of recomputing and the disruptive nature of reimplementing an
optimal topology, this cycle time must be long.
This makes the likelihood of error in the inputs used to generate a new
topology quite high, as it is difficult to accurately predict traffic
conditions over long periods of time.
Another problem is the disruptiveness of total reconfiguration. Even when carefully planned schemes are used
to reconfigure a virtual topology, a significant delay (or even loss) may be
imposed upon traffic entering the network during the process.
To
deal with this problem, a reactive incremental tuning approach has been
suggested [1]. This approach monitors
the load on each link. At periodic
decision points, a single link may be added if links exist with loads above
some high watermark. If no such links
exist, a link may be deleted if links exist with loads below some low
watermark. Currently, studies have
focused on centralized processes that may recommend the modification
(addition/deletion) of at most one link per decision interval. The modification of multiple links per
decision cycle and the decentralization of control in this process have not yet
been studied. In this paper, we will
consider the maximum number of links modified per cycle to be an externally
configured parameter, along with the high/low watermarks and the interval
between decisions. Decentralization of
the control process could help to improve survivability under equipment outage
or denial of service conditions.
However, redundancy could also be used to improve survivability and
decentralization is outside the scope of this paper. Because of the simplicity of this reactive,
incremental approach, the interval between actions can be much lower. The topology continuously adapts itself
towards a better fit for current traffic conditions, with a minimum of
disruption to existing traffic. Some
empirical results have been obtained showing that this approach generates
nearly optimal topologies and minimizes the disruption of reconfiguration,
under a set of studied traffic patterns [2].
As an aside, it is interesting to note that with the addition of this
reactive control, the network can be seen as an immobot
[3], where the reconfiguration algorithm acts as the network's regulatory
system.
Because
of its advantages, the reactive tuning approach is attractive. However, adopting this approach also brings a
distinct set of problems. First, the
externally configured parameters of the process, enumerated above, have a
significant impact on the functioning of the reconfiguration algorithm. Certain combinations of parameters need to be
selected for certain policies (e.g. maximize throughput, use resources minimally,
etc). Furthermore, the optimal choices
for these parameters are influenced by the characteristics of the underlying
network, some of which are variable. For
instance, the rate of change of the traffic flows may influence the choice of
decision interval. Without some
additional layer of control, these parameters must be chosen by hand for some
expected or average network conditions, for a set policy. These parameters are then semi-static, as any
change must be effected manually. It is
desirable to make these parameters automatically adjustable so that some
failures of the reactive reconfiguration system can be corrected quickly,
without operator intervention. For
example, imagine that the decision interval has been set quite short to closely
track smoothly changing traffic patterns.
If for some reason a particular traffic flow begins to rapidly increase
and then drop off periodically, it is possible that a lightpath will be
alternately added and deleted in each of decision cycle. It is even possible that such patterns would
be generated as an attack upon the tuning mechanism itself. This path 'flapping' effect is clearly
undesirable, and could be stopped by modifying the configured parameters of the
reactive control appropriately, among other possible remedies. In order to manually correct the problem in a
timely fashion, the network would have to be monitored almost continuously by a
human operator. Notice that this largely
defeats the purpose of using automatic network adaptation in the first
place! This is a strong reason to have
at least some form of self-monitoring in place.
Another
problem with the reactive tuning approach is that sudden, drastic changes such
as multiple equipment outages or data storms may exceed the capability of the
process to make sufficiently many appropriate modifications in a short period
of time. Also notice that, under such
conditions, some packet loss/delay is unavoidable. The goal in these situations should be to
immediately reconfigure the virtual topology to whatever extent is required to
get things under control. This suggests
a reversion to the two stage full optimization technique. The problems with this technique diminish
greatly under aberrant conditions.
Though the 'optimal' new topology that is generated may well not be
truly optimal by the time it is executed, it is sure to be a major improvement
over the outmoded pre-incident topology.
In addition, packet loss/delay due to massive reconfiguration is not a
concern, as inaction will result in even greater manifestations of these
problems. A possible solution to this
problem with the reactive tuning approach is to provide a higher level of
control that is able to diagnose such a failure, intervene and apply the
two-stage full reconfiguration, and then release the reactive process on the properly
modified topology.
This
paper postulates that by introducing a reflective process into the virtual
topology reconfiguration agent, the problems of appropriate parameter
(re)configuration and coping with potentially catastrophic circumstances can be
handled effectively. Endowing the
reconfiguration agent with a model of the functioning of the network and the
reactive/incremental process will allow for automatic detection and correction
of failures in the reactive process.
Such a modification could make the use of the reactive approach
practical under very dynamic traffic conditions. This would allow network operators to enjoy
the benefits of reactive reconfiguration without needing to closely monitor the
process. In addition, the system
maintains the ability to recover quickly and automatically from serious events
that comes with the full reconfiguration method.
Further
study is necessary to determine the parameter settings that are most
appropriate for a given combination of policy and traffic conditions. As the proposal of the reactive
reconfiguration approach is quite new, knowledge about the effects of modifying
relevant parameters is limited. To date,
studies on the behavior of the process have been limited to situations with
smoothly changing traffic patterns. A
more complete understanding must be developed before a reflective system
capable of making optimal tuning modifications can be defined completely. It may even be possible to use reinforcement
learning to train the reflective process to tune the parameters of the
reconfiguration agent appropriately. Of
course, this process must be completed (or significantly advanced) before
deploying this system in a real-world situation.
The
specifics of the means of failure detection also require development. However, some readily conceivable examples of
possible techniques help to illustrate the idea. For instance, the reflective process can
periodically compute an optimal topology using the first part of the full
reconfiguration method. This topology
can then be compared to the one arising from actions taken by the reactive
process. A significant enough deviation
could trigger intervention. It would
also be possible to store representations of the topologies created by the
reactive process. This backlog can be
examined for cycles in topology subsets with frequencies too high to be
considered acceptable (e.g. the kind that might result from regular daily
traffic fluctuations). If such a pattern
is detected, a parameter modification should be made to prevent the cycle. In another example, if overall link
utilization is low and there are lightpaths with loads near the high watermark,
the reflective system should be able to detect this condition by examining the
data obtained from the network, and consider decreasing the high
watermark. The reflective system can
also monitor the number of links with loads outside the established boundaries,
and consider increasing the number of modifications per decision cycle if that
number grows too large. If there are
many links with loads well outside
the boundaries, this may indicate a serious problem and trigger full
reconfiguration.
Various
strategies for encoding the knowledge about the reactive system are also
possible. A model based on SBF-TMK [4]
might be useful, though a mathematical model used with a rule-based production
system to guide behavior seems like a likely fit. Because the modes of operation of the
reactive reconfiguration agent are continuous and potentially infinite in
number, a continuous mathematical model may be necessary to select the
appropriate modes. This is in contrast
to models of schema-based control, where SBF-TMK has been quite successful, but
the operating modes are discrete and finite.
Note that the interruption of the reactive process under serious failure
conditions is a discrete operation, and could be implemented through a SBF-TMK
system without too much trouble, i.e. by making full reconfiguration and
reactive reconfiguration different task instances under a prototype task for
reconfiguration. However, as there are
only two options here (and the specific mode of the reactive process is not
addressed), this technique may be unnecessarily complex for the indication of
interruption. The development of a
detailed strategy for reflective control of the reconfiguration agent must
guide the choice of knowledge encoding.
Above
are some examples of ways to encode self-knowledge and failure detection
techniques, which could be used in various combinations. A much more complete method or set of methods
will have to be developed for this system to be practical. This, along with the discovery of appropriate
ways to tune the parameters of the reactive process, forms a significant area
for future research. The application of
model-based self adaptation to virtual topology reconfiguration has the
potential to lead to the development of an extremely powerful network
management tool. This tool will have
economic impact, insuring the efficient use of available resources. In addition, it will help to insure the
survivability of our networks under a variety of conditions, aiding
defense. There is a potential for
advancement of the areas of artificial intelligence and networking through work
on this topic.

References
[1]
A. Gencata and B. Mukherjee,
“Virtual-Topology
Adaptation for WDM Mesh Networks Under
Dynamic Traffic”,
in Proc. of IEEE
INFOCOM, 2002.
[2]
A. Gencata, L. Sahasrabuddhe
and B. Mukherjee, “Virtual-Topology
Adaptation with Minimal Lightpath Change
for Dynamic Traffic in WDM
Mesh Networks”, in Proc.
of OFC, 2002.
[3]
B. Williams and P. Nayak, “Immobile Robots: AI in the
New
Millennium”, AI Magazine, 17(3), Fall 1996.
[4]
E. Stroulia and A. Goel,
“Evaluating PSMs in evolutionary design:
the AUTOGNOSTIC
experiments”, Academic Press, 1999.
[5]
D. Banerjee and B. Mukherjee,
“Wavelength-Routed Optical Networks:
Linear Formulation, Resource Budgeting
Tradeoffs, and a
Reconfiguration Study”, IEEE/ACM
Transactions on Networking,
8(5):598-607, 2000.