Adaptive Self-Configuration Applied to Virtual Topology Reconfiguration

Advisor Professor Ashok Goel

Project Report

Joshua Jones

 

CS 7001

2002-10-22

 

 

With the advent of all optical wavelength-based switching devices, an interesting view of optical switching fabrics supporting higher-level packet routed networks has arisen.  A set of provisioned lightpaths through the switching fabric interconnects a set of edge packet routing devices.  This set of lightpaths is viewed as a virtual topology to be used by the routers.  From the perspective of the routers, the lightpaths are no different from a set of hardwired links.  Currently, due to the lack of appropriate control methods, provisioning of lightpaths is a manual operation.  The operator attempts to achieve a good configuration for some average or worst-case traffic situation, and this configuration is somewhat static.  If traffic patterns change significantly over time, or as new customers place new demands on the network, reorganization and reprovisioning is possible, but is quite slow as planning and configuration tasks must be done by hand.

 

There is a good deal of interest in automating these planning and reconfiguration tasks to make these virtual topologies truly dynamic, so that they can adapt to changing situations on a much shorter time scale.  This capability would allow more efficient and economical use of network resources as traffic patterns change.  In addition, a network that is able to dynamically reconfigure itself may be able to minimize the impact of equipment failure at some number of nodes.  An active area of research frequently referred to as 'Virtual Topology Reconfiguration' is concerned with tackling this problem.  The challenge is often viewed as consisting of two stages.  First, develop a new optimal topology given some set of data about the network, such as a recent snapshot of traffic flow between edge devices.  Next, implement the newly devised topology through a sequence of reconfiguration commands with the intent of causing the least amount of disruption during the process.  Of course, where several optimal (or nearly optimal) topologies are possible, the ease of smooth implementation may inform the selection of a new topology. 

 

There are some problems with this two stage approach.  The new virtual topology is designed with a fixed traffic pattern.  This input to the topology generation function may be based on traffic flows at some given point in time, or upon some projection of what the typical traffic pattern will be until the next reconfiguration cycle.  Because of the complexity of recomputing and the disruptive nature of reimplementing an optimal topology, this cycle time must be long.  This makes the likelihood of error in the inputs used to generate a new topology quite high, as it is difficult to accurately predict traffic conditions over long periods of time.   Another problem is the disruptiveness of total reconfiguration.  Even when carefully planned schemes are used to reconfigure a virtual topology, a significant delay (or even loss) may be imposed upon traffic entering the network during the process.

 

To deal with this problem, a reactive incremental tuning approach has been suggested [1].  This approach monitors the load on each link.  At periodic decision points, a single link may be added if links exist with loads above some high watermark.  If no such links exist, a link may be deleted if links exist with loads below some low watermark.  Currently, studies have focused on centralized processes that may recommend the modification (addition/deletion) of at most one link per decision interval.  The modification of multiple links per decision cycle and the decentralization of control in this process have not yet been studied.  In this paper, we will consider the maximum number of links modified per cycle to be an externally configured parameter, along with the high/low watermarks and the interval between decisions.  Decentralization of the control process could help to improve survivability under equipment outage or denial of service conditions.  However, redundancy could also be used to improve survivability and decentralization is outside the scope of this paper.  Because of the simplicity of this reactive, incremental approach, the interval between actions can be much lower.  The topology continuously adapts itself towards a better fit for current traffic conditions, with a minimum of disruption to existing traffic.  Some empirical results have been obtained showing that this approach generates nearly optimal topologies and minimizes the disruption of reconfiguration, under a set of studied traffic patterns [2].  As an aside, it is interesting to note that with the addition of this reactive control, the network can be seen as an immobot [3], where the reconfiguration algorithm acts as the network's regulatory system.

 

Because of its advantages, the reactive tuning approach is attractive.  However, adopting this approach also brings a distinct set of problems.  First, the externally configured parameters of the process, enumerated above, have a significant impact on the functioning of the reconfiguration algorithm.  Certain combinations of parameters need to be selected for certain policies (e.g. maximize throughput, use resources minimally, etc).  Furthermore, the optimal choices for these parameters are influenced by the characteristics of the underlying network, some of which are variable.  For instance, the rate of change of the traffic flows may influence the choice of decision interval.  Without some additional layer of control, these parameters must be chosen by hand for some expected or average network conditions, for a set policy.  These parameters are then semi-static, as any change must be effected manually.  It is desirable to make these parameters automatically adjustable so that some failures of the reactive reconfiguration system can be corrected quickly, without operator intervention.  For example, imagine that the decision interval has been set quite short to closely track smoothly changing traffic patterns.  If for some reason a particular traffic flow begins to rapidly increase and then drop off periodically, it is possible that a lightpath will be alternately added and deleted in each of decision cycle.  It is even possible that such patterns would be generated as an attack upon the tuning mechanism itself.  This path 'flapping' effect is clearly undesirable, and could be stopped by modifying the configured parameters of the reactive control appropriately, among other possible remedies.  In order to manually correct the problem in a timely fashion, the network would have to be monitored almost continuously by a human operator.  Notice that this largely defeats the purpose of using automatic network adaptation in the first place!  This is a strong reason to have at least some form of self-monitoring in place.

 

Another problem with the reactive tuning approach is that sudden, drastic changes such as multiple equipment outages or data storms may exceed the capability of the process to make sufficiently many appropriate modifications in a short period of time.  Also notice that, under such conditions, some packet loss/delay is unavoidable.  The goal in these situations should be to immediately reconfigure the virtual topology to whatever extent is required to get things under control.  This suggests a reversion to the two stage full optimization technique.  The problems with this technique diminish greatly under aberrant conditions.  Though the 'optimal' new topology that is generated may well not be truly optimal by the time it is executed, it is sure to be a major improvement over the outmoded pre-incident topology.  In addition, packet loss/delay due to massive reconfiguration is not a concern, as inaction will result in even greater manifestations of these problems.  A possible solution to this problem with the reactive tuning approach is to provide a higher level of control that is able to diagnose such a failure, intervene and apply the two-stage full reconfiguration, and then release the reactive process on the properly modified topology.

 

This paper postulates that by introducing a reflective process into the virtual topology reconfiguration agent, the problems of appropriate parameter (re)configuration and coping with potentially catastrophic circumstances can be handled effectively.  Endowing the reconfiguration agent with a model of the functioning of the network and the reactive/incremental process will allow for automatic detection and correction of failures in the reactive process.  Such a modification could make the use of the reactive approach practical under very dynamic traffic conditions.  This would allow network operators to enjoy the benefits of reactive reconfiguration without needing to closely monitor the process.  In addition, the system maintains the ability to recover quickly and automatically from serious events that comes with the full reconfiguration method.

 

Further study is necessary to determine the parameter settings that are most appropriate for a given combination of policy and traffic conditions.  As the proposal of the reactive reconfiguration approach is quite new, knowledge about the effects of modifying relevant parameters is limited.  To date, studies on the behavior of the process have been limited to situations with smoothly changing traffic patterns.  A more complete understanding must be developed before a reflective system capable of making optimal tuning modifications can be defined completely.  It may even be possible to use reinforcement learning to train the reflective process to tune the parameters of the reconfiguration agent appropriately.  Of course, this process must be completed (or significantly advanced) before deploying this system in a real-world situation. 

 

The specifics of the means of failure detection also require development.  However, some readily conceivable examples of possible techniques help to illustrate the idea.  For instance, the reflective process can periodically compute an optimal topology using the first part of the full reconfiguration method.  This topology can then be compared to the one arising from actions taken by the reactive process.  A significant enough deviation could trigger intervention.  It would also be possible to store representations of the topologies created by the reactive process.  This backlog can be examined for cycles in topology subsets with frequencies too high to be considered acceptable (e.g. the kind that might result from regular daily traffic fluctuations).  If such a pattern is detected, a parameter modification should be made to prevent the cycle.  In another example, if overall link utilization is low and there are lightpaths with loads near the high watermark, the reflective system should be able to detect this condition by examining the data obtained from the network, and consider decreasing the high watermark.  The reflective system can also monitor the number of links with loads outside the established boundaries, and consider increasing the number of modifications per decision cycle if that number grows too large.  If there are many links with loads well outside the boundaries, this may indicate a serious problem and trigger full reconfiguration. 

 

Various strategies for encoding the knowledge about the reactive system are also possible.  A model based on SBF-TMK [4] might be useful, though a mathematical model used with a rule-based production system to guide behavior seems like a likely fit.  Because the modes of operation of the reactive reconfiguration agent are continuous and potentially infinite in number, a continuous mathematical model may be necessary to select the appropriate modes.  This is in contrast to models of schema-based control, where SBF-TMK has been quite successful, but the operating modes are discrete and finite.  Note that the interruption of the reactive process under serious failure conditions is a discrete operation, and could be implemented through a SBF-TMK system without too much trouble, i.e. by making full reconfiguration and reactive reconfiguration different task instances under a prototype task for reconfiguration.  However, as there are only two options here (and the specific mode of the reactive process is not addressed), this technique may be unnecessarily complex for the indication of interruption.  The development of a detailed strategy for reflective control of the reconfiguration agent must guide the choice of knowledge encoding.

 

Above are some examples of ways to encode self-knowledge and failure detection techniques, which could be used in various combinations.  A much more complete method or set of methods will have to be developed for this system to be practical.  This, along with the discovery of appropriate ways to tune the parameters of the reactive process, forms a significant area for future research.  The application of model-based self adaptation to virtual topology reconfiguration has the potential to lead to the development of an extremely powerful network management tool.  This tool will have economic impact, insuring the efficient use of available resources.  In addition, it will help to insure the survivability of our networks under a variety of conditions, aiding defense.  There is a potential for advancement of the areas of artificial intelligence and networking through work on this topic.



References

 

[1] A. Gencata and B. Mukherjee, “Virtual-Topology 

    Adaptation for WDM Mesh Networks Under Dynamic Traffic”,

    in Proc. of IEEE INFOCOM, 2002.

 

[2] A. Gencata, L. Sahasrabuddhe and B. Mukherjee, “Virtual-Topology

    Adaptation with Minimal Lightpath Change for Dynamic Traffic in WDM

    Mesh Networks”, in Proc. of OFC, 2002.

 

[3] B. Williams and P. Nayak, “Immobile Robots: AI in the New

    Millennium”, AI Magazine, 17(3), Fall 1996.

 

[4] E. Stroulia and A. Goel, “Evaluating PSMs in evolutionary design:

    the AUTOGNOSTIC experiments”, Academic Press, 1999.

 

[5] D. Banerjee and B. Mukherjee, “Wavelength-Routed Optical Networks:

    Linear Formulation, Resource Budgeting Tradeoffs, and a

    Reconfiguration Study”, IEEE/ACM Transactions on Networking,

    8(5):598-607, 2000.