Georgia Tech College of Computing

Mocha: Fault-tolerant Object Sharing in Java

The growth in popularity of the World Wide Web (WWW) has resulted in the development of a new generation of tools tailored to Internet computing activities. These Web spinoffs are having a profound impact on the field of distributed computing and technology is progressing such that wide area computing networks are now becoming a popular target environment for research in distributed computing.

Wide area computing environments have several salient differences from traditional local area network computing environments. For example, there is less autonomy and control over resources (e.g. workstations) as many of the resources are remotely located and controlled by others. Furthermore, failures in a wide area computing environment are a relatively common occurrence. Failures, when detected by timeout mechanisms, result from network congestion and varying workloads at nodes. Also, the autonomy of nodes can result in a remote node reboot or the owner of such a node can can choose to terminate foreign tasks.

There are many problems that need to be solved to enable wide area applications to become common. One important issue that we address is a fault tolerant shared object model for wide area applications. Because these applications are themselves distributed applications, they are dependent on some mechanism for sharing state across workstations to allow processors to cooperatively work together. The shared memory and shared object models are attractive for state sharing because they are simpler to program than standard message passing. Our focus is on providing efficient state sharing that is integrated with fault tolerance support.

Our shared object model for wide area distributed applications has been implemented as part of the Mocha wide area computing infrastructure we are currently developing. The Mocha system is written entirely in Java, and provides basic facilities for metacomputing such as remote evaluation support, security, debugging support, and a highly scalable thread-safe network communication library. The contributions of our work in this area include the following:

Support for shared objects on heterogeneous platforms: To improve performance and mitigate latency, copies of objects can be created and accessed locally. Object sharing support utilizes advanced distributed shared memory techniques for maintaining consistency of shared objects. All Java objects which are serializable may be shared by the system.

Integrated Fault tolerance: Fault tolerance support has been integrated into the state sharing support in a manner that allows its overhead to be controlled based on the degree of fault tolerance needed by an application.

Protocol Support: A highly scalable, thread-safe, object oriented network communication library that provides support for fine grain network operation activities has been developed. The system's need for such a custom network communication library is motivated/illustrated through it's use for lock acquisition and other fine grain message activities related to maintaining shared state consistency. Additionally, we have designed of a ``composite protocol'' approach which combines the capabilities of Mocha's network library and TCP to support the transfer of large replicas in a high performance fashion that preserves system scalability.

Empirical evaluation of the system: The performance of the system has been evaluated in both local and wide area networks. We have also begun evaluating the system in a more accurate home service environment, namely, a Windows 95 PC connected via a cable modem to a Unix Workstation.

Applications: Several home service and metacomputing applications are being developed to investigate the capabilities and limitations of our wide area computing environment.

A sample service to the home application that we have developed with the state sharing support provided by Mocha is the formal dining table setting coordinator. This application is intended for a consumer located in the home who wishes to add a new formal dinner table place setting composed of flatware, plates, and glassware. At the consumer's home, a graphical user interface is executing which allows various flatware, plates, and glassware to be viewed together so that the consumer may ``mix and match'' these items and end up with a pleasing coordinated table setting. Additionally, a sales associate located at the retail outlet may also have a copy of the graphical user interface which permits the associate to see what the customer is selecting and may suggest alternatives which are then presented in the customer's GUI. Furthermore, the home consumer may have requested friends located at other homes to also participate in this decision making and therefore they too may be running a GUI and viewing the possibilities and also making suggestions. In this scenario, it is expected that the platforms on which the GUI executes in each of the homes would be vastly different from the platform at the retail outlet.

In our implementation of this application, the GUI shown above is sent via Mocha's remote evaluation support to execute at several remote sites. Each site may modify the flatware, plates, or glassware currently being displayed by pushing the appropriate previous or next button. These buttons result in activating callbacks which modify shared objects associated with each item. A thread which executes in each remote GUI periodically polls the shared objects for new values and updates the local display as needed. The shared objects rely on the system's shared state support for consistency maintenance.

Brad Topol
College of Computing
Georgia Institute of Technology