
Architecture & Performance

Edward A. Fox & Tomasz Imielinski

As the Internet community rapidly grows, usage of the WWW continues to increase, and more multimedia information becomes available, it is essential that innovative research and development projects focus on the resulting architectural and performance problems that threaten the viability of the Web. Already we see: servers swamped when some information they provide is suddenly "discovered"; long delays or timeouts when accessing popular Web pages; intolerably slow transmission of images, audio and video files; frequent error messages indicating a link has been "broken" (e.g., by moving the target file); and wasteful re-transmission of large files from remote sites, even when they are requested almost immediately after a prior access is complete.

Some of the existing problems can be easily solved through training, especially if organizations' "Webmasters" have proper guidelines. For example, proxy servers can reduce re-transmission and traffic through caching; video files can be split into a number of smaller segments to reduce startup delays; images can be compressed using the JPEG standard to cut down on transmission time; and systems with link databases (e.g., Hyper-G [Kappe 94] and WWW-based [Pitkow 95]) can ensure consistency of link and node sets. Clearly, education and training are needed.
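To illustrate how a caching proxy avoids re-transmission, here is a minimal sketch of a least-recently-used (LRU) cache placed in front of a remote site. The class name `ProxyCache`, the stand-in function `fake_origin`, and the choice of LRU eviction are all assumptions made for illustration; the text does not prescribe a particular cache design.

```python
from collections import OrderedDict


class ProxyCache:
    """Tiny LRU cache standing in for a caching proxy (illustrative only)."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity      # maximum number of cached objects
        self.fetch = fetch            # callable that retrieves a URL from the origin
        self.store = OrderedDict()    # URL -> body, least recently used first
        self.origin_fetches = 0       # number of transfers from the remote site

    def get(self, url):
        if url in self.store:
            self.store.move_to_end(url)     # cache hit: mark as most recently used
            return self.store[url]
        body = self.fetch(url)              # cache miss: transfer from the origin
        self.origin_fetches += 1
        self.store[url] = body
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used entry
        return body


def fake_origin(url):
    # Hypothetical stand-in for a remote server.
    return "contents of " + url


cache = ProxyCache(capacity=2, fetch=fake_origin)
for url in ["/a", "/a", "/b", "/a", "/c", "/b"]:
    cache.get(url)
print(cache.origin_fetches, "origin fetches for 6 requests")
```

Repeated requests for "/a" are served locally, so only some of the six requests reach the remote site. How many depends on the cache size and eviction strategy, which are exactly the parameters that guidelines for Webmasters would need to address.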

Recommendation 1: Improvement Techniques for Web Performance

We recommend research, development and training efforts to help discover and validate techniques to improve Web performance, e.g., determination of optimal configurations of servers and proxy servers, along with settings of parameters (e.g., cache size, garbage collection strategies) on both client and server systems, for organization-wide use of WWW technology; prediction of performance that takes into account the very skewed access distributions commonly associated with hypertext collections; preparation of authoring guidelines regarding in-line inclusion, sizes of pages, segmentation of multimedia streams, and summaries or lower resolution versions of large objects; and development of courseware and other training materials to disseminate these results (see also Recommendation 4 below).

Problems such as broken links, inadequate semantics and searchability of links, and the impossibility of attaching source anchors to read-only items can be resolved if link databases become part of the basic infrastructure of the WWW, instead of being confined to subparts of it (e.g., the collection of Hyper-G systems).

Recommendation 2: Comparative Analysis of Link Specification and Maintenance

We recommend research and comparative studies regarding the current WWW scheme for specifying links, vs. alternative schemes such as having a distributed link database. These investigations relate to the fundamental architecture of the WWW, its performance, its support of searching methods (e.g., over link attributes and metadata), and user interface functions like browsing the node-link graph.
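As a rough sketch of the alternative scheme, the following keeps links in a store separate from the documents themselves, so that moving a target updates every link that refers to it, links can be searched by attribute, and a link's source can be a read-only item. The names `Link`, `LinkDatabase`, and the `kind` attribute are hypothetical; this is not the Hyper-G design, only a small single-site illustration of the idea.

```python
from dataclasses import dataclass


@dataclass
class Link:
    source: str
    target: str
    kind: str = "reference"   # hypothetical link attribute used for searching


class LinkDatabase:
    """Links stored outside the documents they connect (illustrative only)."""

    def __init__(self):
        self.nodes = set()    # identifiers of known documents
        self.links = []       # all links, independent of document contents

    def add_node(self, node_id):
        self.nodes.add(node_id)

    def add_link(self, source, target, kind="reference"):
        self.links.append(Link(source, target, kind))

    def move_node(self, old_id, new_id):
        # Renaming a node updates every link that mentions it,
        # so no link is left pointing at the old location.
        self.nodes.discard(old_id)
        self.nodes.add(new_id)
        for link in self.links:
            if link.source == old_id:
                link.source = new_id
            if link.target == old_id:
                link.target = new_id

    def dangling(self):
        # Links whose target is not a known node, i.e. "broken" links.
        return [link for link in self.links if link.target not in self.nodes]

    def find(self, kind):
        # Search over a link attribute.
        return [link for link in self.links if link.kind == kind]


db = LinkDatabase()
db.add_node("paper.html")
db.add_node("figures/fig1.gif")
db.add_link("paper.html", "figures/fig1.gif", kind="illustration")
db.move_node("figures/fig1.gif", "images/fig1.gif")
print(db.dangling())   # broken links remaining after the move
```

A distributed version would also have to propagate such updates between servers, which is where the comparative studies recommended above would be needed.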

While some Internet services use broadcast methods (e.g., listserv), the WWW adopts a global hypermedia model where objects are delivered on demand (after supplying some universal identifier). For real-time video distribution over the Internet (e.g., MBONE), multicast methods are applied. Hence, we must consider: What changes will be needed as WWW usage shifts more to the use of multimedia information, where objects are very large, consist of streams that require synchronization, and in some cases must satisfy real-time delivery demands in order to provide adequate quality of service? How will other Internet services change to be integrated into the Web as it evolves to become the "memory" for the networks?

Recommendation 3: Comparative Analysis of Multimedia Routing Protocols

Therefore, we recommend research and comparative studies regarding WWW and its connection with: multicast; real-time information delivery; on-demand services; large multimedia objects and streams requiring synchronization; listservs and other types of asynchronous or synchronous conferences.

These investigations will help ensure that the WWW evolves to better support the growing demands for computer tele-conferences and other synchronous communications, as well as more common electronic mail, computer conference, bulletin board, listserv, and USENET services, all enhanced to include multimedia information. This line of research will be essential to determine: what user demands and requirements will be for such services; what new workloads are likely to arise; and how various architectures, algorithms and protocols influence performance and quality of service.

All of the above-mentioned problems take on new dimensions as the WWW scales up, in terms of amount of information, size of information items, number of authors, number of computers, number of users, and amount of usage by each person. It is this continuing pressure on existing infrastructure that must be anticipated so that research studies will yield new ways to improve the WWW architecture and its performance. Some studies should deal with networking concerns, others with algorithms, and yet others with database systems, data structures, file systems, information systems, or knowledge bases. Other studies should consider whether and where indexes are kept, what analyses are pre-computed vs. prepared by knowbots, what search heuristics are developed and deployed, and where replication schemes are applied.

Recommendation 4: Modeling and Simulation of the WWW

The recommendations above (especially Recommendation 1) indicate the need for research regarding modeling and simulation of the WWW, including developing workloads (based on logs and traffic measurements), describing architectures, exploring the implications of information organization, and comparing approaches to information processing and management. Devices like caching, prefetching, clustering, classification, and filtering should be considered to determine their effectiveness in reducing traffic and improving service.
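A very small simulation of this kind might look like the sketch below. It generates a synthetic workload with a skewed (Zipf-like) popularity distribution and measures the hit ratio of an LRU cache at several sizes. Both the Zipf-like model and the LRU policy are assumptions chosen for illustration, as are the function names and parameter values; a real study would derive workloads from server logs and traffic measurements, as recommended above.

```python
import random
from collections import OrderedDict


def zipf_workload(n_docs, n_requests, seed=0, exponent=1.0):
    """Synthetic request trace: document at rank r is requested with weight 1/r**exponent."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** exponent) for rank in range(1, n_docs + 1)]
    return rng.choices(range(n_docs), weights=weights, k=n_requests)


def lru_hit_ratio(trace, capacity):
    """Fraction of requests in the trace served from an LRU cache of the given size."""
    cache = OrderedDict()
    hits = 0
    for doc in trace:
        if doc in cache:
            hits += 1
            cache.move_to_end(doc)          # mark as most recently used
        else:
            cache[doc] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used document
    return hits / len(trace)


trace = zipf_workload(n_docs=1000, n_requests=20000)
for capacity in (10, 50, 200):
    print(capacity, round(lru_hit_ratio(trace, capacity), 3))
```

Because a few documents attract most of the requests in a skewed workload, even a cache much smaller than the collection can serve a substantial share of requests. The same harness could be extended to compare prefetching or other replacement strategies on measured traces.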

When these results are obtained and then refined through empirical studies, it will be feasible to begin to address more complex problems. In particular, as the WWW transitions from a data to an information and thence to a knowledge network, with ever more powerful systems providing desired services, it will be necessary to consider the interaction of agents and other intelligent modules. In-depth content analysis (e.g., using natural language or image processing techniques), summarization, planning, constraint satisfaction, translation and conversion, and other highly complex tasks will need to be supported on a grand scale.

All of these investigations must occur in the environment of an open network, with standards evolving as required regarding representation, interchange, protocols, and interfaces. Interoperability is the key, and the WWW provides an environment inherited from the Internet where rapid prototyping and large-scale testing of small inventions are encouraged.

References

Kappe F., Andrews K., Faschingbauer J., Gaisbauer M., Pichler M., & Schipflinger J. (1994) Hyper-G: A Network Tool for Distributed Hypermedia. (on-line documentation) <URL:ftp://iicm.tu-graz.ac.at/pub/Hyper-G/doc/>.

Pitkow J., & Jones R. K. (1995) Towards an Intelligent Publishing Environment. Computer Networks and ISDN Systems, 27, North-Holland (in press).
