Table of Contents

  • Robotics and Machine Intelligence - Howard Moraff

  • Science along the Information Highway: Roads to the Future - Robert C. Berwick

  • Use of WWW as a Collaborative Platform - Christopher Connolly

  • WWW, Services, and Browsers: Issues for Today and Tomorrow - Oscar N. Garcia

  • Interactive Systems - John M. Carroll

  • Summary of Recommendations - Jim Foley

  • Interface Agents on the WWW - Su-Shing Chen

  • A DL/EP/HT/IR/MM Perspective - Edward A. Fox

  • Intelligent Information Services - Tom Mitchell

  • WWW is for Knowledge and Information - John D. Hestenes

  • Scalability and the WWW - V. B. Balayoghan, Don Fussell, & Avi Silberschatz

  • Database and Expert Systems - Tomasz Imielinski & B. R. Badrinath

  • Heterogeneous Data Integration on the WWW - V.S. Subrahmanian

    Robotics and Machine Intelligence

    Howard Moraff
    Program Director
    Robotics and Machine Intelligence
    National Science Foundation
    Arlington, VA 22230
    hmoraff@nsf.gov

    We at the NSF have traditionally been hard-pressed to provide adequate access to information about the grants we make, let alone the broader activities in the fields of science that our programs support. The World Wide Web, and the electronic future of information that it represents, present us with a significant opportunity to deliver on our obligation to inform the public - all our publics - about such research activities.

    Beyond the passive process of creating access to grant abstracts, program descriptions and announcements, and workshop reports, I believe that there is an important opportunity to create useful interactions among researchers with common or intersecting interests, and to establish a more active model of information management. So, for example, there are already discussions and explorations of such things as electronic interactions about papers, putting research data sets on-line for comparative benchmarking, putting computer programs on-line for comparisons of experimental approaches, and even using the networks to link laboratories together to enhance shareability of resources.

    I believe we should start with the obvious, and enhance what we already do or should be doing, but that the real payoffs of the new medium will be achieved by designing new modes of access and interaction.


    Science along the Information Highway: Roads to the Future

    Robert C. Berwick
    Co-Director, MIT Center for Biological and Computational Learning
    MIT Dept. of EECS, Dept. of Brain and Cognitive Sciences
    Cambridge, MA 02139
    berwick@ai.mit.edu

    THE BASICS

    To be sure, NSF has already led the way with NSFNET. But as in Mark Twain's old quip: Boston and Washington may be connected by a telegraph, but will they have anything to say to each other? In this brief I will argue that, no surprise, Twain's message underscores *communication* as the basic underpinning of knowledge interchange, so NSF's resources should be committed to this goal. One of the by-products of its basic research ought to be research into knowledge information models---language and communicative systems---that underpin scientific interchange, as well as careful engineering to bring this information to as many people as possible (meaning as democratically as possible). The old truism applies, of course: signal gets lost in the noise. If information is not focussed, then the increase in network load will decrease productivity. (And if the users and the information aren't modelled, then the information will not be focussed.)

    There are currently an estimated 40 million people with access to the world's Internet. As computers become more widely available and used, we can expect up to two orders of magnitude increase in the number of people with access to the world network. Mail traffic to any individual user could be kept low by creating limited and separate networks, limiting the number of people per mailing list (thus limiting broadcast authority), or else developing more sophisticated message routing methods. The message routing alternative is both desirable and technically feasible. Current methods are all keyword-based (either text- or image-grounded) and insufficient; in fact, they have not advanced much beyond the vector-matching methods of the 1960s, just done faster.

    CONCLUSION 1

    We need a model of message recipients that enables greater information focus. Note that *very* crude model-driven systems have evolved naturally: Usenet, mailing lists, special-interest BBoards, Archie (keyword driven FTP site search which is text retrieval based on interest; note that subject lines provide very simple representations of text content).

    THE CURRENT WEB IS A MESS TO WADE THROUGH

    The reason is that there's no semantics. Finding where to go is hard, even with backpointers. We can do better than vector-based keyword searching. Booleans and keywords don't wash. They don't even dry. It's just that the IR community isn't generally in contact with the advanced natural language processing community, as much as it should be. Then too, system design needs to be thought through: there is not enough error recovery, and so forth. How many times does one have to get a "page not accessible" message? For a much more robust implementation, see the White House email system.

    THE POWER OF IMAGES IS SUBSTANTIAL AND REAL

    However, while pictures always make for better presentations, they aren't the ultimate answer. My reaction to the MIT demo of live edge detection over the net is that it's sort of like watching a dog walk on its hind legs: amazing that it can do it at all. My conclusion is that ASCII text will remain prominent for some time, because that's what scientists do---they write.

    CONCLUSION 2 - BUILD ON THE SCIENTIFIC COLLABORATION MODEL

    Electronic mail discourse tends to inflame the basic differences between discussants (see Kiesler, et al. 1984). This may be positive in the scientific arena. The "active pages" notion is another good one. NSF should take a model domain, say, molecular biology, where the clear bottleneck for people is getting to the results and papers and abstracts. (This observation is from Eric Lander.) There are two parts to this, one in training, one in active science:

    Part 1 - Learning collaboratively:

    Finding the right person to talk to is an important part of professional academic training. People who learn to do it well become good collaborators, and this leads to better science. People who do not---well, we know how the tale leads there. Professional academics are expected to know the names and brief professional biographies of roughly 50--500 other academics in their fields. Current academics serve as a distributed knowledge base for query by future academics. Needless to say, this could serve as an important training ground for under-represented groups in society. Adding an academic-style knowledge base (names, biographies, abstracts) to the Internet will ideally increase the educational level. It will be important to also build in the limiting factors that protect some of society's valuable human resources. The secretarial function that asks "who is calling" can be automated. The analysis of question content for the querent's knowledge level cannot be automated. We can, however, provide a facility for posing quizzes or tasks to those who would communicate with us in certain capacities. For example, if I query the network for an expert in gene expression, I should receive a lattice hierarchy of addresses. That object should query me further to find out which particular aspects of gene expression I wish to understand. The result of each query should be another hierarchy object. Thus, if the last object reached is an individual, then that individual might request that the querent answer some basic question. Failed queries should send the question to the lowest-rank member of the current hierarchy. An individual query failure will automatically go to (whatever) a random member of the next inferior level of the closure.
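
    The query-narrowing scheme described above might be sketched roughly as follows. This is a minimal illustration, not a proposed design: the class names, the numeric `rank` field, and the rule "ask the most senior member first" are all assumptions introduced here, and the quiz is reduced to a single yes/no check.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Expert:
    """A leaf of the hierarchy: one person, with an optional screening quiz."""
    name: str
    rank: int  # higher number = more senior (an assumption of this sketch)
    quiz: Optional[Callable[[str], bool]] = None  # returns True if the answer passes


@dataclass
class Topic:
    """An interior node: a topic that narrows into subtopics or experts."""
    name: str
    subtopics: dict = field(default_factory=dict)  # subtopic name -> Topic
    experts: list = field(default_factory=list)    # Expert objects at this level


def route(root: Topic, path: list, quiz_answer: str) -> Expert:
    """Walk the hierarchy along `path`, then decide whom the question reaches."""
    node = root
    for step in path:  # each step answers "which particular aspect?"
        node = node.subtopics[step]
    ranked = sorted(node.experts, key=lambda e: e.rank, reverse=True)
    top = ranked[0]
    if top.quiz is None or top.quiz(quiz_answer):
        return top      # the querent passed the screening question
    return ranked[-1]   # failed query: goes to the lowest-rank member


# Hypothetical two-person group under "gene expression / regulation".
senior = Expert("senior researcher", rank=3,
                quiz=lambda a: a.strip().lower() == "rna")
junior = Expert("graduate student", rank=1)
regulation = Topic("regulation", experts=[senior, junior])
root = Topic("gene expression", subtopics={"regulation": regulation})

print(route(root, ["regulation"], "RNA").name)
print(route(root, ["regulation"], "no idea").name)
```

    The sketch stops at one level of fallback; the "random member of the next inferior level" rule would need an explicit ordering of levels that the text leaves open.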

    Part 2

    The genome structure/function bottleneck would be a good case study. There's a tremendous information explosion, and nobody knows what to do with the results of the genome project, not really, because there are too many abstracts. So we need a good abstract digester, still. I've still seen nothing that really does the right job. It demands more attention to language processing than what we have been doing. Even a simple method like building K-D classification trees would help.

    CONCLUSION 3

    We will want a piece of text to index the right video conference, or piece of digitized voice to get the right text, and so on. Ideally, the key should ultimately be video as well. Lookup of this kind will not require content analysis. An effective system will convert input modality into search modality, and then search that database. In other words, there will be separate but cross-referenced image, voice, and text databases. The search and cross-referencing should be such that a piece of text and an image will find the right piece of video conference faster than will either alone. Prototype systems should be begun now with text and audio alone, as the image description/production technology is not yet available. The best current text search technology will need to be ported to audio and image search.
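
    As a rough illustration of separate but cross-referenced databases, the sketch below keeps one index per modality and intersects the hits. It assumes each modality has already been reduced to a string key (for an image, that presupposes exactly the description technology noted above as not yet available); the index contents and recording names are invented for the example.

```python
def cross_modal_lookup(indexes: dict, query: dict) -> set:
    """Intersect the hits from every modality named in the query.

    indexes: modality name -> {key -> set of recording ids}
    query:   modality name -> key supplied by the user
    """
    result = None
    for modality, key in query.items():
        hits = set(indexes[modality].get(key, ()))
        result = hits if result is None else result & hits
    return result or set()


# Hypothetical cross-referenced text and image indexes.
indexes = {
    "text": {"genome": {"conf-1", "conf-2"}},
    "image": {"gel-photo": {"conf-2", "conf-3"}},
}

text_only = cross_modal_lookup(indexes, {"text": "genome"})
both = cross_modal_lookup(indexes, {"text": "genome", "image": "gel-photo"})
print(sorted(text_only))  # two candidates remain
print(sorted(both))       # narrowed to one
```

    Adding the second key shrinks the candidate set, which is the sense in which text plus image finds the recording faster than either alone.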

    ELECTRONIC MAILING LISTS ARE AN OBSOLETE CONCEPT

    Electronic mail is a broadcast medium like television and radio, but currently unlike them, and like the telephone, in that every receiving device is also a broadcasting device. Every member of a mailing list is able to broadcast to all the others. Mailing lists typically form and dissolve rapidly, because the message traffic becomes too large for their members to absorb. Mailing lists that do not dissolve pay an overhead of mail traffic devoted to schooling the mailing list users in the proper use of mailing lists. The problem of mailing list management will become increasingly important as the use of electronic mail continues to grow explosively. Burying the mailing-list concept will involve: first, storing the messages passed along from senders in an efficient way so that the text they quote from each other is bi-directionally linked. Second, allowing a WAIS query interface to that text tree so that conversations can be found and perused efficiently. Third, allowing conversations (formerly mailing lists) to be spawned and ended dynamically so that conversations become compact entities. Fourth, using resumes and citation indexes to build another tree of expertise so that appropriate experts can be informed of relevant conversations. Finally, building this structure into the Internet so that its message traffic will contribute to a distributed model of its own message traffic. That model will allow content-addressed email to be sent.
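
    The first two steps (bi-directionally linked quotations and a query interface over them) could be prototyped along these lines. This is only a sketch: `ConversationStore` and its methods are hypothetical names, and the substring search merely stands in for a real WAIS query.

```python
from collections import defaultdict


class ConversationStore:
    """Stores messages and keeps quote links navigable in both directions."""

    def __init__(self):
        self.messages = {}                 # message id -> text
        self.quotes = defaultdict(set)     # message id -> ids it quotes
        self.quoted_by = defaultdict(set)  # message id -> ids that quote it

    def add(self, msg_id, text, quoting=()):
        self.messages[msg_id] = text
        for earlier in quoting:
            self.quotes[msg_id].add(earlier)
            self.quoted_by[earlier].add(msg_id)

    def conversation(self, msg_id):
        """Return every message reachable through quote links, either way."""
        seen, stack = set(), [msg_id]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(self.quotes[current] | self.quoted_by[current])
        return seen

    def search(self, word):
        """Naive keyword lookup standing in for a WAIS-style query."""
        word = word.lower()
        return {m for m, text in self.messages.items() if word in text.lower()}


store = ConversationStore()
store.add("m1", "Has anyone benchmarked edge detection?")
store.add("m2", "Yes, see our data set.", quoting=["m1"])
store.add("m3", "Unrelated: parking notice.")

print(sorted(store.conversation("m2")))
print(sorted(store.search("edge")))
```

    Because a conversation is defined by its links rather than by a subscriber list, it can start and end without any list administration.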

    THE INTERNET SHOULD RUN FUNCTION-ADDRESSED JOBS

    To get there we will first turn the tasks that the server queues into a recursive data structure. Second, we will write the task queue trees for a variety of useful tasks that we wish to perform on machines other than the server---a help lesson and the email interface to WAIS query interactions are two current examples. Thus, a computer that is able to interpret a task tree structure will be accessible to, in principle, arbitrary requests for services by the central server. Third, we make the task trees interact with the expertise trees, so that a node in a task tree that needs a particular solution can walk the expertise tree in search of a node that can provide it. Fourth, we enable some computationally complete language on the basic tasks to build and execute new task trees. Finally, we will need the best available natural language parsers to turn free text into expressions in the task language. Given some protocol for adding new basic tasks, this system will allow English requests for services to be turned into appropriate actions. Doing this automatically goes beyond current NSF/ARPA research directions (at least in their current form).
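
    To make the idea of a task tree as a recursive data structure concrete, here is a minimal sketch. The three basic tasks and their names are invented for illustration; a real system would register far richer operations, and the point where the lookup fails is where a walk of the expertise tree would begin.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A node in a task tree: a named operation plus its subtasks."""
    op: str
    args: tuple = ()
    children: list = field(default_factory=list)


# Hypothetical registry of "basic tasks"; each takes (args, child results).
BASIC_TASKS = {
    "const": lambda args, results: args[0],
    "concat": lambda args, results: " ".join(str(r) for r in results),
    "upper": lambda args, results: results[0].upper(),
}


def run(task: Task):
    """Evaluate the children first, then apply this node's basic task."""
    results = [run(child) for child in task.children]
    try:
        handler = BASIC_TASKS[task.op]
    except KeyError:
        # A fuller system would search the expertise tree for a provider here.
        raise ValueError(f"no provider for task {task.op!r}") from None
    return handler(task.args, results)


tree = Task("upper", children=[
    Task("concat", children=[
        Task("const", ("help",)),
        Task("const", ("lesson",)),
    ]),
])

print(run(tree))  # HELP LESSON
```

    Any machine that can run this interpreter can, in principle, accept a tree built elsewhere, which is what makes the jobs function-addressed rather than machine-addressed.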

    Acknowledgments:

    Contributions and writing from John C. Mallery, Marney Smythe, Federico Girosi


    Use of WWW as a Collaborative Platform

    Christopher Connolly
    University of Massachusetts
    connolly@psyche.mit.edu

    Bibliography databases:

    Many sites are already using some form of bibliography database entry/retrieval system through Mosaic forms. This is usually for the purpose of generating citation lists for publications. However, it is also a useful way of accessing the relevant literature on a given topic.

    It would be of general use, both to NSF fundees and the public, to organize a coherent NSF bibliography database, or databases, based on bibliographies which already exist (and are presumably growing). This would be especially useful in allowing NSF funded groups to share the results of their literature searches. Provisions would be necessary for ensuring proper translation between various formats for bibliography files (e.g., BibTeX, refer).
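
    As a small illustration of what translation between formats involves, the sketch below renders already-parsed BibTeX fields as a refer record. It assumes the BibTeX entry has been parsed into a dictionary by some other tool, handles only a handful of fields, and uses a placeholder entry rather than a real citation.

```python
# Mapping from BibTeX field names to refer tags (a small subset).
FIELD_TO_TAG = {
    "title": "%T",
    "journal": "%J",
    "year": "%D",
    "volume": "%V",
    "pages": "%P",
}


def bibtex_fields_to_refer(fields: dict) -> str:
    """Render already-parsed BibTeX fields as one refer record."""
    lines = []
    # BibTeX joins authors with " and "; refer wants one %A line per author.
    for author in fields.get("author", "").split(" and "):
        if author.strip():
            lines.append(f"%A {author.strip()}")
    for name, tag in FIELD_TO_TAG.items():
        if name in fields:
            lines.append(f"{tag} {fields[name]}")
    return "\n".join(lines)


# Placeholder entry, not a real publication.
entry = {
    "author": "A. Author and B. Author",
    "title": "An Example Title",
    "journal": "Example Journal",
    "year": "1994",
}

record = bibtex_fields_to_refer(entry)
print(record)
```

    The awkward cases in practice are the ones skipped here: entry types, cross-references, and name formats that do not map one-to-one between the two systems.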

    On-line papers:

    This is a thorny issue. UMass, along with many other sites, has chosen to make locally authored papers accessible through anonymous FTP and Mosaic, probably to the chagrin of publishers. We believe strongly that this, or some similar mechanism, is a useful way of disseminating our work, and should be considered as another way of using WWW to enhance collaboration and dissemination of information. WWW does offer greater flexibility and interaction than a PostScript document, so in many respects it seems preferable to convert on-line PostScript papers into HTML hypertext. This might also allay publishers' concerns.

    Since a LaTeX to HTML translator already exists (written by Nikos Drakos at the University of Leeds), translation of LaTeX publications to HTML is fairly easy in principle. Publications which have been translated to HTML are broken down by this translator into a hierarchy of sub-pages, indexed by a table of contents. In addition, the ability to place images (including color) in-line allows for potentially better illustration quality (especially if your printer is out of toner!). It may be ultimately preferable to generate HTML "preprints" first, from which paper publications can later be generated. A similar idea is already being tried in the field of Neuroscience, where on-line electronic <a href="http://salk.edu/NeuroWeb/posters/virtpost.html">"virtual posters"</a> can be used to disseminate research results in advance of their exhibition at neuroscience meetings. It should be noted too that the LaTeX to HTML translator may still have some bugs to be ironed out.

    Interactive demonstrations:

    NSF should encourage interactive demonstrations of algorithms which have been developed through funded programs. WWW and X windows offer a convenient way of accessing and using such programs. This mechanism is illustrated at <a href="http://tns-www.lcs.mit.edu/vs/demos.html"> MIT's Telemedia, Networks, and Systems group</a> (this is temporarily down for a conference), and the Laboratory for Perceptual Robotics <a href="http://piglet.cs.umass.edu:4321/cgi-bin/demo-setup">demo page</a> at UMass. Access to such programs can be restricted (in case they are not felt to be iron-clad) to some subset of the NSF community, or left unrestricted for public access. This is potentially a good way of demonstrating the results of NSF research to the public, and could serve as a platform for collaboration within NSF.

    Enhancements to WWW:

    Some enhancements to WWW (http/html) that seem to be useful are the addition of line graphics, and general mouse input (e.g. mouse position at the time of a button click). These additions would probably be fairly easy to make, and could make the aforementioned demo capability less painful and possibly more secure. I suspect these capabilities are already under consideration.

    Growing the Research Community:

    Young PIs-to-be sometimes have a difficult time matching their research directions with NSF programs. A central WWW-accessible mechanism for matching investigators with programs would be very useful. The most obvious way to do this is by maintaining a list of programs, fellowships, etc. which can be browsed. This could be augmented with a forms-based interface for searching for research interests that do not fall under listed categories. Such a mechanism might even be useful for soliciting pre-proposals or suggestions.

    Subject directories are a useful way of obtaining more information about a field, and the kinds of research being undertaken at various institutions. An example of such a directory is the <a href="http://piglet.cs.umass.edu:4321/robotics.html">Robotics Internet Resources Page</a> at UMass. NSF could be instrumental in establishing other subject directories, and collecting lists of such directories through its home pages. This would be very useful for people who are starting out in a field.

    Guidelines for NSF pages:

    PIs should provide at least summaries of NSF funded research in their laboratories. Guidelines for these summaries should be fairly loose and content-based, so as not to conflict with the existing WWW structure at a site (since each site generally has its own style). Funded projects should construct pages describing the project goals, research results to date, with anchors to personnel and equipment pages. NSF, in turn, should be able to point to these project summary pages. At the NSF level, projects could be listed by institution, PI, and at the very least, by subject category (e.g., by NSF directorate - along the lines already set up on www.nsf.gov). NSF should also provide forms-based searching to get access to the PIs' summary pages by PI name, institution and subject. Each project page should have back pointers both to the local institution or laboratory and to NSF. While this latter point may seem obvious, the practice of back-pointing is often not observed, and can leave the user at a dead end.
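
    The back-pointer guideline is easy to check mechanically. The sketch below, using only the Python standard library, reports which required back pointers a project page lacks; the laboratory URL is a made-up placeholder, and matching by URL prefix is an assumption of this example rather than a stated NSF rule.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def missing_back_pointers(page_html: str, required_prefixes: list) -> list:
    """Return the required link targets that the page never points back to."""
    collector = LinkCollector()
    collector.feed(page_html)
    return [
        prefix for prefix in required_prefixes
        if not any(href.startswith(prefix) for href in collector.hrefs)
    ]


# A project page that links to NSF but not to its own laboratory.
page = '<h1>Project summary</h1><a href="http://www.nsf.gov/">NSF</a>'
required = ["http://www.nsf.gov", "http://lab.example.edu"]

print(missing_back_pointers(page, required))
```

    Run over every project summary page, a check like this would flag the dead ends before a reader finds them.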


    WWW, Services, and Browsers: Issues for Today and Tomorrow

    Oscar N. Garcia
    Director
    Interactive Systems Program
    National Science Foundation
    Arlington, VA 22230
    ogarcia@nsf.gov

    Perhaps the best approach to this broad topic is to divide it into short term approaches and longer term objectives, with an organizational component in mind.

    In the short term there is little question that the current state of the art allows accessibility in a form and quality - if properly organized - not available before, and of which we should take advantage. It is also clear to me that it is necessary to judiciously monitor and establish the quality of the content of any repository, and that this cannot be a job just added to someone or some group on a part-time or ad-hoc basis. So the issues are of responsibility and also of content and form in the short term. However, there is also the need to think about the most effective manner in which whatever is done today and in the near term can be utilized in the next generation(s) of WWW services: in other words, a plan for change and improvements. The availability of Mosaic, Netscape, and others seems but the prelude to an avalanche of browsers and related services whose popularity, needs, and expenses we must anticipate. The questions here are: how are we going to ask people to participate both as users and as providers of information, who should do it, at what point do we standardize or change standards, who decides, what should be included, what rejected, how should it be organized, and how do we differentiate and decide between experimental-level services (with proper warnings) and established delivery services? I visualize a collaborative user/provider group assisting in this effort on an ongoing basis. Some groups are already functional or in the making, but the question of "what objectives the group has" needs to be established. In our case the objective is clear: to provide access, search, navigation, and any other services on funded research and activities sponsored by the NSF.

    In the long term the issues are even more interesting and vital: the WWW is both a delivery and interchange mechanism, and also an experimental platform for the future global information infrastructure. How can NSF tap this resource in the future and encourage today its scientific development in concert with the research community? This is a richer domain which has been addressed well in other position papers. The part that may not have been addressed, and often catches us by surprise because of unanticipated developments, is how to transition without throwing away all the previous databases and work done before, that is, how to optimize the previous investment. I'd like to see that issue addressed, perhaps even the question of "quality" of content via guidelines or metrics. There is much vacuous info in many home pages that I look at. What are the methodologies for effective construction and efficient human intellectual consumption in this new medium? How do we experiment with this tool with the explicit knowledge (required warning?) of the subjects? What are the models of HCI that may be appropriate for this environment and how could they help both in efficiency and in design, in capture, organization, representation, navigation and submission of works? Multi-person and multi-machine operation are unavoidable extensions, some of them already in process. My emphasis on integrating speech and natural language discourse and dialogue as "natural" interfaces extends to the WWW and its browsers. The sociology and economics of "CAI on the Web" and what it would do to classic education - both good and bad - need to be considered and cannot be left to chance.

    Possibly a hierarchy and taxonomy of issues could be constructed, including such specific topics as:

    1. representation (PostScript, LaTeX, HTML, SGML, graphics, voice, audio, resolution as appropriate to each level, and the questions of basic principles common to all of those and new ones);
    2. redundancy-coding-compression, agents-systems-hierarchies-objects;
    3. user-adaptability-intensional model-level of expertise- profiling - depth of interest- feedback to system and designer - level of satisfaction; education- same or different CAI/CBI/??? principles- textbooks- publishers - economics of educational institutions - superficiality and hands-on of educational processes; and last but not least,
    4. NSF's-role- disseminator- actor - stimulator - limited funder - GII player - multi-disciplinary server in science- abstracter and summarizer of research results - digital library research initiator - convener of forums.

    Without some organizational principles (not limited necessarily to those suggested!) the subject may be unmanageable.

    In summary, I propose three points of view: the tactical (short term), the strategic (long term research goals and directions), and the organizational (taxonomic and hierarchical). I hope these points may be considered in relation to the substantive matters suggested by the attendees.


    Interactive Systems

    John M. Carroll
    Department of Computer Science
    Virginia Tech (VPI&SU)
    Blacksburg, VA 24061-0106 U.S.A.
    carroll@cs.vt.edu

    I believe that WWW offers possibilities for NSF to better support information delivery to and collaboration among investigators, to both appear and actually be more accessible and responsive to the public, and even to streamline its own administrative operations. Herewith is a superset of what I plan to say next week (currently in development at http://info.cs.vt.edu/~carroll/nsf.html).

    A starting point for thinking about these possibilities is to consider the scenarios of use that are enabled (and obstructed) by the current NSF gopher. The STIS facility makes it very easy to review current program announcements, assuming the user knows already what directorate is of interest. The STIS gopher also makes it easy to search by keyword for relevant abstracts from recent sponsored projects. Some information can be browsed but not searched for (for example, the list of program announcements); some information can be searched but not browsed (for example, NSF publications). And quite a bit of potentially useful information is just not there (see below).

    A plausible future scenario of use in which one investigator seeks to benefit from the work of another involves chasing pointers from the abstract of a sponsored project to the journal papers, chapters, talks and technical reports that were produced under that contract. This scenario is not supported by the current gopher facility. Indeed, making available only the 100-word abstract "views" of sponsored projects (along with their total budgets!) makes them all look far less substantive than they are and ought to appear, and far more expensive to taxpayers. We in the research community understand the role and status of a 100-word abstract, but we are not the only browsers of the WWW. Linking together the program descriptions, abstracts of funded projects, and resultant work products would be more useful for members of the research community and more informative to members of the public.

    Such a distributed digital library of NSF research could presumably leverage and at the same time help to focus the considerable NSF effort already underway in digital libraries for Computer Science theses, dissertations and technical reports, and the new set of projects being launched.

    In another plausible future scenario, an investigator might want to pose a question or make a suggestion directly to the relevant PI or the overseeing NSF program manager. However, in the current system it is not possible to directly "reply" to a project abstract, for example, by generating an e-mail message to the PI or the program manager while viewing the project abstract. (Some web viewers allow local annotations, and apparently are planning to support group or public annotation.) Creating such direct links would encourage direct collaboration among investigators. Such a system would also help convey to the public (and to the investigators too, where necessary) that to work on an NSF-sponsored project is to take part in an open process of collaboration and communication. Encouraging this view would be a good thing to do with respect to NSF's technical objectives; conveying this to the public would be a good thing to do with respect to NSF's image and the public's understanding of research work.

    A variant of the scenario is one in which the investigator who authored the original material would like to interact with those who have linked to his or her documents. The author might, for example, want to notify readers of changes to a document previously posted. (The viewer Hyper-G creates such a link database.) Another variant is one in which a reader wishes to access the set of documents that refer to the current document. All of these scenarios would greatly facilitate professional collaboration -- as well as education: it would make documents on the web part of a real network of knowledge and not just one-shot discoveries for readers.

    This collaboration scenario could be elaborated through the implementation of a moderated debate forum, similar to WIT (http://info.cern.ch/wit/hypertext/WWW/Topic1001/Proposal1001). WIT uses the forms capability of http to support debates structured in ways similar to Rittel's classic IBIS (Issue-Based Information System). Users state contrasting positions on various issues and adduce evidence and arguments to make their cases. The current use of WIT has devolved in many cases into just a better interface for general newsgroups (hence the need for a moderator, I think). WIT only supports linking flat text objects. A more sophisticated approach would support linking documents and bibliographies as backing for positions taken. Indeed, this could be a far more efficient framework for technical debate than publication-lagged journal rejoinders or staged (frequently stale) conference panels.
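
    An IBIS-style debate reduces to three kinds of linked objects: issues, positions, and arguments. The following sketch is one possible reading of that structure, not WIT's actual data model; the `backing` list is where links to documents and bibliographies would go, and the sample debate is invented from the proposal-review discussion in this paper.

```python
from dataclasses import dataclass, field


@dataclass
class Argument:
    text: str
    supports: bool                               # True = pro, False = con
    backing: list = field(default_factory=list)  # links to documents, bibliographies


@dataclass
class Position:
    text: str
    arguments: list = field(default_factory=list)

    def tally(self):
        """Count arguments for and against this position."""
        pros = sum(1 for a in self.arguments if a.supports)
        return pros, len(self.arguments) - pros


@dataclass
class Issue:
    question: str
    positions: list = field(default_factory=list)

    def outline(self):
        """Render the debate as an indented plain-text outline."""
        lines = [f"ISSUE: {self.question}"]
        for p in self.positions:
            pros, cons = p.tally()
            lines.append(f"  POSITION: {p.text} (+{pros}/-{cons})")
            for a in p.arguments:
                mark = "+" if a.supports else "-"
                lines.append(f"    {mark} {a.text}")
        return "\n".join(lines)


issue = Issue("Should proposal review move onto the WWW?")
yes = Position("Yes, starting with pre-proposals")
yes.arguments.append(
    Argument("Pre-proposals are already handled partly by e-mail", supports=True))
yes.arguments.append(
    Argument("Authentication is still experimental", supports=False))
issue.positions.append(yes)

print(issue.outline())
```

    A moderator's job then becomes structural: deciding which contributions are new positions and which are arguments on existing ones.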

    Currently, NSF proposal review practices depend on moving tons of paper around the country: announcements, pre-proposals, full proposals, reviews. These practices are already evolving; pre-proposals are often handled at least partially via e-mail. This shift could be more explicitly supported by the NSF. Submission and review could be carried out via the WWW, and after the review, submitted proposals could be posted for the benefit of the investigators invited to submit full proposals, as well as of the research community more generally. Eventually, this paradigm could be extended to the submission and review of full proposals.

    There are obvious questions about how quickly such a transformation could occur. But in this area, the WWW offers an opportunity to the NSF to simultaneously achieve administrative streamlining and public accessibility and accountability for all its processes -- as well as providing a testbed for experimenting with authentication capabilities. (There was an address on the web for something called the NSF Electronic Proposal Submission Project, gopher://stis.nsf.gov/00/NSF/eps/epsinfo, but it turned out not to exist.)

    Future uses of WWW depend on creating tools and standards to encourage the future scenarios we want to envision and to avoid those we see as unattractive. For example, it is not possible in general to predict where particular kinds of information will be, even if one can determine on what server the information resides. The most popular viewer (Mosaic) incorporates no serious search tools. Many issues in information retrieval need to be raised in this context: filtering, routing, and searching collections and their subsets, fusing results from distributed searches, and so forth.

    Indeed, though one often hears proud quotes about the ever-growing, now multi-terabyte per month volumes of WWW traffic, one has to wonder how much of this is accounted for by scenarios in which overwhelmed users are aimlessly, even hopelessly, browsing for something interesting or useful. Perhaps we should admit that some of these people are just wasting time by any accounting; perhaps they are wasting thousands of hours per WWW terabyte.

    Many technical problems and tasks regarding the use of WWW are consuming huge amounts of time and effort. How much time is being wasted today by people authoring WWW pages on the off chance that someone else will find them and read them? How much time is being wasted trying to build attractive pages with html? There is a need for document type templates. Research is needed in document models, document translation, document analysis and indexing, representation of interaction in declarative forms, integration of scripting languages with declarative (i.e., SGML-based) documentation languages, optimization of time for various authoring, annotation, publishing, reading, reusing tasks with respect to types of document representations.

    The popular viewer Mosaic promises to create crippling technical limitations in the near future: for example, the fact that it must download an entire video object to the client before displaying it means that only relatively small video objects are practical. Research is needed regarding server configurations and interactions. For example, students accessing a system for a course should only need to access something from outside their local server once; thereafter, accesses should be to the local server.
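
    The course example amounts to a caching local server. A minimal sketch, assuming the remote fetch is supplied as an ordinary function and ignoring expiry and invalidation entirely (the class name and course URL are hypothetical):

```python
class LocalServer:
    """Serves documents to local clients, fetching each remote one only once."""

    def __init__(self, fetch_remote):
        self.fetch_remote = fetch_remote  # callable: url -> document body
        self.cache = {}                   # url -> cached document body
        self.remote_fetches = 0

    def get(self, url):
        if url not in self.cache:
            # First request for this URL: go outside the local server.
            self.cache[url] = self.fetch_remote(url)
            self.remote_fetches += 1
        return self.cache[url]


# Thirty students request the same lecture page through the local server.
server = LocalServer(lambda url: f"contents of {url}")
for _ in range(30):
    body = server.get("http://course.example.edu/lecture1.html")

print(server.remote_fetches)  # 1
```

    The open research questions are the parts left out here: when a cached copy should be considered stale, and how neighbouring servers might share their caches.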

    NSF might encourage a study of the design space for WWW viewers to clarify what tools already exist but are not being exploited. Hyper-G incorporates a general server retrieval tool; Mosaic and Cello do not. Hyper-G, Cello, and Netscape allow video objects to be displayed as they are being downloaded to the client; Mosaic does not.

    Acknowledgments:

    Thanks to Ghaleb Abdulla, Andrew Cohill, Roger Ehrich, Ed Fox, Rex Hartson, Stuart Laughton, JAN Lee, Mary Beth Rosson for ideas and discussions.


    Summary of Recommendations

    Jim Foley
    Director
    Graphics, Visualization & Usability Center
    College of Computing
    Georgia Institute of Technology
    Atlanta, GA 30332-0280
    foley@gvu.gatech.edu

    1. Recommendations concerning use of WWW as an experimental platform for collaborative efforts in the IRIS and computer science research communities, including potential enhancements to WWW in support of this.

    + WYSIWYG authoring tools (some coming from private sector)

    - Note that authoring includes understanding the link network

    - Graphical views for understanding and creating/modifying links

    > for instance, grab a link in a node and link view, connect it to a new node to change the destination target of the link

    - Support for graphical navigational metaphors

    - Import from more of the common word processors, including Mac and PC systems

    - Include support for forms and adaptive forms (Ref Pitkow/Recker GVU Survey of WWW Users)

    - All data types can be embedded in main page

    - Support mutual exclusion, in case of simultaneous editing attempts

    + Additional asynchronous collaboration support.

    - Borrow the best features of Lotus Notes

    - Dynamic updates, so two collaborators can talk on phone and do updates which are seen at once - cache write-through - and integrate with authoring tools so all changes can be immediate

    + Use as the base for a GLOBAL DESKTOP uniting group of researchers.

    - Make WWW be THE new desk top metaphor - but a shared metaphor

    > Think of it as a replacement for the Macintosh desktop

    - Integrate some or all of local file systems into WWW

    > Code and data files for the project

    - Integrate email and phone messages and faxes and newsgroups and... into WWW

    - Provide modularly-oriented cross-platform integration and unification in ways that UNIX and Mac and Windows have not

    + Use as a browsable program repository - not just for text documents, for programs as well

    - Single click down-load and/or execution

    > Right, there are security problems - but not unique to WWW

    - "Test-drive" of program modules via automatically-generated user interface

    - Reference ARPA Image Understanding Program's code sharing tools

    + Synchronous WYSIWIS collaboration support

    - All the standard things - video and audio connections, collaboration-aware viewers and editors, multi-way conferences, join and leave conference, etc.

    + Better support for bibliographic searches

    - Ranked lists of "hits" much too primitive

    - Text retrieval community has various visual representations - let's get them integrated in!

    - Citation problem - URL's keep changing, need something else

    + An organized repository structure - "Information Clearinghouses"

    - For project archives, courseware archives, etc.

    - More structure than pure chaos

    - Less structure (no bureaucracy please) than a library

    > A `meta system' for creating repositories/archives?

    2. Recommendations concerning research which NSF in general and IRIS in particular should consider undertaking with respect to the WWW, its accessibility, and its usability.

    + Everything discussed under topic 1 above

    + Explicit semantic descriptors - meta-information

    - Facilitate browsing and searching

    - Make easy and/or `cool' and/or valuable, else won't happen

    + More sophisticated browsers

    - Information visualization and abstraction tools for the "hyper" network

    - Support visually-impaired users with sound

    - Speech-recognition using current set of link names as vocabulary

    - Sound as an additional `visualization' modality

    - Scalable down to PDA's and touch-tone phones and set-top boxes

    - Better management of hotlists - get long, need to be organized

    - In-line support for all data types and for programs

    + Teaching-oriented browsers

    - Pedagogically-informed templates or other guidance

    - Support for exploration of `micro-worlds' (Soloway, Guzdial)

    - Support algorithm animations (Stasko) and two-way general communications between script and the animation or simulation process

    - Special CSCW needs for education (reference Levenson's system at Sun Microsystems).

    + Embedded active objects - copying a link takes the link's target reference with it, etc.

    + Intelligent agents

    - Filter mail

    - Automatically update email addresses and URL's

    - Continually search for items matching my search profile

    > Resource discovery agents

    + Multiple "looks" to the same information

    - So my pages have the `NSF' look if accessed via the NSF PI directory and have the `GVU' look if accessed via the GVU home page

    > A natural for SGML

    + Quickly move away from HTML as "The FORTRAN of Hyper-media Systems"

    - The language you love to hate

    - The language with which we have to be backwards compatible

    - The language which made it all possible

    3. Recommendations on NSF information delivery to the public and research communities via WWW. Attention will be given to the potential of the National Information Infrastructure to enhance access to NSF information.

    + Projects - Impose as little burden as possible on PIs

    - But, some standards are needed

    - Provide a template, filled in automatically based on award information, for each project (Ref ISX System)

    > PIs

    > abstract

    > List of other NSF-funded projects of each PI

    > Amount of award

    > Other standard stuff

    - PI provides URL for

    > personal home page

    > optional lab home page

    > department home page

    > institution home page

    - Lists of publications and programs

    + Programs - everything on-line

    - Proposal submission via WWW just as can now submit via ftp

    + General info for public

    - General things

    - Art gallery of interesting scientific pictures and animations

    - Some sort of math and science education material and repositories

    Interface Agents on the WWW

    Su-Shing Chen
    National Science Foundation
    Arlington, VA 22230
    schen@nsf.gov

    In addition to information access to WWW, scientific (including computer science) research requires problem solving, decision making, simulation, visualization, and many other capabilities. Often software packages and tools are used for various applications. For applications, software packages and tools should be bidirectionally linked to the WWW information space. For example, a visualization package will use information objects in the information space and return the results to the information space for information sharing and presentation.

    At present, Mosaic/WWW does not support bidirectional linkage to external packages and tools. In their August 1994 Science paper, Bruce Schatz and Joe Hardin discussed the component and the object models. Mosaic/WWW has a component model. Basically, Mosaic/WWW needs to be evolved to an object model for applications. An object model will be also useful to implement various interface agents for application-to-application, appliance-to-application, and application-to-network interfacing on WWW.


    A DL/EP/HT/IR/MM Perspective

    Edward A. Fox
    Associate Professor, Dept. of Computer Science &
    Associate Director for Research, Computing Center
    Virginia Tech
    Blacksburg, VA 24061-0106 USA
    fox@fox.cs.vt.edu

    1. GENERAL RECOMMENDATIONS TO NSF

    * Don't let WWW flop (i.e., help fix any serious problems that arise), e.g.:

    - encourage use of caching in the networks

    - encourage improved protocols that reduce traffic and delays

    - encourage better integration of indexing and search technology

    * See that Computer Science research results that relate directly to the WWW (e.g., in AI, digital libraries, electronic publishing, HCI, hypertext, information retrieval, multimedia) can easily find their way into WWW:

    - so that current problems of WWW can be corrected in time

    - so that research results can more quickly work through the technology transfer cycle, and "prove themselves" in real use

    * Use WWW in whatever ways it can be applied (e.g., as suggested in various of the position statements) to save the time of NSF staff as well as of those who prepare, review, and carry out research projects --- to make NSF efforts more efficient (with no loss in quality), e.g.:

    - provide "templates" and on-line forms to help simplify and organize all data entry, submissions, reviews and other information handling

    - have on-line searchable and browsable versions of all procedures and announcements, databases of PIs/projects/reviewers/panelists,...

    - have on-line "frequently asked question" and other explanations to help staff and outsiders more quickly find answers to common questions

    * Use WWW to help with rapid dissemination of CS research findings:

    - so we have more "reuse" of our science

    - so findings impact as wide a variety of areas as possible by:

    - having sets of pages for each funded project, with similar structure, so one can easily find the important parts of each

    - having all reports and papers related to the project be on-line and directly accessible (possibly inside a TR or publisher's WWW repository, or, at worst, as bibliographic citation plus abstract) --- this calls for liaison with CS technical report, CS digital library, and CS thesis/dissertation projects

    - having on-line demonstrations, visualizations, simulations, animations, movies, code archives, documentation, and other related results

    - having multiple views, for differing levels of expertise/education, e.g., K-12, undergrad, grad, researcher, layman, patent lawyer

    - having indexing, classification, clustering, summarizing, abstracting, and other tools or meta-information to aid access

    2. CONTEXT

    * At ACM SIGIR'95, held in Seattle early in July, we will be among the first to celebrate the 50th anniversary of the July 1945 Atlantic Monthly publication of "As We May Think" by Vannevar Bush. It is still sobering to compare our tools, including Mosaic, to his "memex," and to see how far we have advanced toward solving the problems he addressed: encouraging reuse of scientific discoveries and dealing with the many problems of the "information explosion".

    * Researchers who for decades have worked on the component technologies that make WWW a reality (e.g., EP [electronic publishing], HT [hypertext], IR [information retrieval], MM [multimedia], networking, client/server computing, PCs/workstations,...) are all eager to apply their specialized knowledge and skills to improve it further.

    * Researchers who have integrated these technologies into various types of systems, services and environments, are now eager to move us from today's WWW toward a global digital library, of grand scale, in terms of content, audience, and use.

    * Today's "Nintendo" generation is finding WWW to be the next target for its pursuit of edutainment, at the same time that teachers are turning to it as the host for new courseware, businesses are looking at it as a way to contact customers "without the middleman", and scholars are looking toward it as a unified and collaborative intellectual workspace.

    3. SPECIFIC SUGGESTIONS

    * I encourage NSF to encourage development and use of WWW and Mosaic and have personally acted in this direction:

    - Testimonial 1: Tim Berners-Lee spoke on WWW, Rob Akscyn (then chair of ACM SIGLINK) talked about hypertext, and I spoke about our NSF funded work on digital libraries in March 1993 at on-line Publishing `93, Pittsburgh, PA; immediately afterwards I passed around Tim's WWW notes and brought up some of the early Web software.

    - Testimonial 2: I was greatly pleased when a large group from NCSA came to the July 1, 1993 Workshop on Information Access and the Networks, IANET'93, held in conjunction with ACM SIGIR `93, in Pittsburgh --- and reported on Mosaic.

    - Testimonial 3: Kurt Maly, Alan Selman, Jim French and I have coordinated work since early 1993 on the NSF funded <A HREF="http://www.cs.odu.edu/WATERS/WATERS-GS.html"> WATERS project</A> for CS technical report capture, storage and dissemination, that uses WWW and Mosaic for browsing, query entry, and presentation of search results and reports. The <A HREF="http://cs.indiana.edu/cstr/search">Indiana TR</A> project has shorter term goals. Integration with the ARPA CSTR effort (e.g., <a HREF="http://cs-tr.cs.cornell.edu/TR/CORNELLCS:TR94-1418"> DIENST</A>) has recently been approved by ARPA, and will be enabled by the facilities of WWW.

    - Testimonial 4: My Fall 1993 Information Storage and Retrieval course made extensive use of gopher and Mosaic, using servers that have been running continuously since then; in Fall 1994 my department has three "paperless" courses, with all instruction and courseware delivered through WWW, and at least 9 computers running WWW servers.

    * However, there is plenty of room for NSF-funded research to help ensure that WWW and Mosaic lead to even better systems, services, and tools:

    - The Hyper-G system, funded by the Austrians, has many technical advantages over the commonly used WWW and Mosaic services, e.g.,:

    + viewers for text, images, and movies that are sensitive to links and anchors inside "documents";

    + indexing automatically applied when documents are added so that searching is integrated with browsing;

    + hierarchical browsing option for Hyper-G portions of the WWW;

    + caching by local servers that eliminates the need for clients to repeatedly connect to distant servers;

    + "live" mode that gives progressive display of images and movies to facilitate browsing and that allows cancelling of slow "get" operations;

    + multilingual document and interface support;

    + presentation of any SGML document (as opposed to just HTML documents);

    + searching on collections or subcollections, and on content;

    + a distributed OODB for links and metadata;

    + support for personal or group views or "webs" that allow multiple different sets of links "above" a document collection, even links between anchors that are added to read-only document pairs, through the link database; and

    + notification and automatic removal of a link from the link database when either the source or target document is deleted.
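
    The last two items describe links kept in a database separate from the documents. The following is a rough sketch of that idea and not Hyper-G's actual design: the class and method names are assumptions made for illustration. Because links live outside the documents, they can connect read-only documents, and deleting a document removes every link that touches it and returns those links so that their owners can be notified.

```python
from typing import List, Set, Tuple

Link = Tuple[str, str]  # (source document, target document)


class LinkDatabase:
    """Hypothetical link store kept apart from the documents themselves."""

    def __init__(self) -> None:
        self._links: Set[Link] = set()

    def add_link(self, source: str, target: str) -> None:
        self._links.add((source, target))

    def links_from(self, source: str) -> List[Link]:
        return sorted(link for link in self._links if link[0] == source)

    def delete_document(self, doc: str) -> List[Link]:
        """Remove all links into or out of doc; return them for notification."""
        dangling = sorted(link for link in self._links if doc in link)
        self._links.difference_update(dangling)
        return dangling


db = LinkDatabase()
db.add_link("paper.ps", "data.tar")
db.add_link("notes.html", "paper.ps")
db.add_link("notes.html", "slides.ps")
removed = db.delete_document("paper.ps")
print(removed)
print(db.links_from("notes.html"))
```

    After the deletion, only the link from notes.html to slides.ps remains, so no link in the database points at a missing document.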

    - MIME, which was planned as an interim solution to multimedia interchange, should eventually be replaced with a more efficient standard, with better integrated compression, that would replace it in WWW and Mosaic use.

    - Full support is needed for SGML (for descriptive markup that reduces the cognitive load on authors); DSSSL (for specifications on how to present or print SGML-encoded documents); HyTime (for more comprehensive description of hypertext, hypermedia, and time-based documents); and MHEG (for object-oriented description of multimedia, hypermedia, and interactions) --- to facilitate interchange and electronic (re-)publishing.

    - KMS (a hypertext system, provided by Rob Akscyn's company) type support is needed for: fast and efficient collaborative editing of hypertext documents, draw operations, line art, rapid client/server protocol, and easy adding of annotations.

    - Multimedia presentation support is needed, and should move to constraint and style-guide based schemes; content-based indexing and analysis of multimedia documents (e.g., images, movies, speech, music) is also of great importance.

    - Advanced retrieval techniques should be applied more to the WWW, e.g., with (SGML structure) context-dependent searching, use of extended Boolean schemes, morphological analysis of document and query terms, automatic query expansion using a lexicon or thesaurus, clustering to aid in browsing and retrieval, session-long interactive query improvement through relevance feedback, use of knowledge bases to allow inferencing and conceptual retrieval, etc.

    - IR based models of user-intermediary interactions should drive how systems carry out interactive sessions, and could inform the behavior of intelligent retrieval systems.

    - Use of general purpose knowledge representation and protocol standards (KQML, KIF) should be made in agent-based systems.

    - Tighter integration is needed with other parts of users' environments, such as authoring languages for courseware, scripting languages that specify interactions, mail handlers (that could allow automatic filtering, routing, classification, filing and retrieval), expert systems / decision support systems, help and documentation systems, calendar / tickler systems, agenda and outlining tools, citation and content searching, etc.
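
    One of the retrieval techniques listed above, automatic query expansion using a thesaurus, can be sketched briefly. This is an illustration under stated assumptions: `expand_query` is a hypothetical helper and the thesaurus is a toy dictionary invented for the example, not a real lexical resource.

```python
from typing import Dict, List


def expand_query(query: str, thesaurus: Dict[str, List[str]]) -> List[str]:
    """Return the query terms followed by any thesaurus synonyms, without repeats."""
    expanded: List[str] = []
    for term in query.lower().split():
        # Keep the original term first, then add its synonyms (if any).
        for candidate in [term] + thesaurus.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


# Toy thesaurus, an assumption made purely for this example.
toy_thesaurus = {
    "movie": ["film", "video"],
    "picture": ["image", "photograph"],
}
print(expand_query("movie picture archive", toy_thesaurus))
```

    The expanded term list would then be passed to whatever search engine is in use, so that a document mentioning only "film" can still match a query for "movie".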

    * NSF-funded research in this area might follow these strategic guidelines:

    - Integration of WWW-related efforts with Digital Library work should be strongly encouraged.

    - Cross-fertilization of ideas should be encouraged through support of interdisciplinary workshops and through preference given to broadly skilled project teams or to coordination between projects.

    - Scalability, usability, efficiency, and effectiveness should be required as design criteria for most of the funded efforts, and should be the basis for obligatory evaluation phases of such studies.

    - Studies of users, as they learn anew how to write, read, learn, and work in the WWW environment, must be undertaken before we lose the chance to collect data about them --- and results should be fed into design and development efforts.

    - CS researchers should be encouraged to work with other NSF-funded investigators to create "environments", building upon the current WWW, for collaborative use, such as for: scientific computation, engineering design and development, learning, courseware development, multimedia programming, or proposal preparation/review --- with on-demand access to needed information, tool integration, multiple views of design/development/use spaces, and as-needed conversions between data/information/knowledge representations.

    - While expanding the functionality of our emerging universal interfaces, we must also go back to tailored and crafted specialized interfaces, drawing ideas from them and fitting those back into WWW.

    - A long-term perspective should be adopted, so

    + we understand the relation of network and system architectures to the need for scaling up in size of content and number of users;

    + we think not only about the idea of agents but also about other means to unleash the tremendous computational power of the Internet; and

    + our research and experimentation is aimed toward task support that cuts out the middle man and empowers users.


    Intelligent Information Services

    Tom Mitchell
    Carnegie Mellon University
    tom.Mitchell@cs.cmu.edu

    My primary point is that NSF should take the lead in two areas:

    1. basic research on information capture/indexing/retrieval via the WWW, and
    2. use of the WWW to communicate scientific results to the nation at large (including primary/secondary schools, universities, and the research community)
    Of course these two goals go hand in hand. Progress on the first should enable better accomplishing the second, and vice versa.

    RECOMMENDATIONS CONCERNING USE OF WWW AS A PLATFORM FOR COLLABORATIVE EFFORTS

    To support collaboration in the area of intelligent information agents, define and support standard interfaces from software agents to WWW browsers (e.g., through something like the Mosaic forms interface), so that researchers around the country can easily link their software into a growing web of agents, just as we currently link new HTML pages into a growing web of static information. A concrete suggestion: fund a small meeting among researchers interested in immediately linking up their existing software agents in this fashion.

    To support laboratory sciences, provide access via WWW to scarce laboratory equipment. E.g., see recent efforts by Sandia and Univ. Penn. to provide access via the WWW to their robotic equipment.

    More generally, opportunity to share software, laboratory hardware, data sets, simulations. But for this to catch on, sharing these must be as simple as getting to a new URL via Mosaic.

    RECOMMENDATIONS CONCERNING RESEARCH WHICH NSF SHOULD CONSIDER UNDERTAKING

    Basic research related to information capture, organization, and retrieval in the broadest sense (multimedia, multiuser, dynamic information,...). Others will do much development-oriented research. NSF has a unique ability to supply the essential science that NII technology will build on. Payoff can be expected from *BASIC* research in a variety of areas, including:

    - text/natural language processing (relevant to computer understanding of web pages)
    - understanding of multimedia information (combined video, sound, text,...)
    - new Information Retrieval methods (going beyond traditional IR methods to leverage the specific characteristics of the WWW)
    - machine learning oriented to the type of information found on WWW (for learning how to find information on the WWW, for learning the interests of individual users, for discovering trends/regularities across information sources on the WWW, for linking up individuals with common interests...)
    - human-web interaction (HWI). The interface between the WWW and its users is key, and deserves basic research.

    RECOMMENDATIONS ON NSF INFORMATION DELIVERY TO THE PUBLIC AND RESEARCH COMMUNITIES VIA WWW

    NSF should take the lead on courseware for the nation. A vast number of secondary and university faculty teach overlapping material. There is great duplication of effort, and great variance in the quality/accuracy/recency of information being taught. Somebody should organize and make accessible the material taught in all these courses so that it can be easily shared, and so that students can dig deeper in the areas in which they are interested. This would have significant impact on life-long education as well. There's a huge opportunity here, and NSF seems like the obvious agency to seize it.

    NSF should require WWW pages for every NSF-funded project, indexed from the NSF home page. Perhaps NSF should support on-line journals. I don't know how to best do this... Publishers to date have been impediments, not advocates.


    WWW is for Knowledge and Information

    John D. Hestenes
    National Science Foundation/CISE/IRIS
    Arlington, VA 22230
    jhestene@nsf.gov

    Biomedical Eng. and Science Institute
    Drexel University
    Philadelphia, PA 19087
    jhestene@ece.drexel.edu

    General Issues

    I believe that we are at a cross-roads from which we cannot go back. NSF and the investigative research communities must move now to a new place in terms of capabilities, knowledge and the basic research agenda.

    The Primary Demand Arising from the WWW is for Knowledge and Information

    As much as tools help, it is useful knowledge that people want. This suggests that some of the key basic research areas must include at least:

    NSF must make its WWW pages interactive

    The easiest thing to do would be to keep a fixed set of NSF pages, as is currently the case, that is, those "blessed" by policy and authorization schemes. Instead, the "blessed" features should be few and algorithmic rather than reviewed in every last detail. Program Directors should have direct control and authority (as well as responsibility) over pages which represent the Program they are responsible for, within relatively loose guidelines. In particular, Program Directors should be encouraged to present their philosophy of how they view the Program and the represented fields, indicate publicly areas where they think research is lacking and generally use their pages to shape the investigative community as a whole -- however gently -- to properly reflect the growing edges of the basic research community. Forms and new features should be implemented so that interactivity with the investigative community can be increased while still making the NSF processes work in a timely and responsible fashion.

    Some of the workload of NSF Program Directors should be distributed to the PIs

    Forms and other input methods, along with instructions, can be used to engage the PI in preparation of several types of abstracts for general consumption and for peer reviews. Coordination of workshop results can be done by PIs via the WWW and information about links to the resulting network documents can be submitted to the NSF database for dynamic assembly of the Mosaic pages that reference events of interest to the broader community.

    The WWW can become a resource for enabling young investigators for life

    When a person tries to decide if they want to pursue research in Computer Science (or other fields) one should be able to point them to links which, if followed and explored, will give them basic data and experiences relevant and similar to those of established researchers. When they actually enter apprenticeship for an advanced degree these network resources should permit them to explore global trends in research in a timely and accurate way. Once established as investigators the resources should empower them to collaborate across traditional boundaries and become a life-long learning experience for themselves. Relations to industry, education, medicine and society must also be supported.

    Information exploration will expose the poverty of our theoretical moorings

    Just as Virtual Reality requires complex synchronicity of many events to "fool" a person into believing that he/she is "present", so a similar phenomenon will arise from network access technology. That is, we will find that so many things must be understood to make the access technology work and the usefulness and meaningfulness to the user transparent, that we will realize that our notions about knowledge, users, interactivity and community engagement are insufficient for building future societal tools and experiences. Already the existing technology is pressing those directly in education to define what "education" really is. If we now make mountains of information available by internet methods will we also need to participate in defining how to use this information for useful "education"? Do we know what "education" is? Do we know what "information" is?... a new research agenda will evolve.

    Presentation Outline:

    I. Issues Regarding WWW / Mosaic-Like Capabilities

    A. Support of Computer Science Basic Research

    1. Attracting Researchers

    2. Growing and Equipping Young Researchers

    3. Access and Distribution Between Research Enclaves

    4. Access to Network Resources for Research

    5. Collaboration on National and Global Projects

    B. NSF Information Distribution

    1. To Enable the Computer Science Research Communities

    2. Among Government Agencies

    3. Domestic Industry, Education, Medicine, etc.

    4. Public Information Delivery

    5. Global Information Access and Delivery

    C. R&D with Industry and National Infrastructure

    1. Coordinating Research Agendas with Industry Needs

    2. Attracting and Building Industry Collaborations

    3. Strengthening National Professional Associations

    4. Evolving Network Publications

    5. Access and Aggregation of Network Databases

    6. Technology for NII and GII Applications

    D. New Research Agenda Issues

    1. Access, Searching and Filtering Issues

    2. Dissemination, Quality Issues

    3. Strategies for Managing Massive Input Volumes

    4. Issues TBD in PI Workshop

    II. IRIS Division WWW Research Program Planning

    A. PI Workshop - Oct. 31, 1994

    B. Possible Future Workshops on Selected Topics

    C. Possible Announcements of Research Opportunities

    III. "NII Roadmap" - National Research Council

    A. Considering Mosaic for Announcements

    1. White Paper Inputs from Industry, Academia, Government

    2. Public Forum May / June 1994

    B. Cannot Handle Mosaic Forms Input Information

    1. Too much to collect and to reply to

    2. Too varied or inappropriate

    3. Few or no classification structures and mechanisms

    IV. Distributed WWW Server Project

    A. Program Director's Desktop Capability

    1. Design and Management of Program Mosaic Pages

    2. Dynamic Assembly of Most Recent Information

    3. Automatic Enforcement of Policies: Privacy, Publication, Distribution

    4. Reduce Program Staff Workload and Improve Timeliness

    5. Use for negotiating joint funding of proposals

    6. A small demo being developed internally:

    Off-the-Shelf Approach:

    Mosaic + FoxPro + Microsoft SQL Server

    B. Use of "Forms" for PI Inputs to the Program, e.g.:

    1. Abstracts (One-liners for lists, Summary of Awards, Detailed for Peers)

    2. Reports on Grants

    3. Key words for dissemination and access

    4. Adding & Changing Links to Grantee WWW Server

    C. Support for the Research Community -- Some Topics:

    1. Scope and Style of Program Research

    a) Program Description and its Elements

    b) Elements of a proposal targeted for this Program

    c) How to do Science

    d) Description of Program Director's Philosophy

    2. Announcements

    a) "What's New" Pages

    b) Proposal Opportunities

    c) Summary of Awards from NSF Servers

    d) Infrastructure Resource Announcements

    3. Links to Resources and Related Research

    a) Links to programs and agencies that do joint funding

    b) Links to Grantee's WWW Servers

    (1) Workshop Reports where Grantee is the PI

    (2) Grantee's Detailed Research Descriptions

    (3) Bibliographies, Abstracts and Cross-Indices

    (4) Network-accessible tools and on-line experiences

    (5) Curriculum Models and Issues

    c) Links to National and Global Resources and Studies

    d) Links to Basic Science Bibliographical Resources

    D. Suggestions for Content of NSF IRIS Mosaic Pages?


    Scalability and the WWW

    V. B. Balayoghan, Don Fussell, & Avi Silberschatz
    Department of Computer Sciences
    University of Texas
    Austin, TX 78712
    avi@research.att.com

    As the user base of the World-Wide Web expands rapidly and an ever increasing amount of information is being made available through it, the degree of scalability of the current Web infrastructure is being tested to its limits. The problems of scale are already being felt on two fronts: first, the servers that publish popular pieces of information regularly get overloaded with user requests during peak access times and are effectively unavailable for several hours; secondly, the sheer vastness and variety of the information available on the Web makes the resource discovery process an overwhelming one for the average user. The trends in the growth of information volume and the user base size only point to further worsening of this situation. The increasing use of the Web to disseminate voluminous multimedia objects is likely to make the network congestion from Web servers more acute; whereas the pattern of a larger share of the user population coming from non-technical backgrounds may make the information discovery process still harder for the typical Web user of tomorrow. Considerable enhancements are thus needed in both the infrastructure of the Web and in the traversing tools in order to make the system scale better to the expected future load.

    Several basic deficiencies of the present WWW system contribute to the overloading of information servers. Information objects are always retrieved in full before the user gets a chance to peruse them and decide whether he or she really wants them. For large objects such as digitized photographs, this method wastes considerable network bandwidth in addition to forcing the user to wait the entire duration of object transfer before any useful information about the object is made known. The lack of replication among most information servers further compounds the overloading caused by needless full object retrievals.

    Incorporating multiresolution retrieval as the fundamental method of object transfer will be a significant enhancement to the infrastructure of the Web. More research needs to be done on the use of intelligent agents at server sites to manage multiresolution, replication and consistency. Each information object published by a server may be defined at various resolutions according to its data type. For a digitized photograph, a low resolution version may have several adjacent pixels grouped together and assigned an average color value. For a large Postscript document, a low resolution copy may consist of the abstract and the index. Depending on the typical load on the server and the storage requirements, the agent at the server site may decide to store one or more of the multiresolution versions and generate the others on demand. In the default mode of object transfer, a low resolution copy of the object is retrieved and displayed to the user while the full or higher resolution copy is being transferred. The requesting user is thus shown a version of the object promptly and can decide whether to continue with the retrieval or to abort the request.
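
    The photograph case above can be sketched in a few lines. This is a minimal illustration only, assuming a grayscale image held as a list of rows and a hypothetical function name `downsample`; the text does not prescribe a particular averaging scheme, and a color image would need the same averaging per channel.

```python
def downsample(pixels, block=2):
    """Return a low-resolution copy by averaging each block x block tile.

    `pixels` is a list of rows of grayscale intensities (an assumption made
    for brevity). Tiles at the right and bottom edges may be smaller.
    """
    height, width = len(pixels), len(pixels[0])
    low = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            # Collect the adjacent pixels that fall inside this tile.
            tile = [
                pixels[y][x]
                for y in range(top, min(top + block, height))
                for x in range(left, min(left + block, width))
            ]
            # Assign the tile its (integer) average value.
            row.append(sum(tile) // len(tile))
        low.append(row)
    return low


image = [
    [10, 20, 30, 40],
    [30, 40, 50, 60],
    [0, 0, 100, 100],
    [0, 0, 100, 100],
]
thumbnail = downsample(image, block=2)  # one value per 2 x 2 tile
```

    Here the thumbnail carries a quarter of the pixels of the original; a server agent could ship it first and send the full image only if the user continues the retrieval.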

    The agents may also be programmed to construct customized low resolution versions of objects. For example, instead of a user retrieving an entire collection of documents only to build an index for future reference, the active agent at the site serving the documents may be asked by the user to construct the index on the fly according to the specifications supplied by the user. The index may then be shipped across the network instead of the full set of documents.
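
    As a rough sketch of this idea, the server-side agent could build a small inverted index restricted to the terms the user asks for and ship only that. The function name `build_index`, the document names, and the choice of a plain word match are all assumptions made for illustration.

```python
import re


def build_index(documents, terms):
    """Map each requested term to the sorted names of documents containing it.

    `documents` maps a document name to its text; `terms` is the user's
    specification of what the index should cover.
    """
    wanted = {term.lower() for term in terms}
    index = {}
    for name, text in documents.items():
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        for word in words & wanted:
            index.setdefault(word, []).append(name)
    return {term: sorted(names) for term, names in index.items()}


# Hypothetical collection held at the server site.
documents = {
    "report1.txt": "Replication improves server scalability.",
    "report2.txt": "Caching and replication reduce network load.",
    "report3.txt": "User interfaces for browsing.",
}
index = build_index(documents, ["replication", "caching"])
```

    Only `index` crosses the network; the three documents stay at the server.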

    The ability to retrieve predefined and customized multiresolution versions will result in considerable savings as increasing numbers of large multimedia objects are made available over the Web: the extracted content of such an object is likely to be a few orders of magnitude smaller than the object itself. The positive results of multiresolution support include better response time behavior to users and the potential for reducing load placed on the system by browsing users and automated index generation routines (that are typically used to generate and maintain the index used by search engines).

    Automated replication management is another important key to achieving scalability. Agent and mediator technologies may be employed at server sites to monitor the load, manage replication and forward user requests transparently to the replicas. More research work needs to be done on incorporating an effective consistency management scheme under multiresolution retrieval.
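
    One simple policy such an agent might follow is to forward each request to the replica with the fewest requests in progress. The sketch below assumes that policy and a hypothetical `Forwarder` class; it leaves out consistency management, which the text identifies as an open research question.

```python
class Forwarder:
    """Tracks the load on each replica and forwards requests to the least loaded."""

    def __init__(self, replicas):
        # Number of requests currently in progress at each replica.
        self.loads = {name: 0 for name in replicas}

    def forward(self, request):
        """Pick the least-loaded replica for `request` and record the new load."""
        target = min(self.loads, key=self.loads.get)
        self.loads[target] += 1
        return target

    def finish(self, replica):
        """Record that a request served by `replica` has completed."""
        self.loads[replica] -= 1


forwarder = Forwarder(["replica-a", "replica-b", "replica-c"])
targets = [forwarder.forward(f"GET /photo{i}") for i in range(4)]
```

    The user sees a single server name; the choice of replica stays hidden behind `forward`.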

    Significant improvements are also required in the traversal tools (such as xmosaic) to make the process of information discovery less daunting to the average user. The focus in user interface design should be on supporting multiresolution retrieval and display and on providing tools for the user to construct customized high level views of the information space. The information displayed to the user should be incrementally refined as more data is retrieved and components are received at progressively higher resolutions. This change in the user interface will noticeably reduce the "dead time" spent by users as they wait for network connections to be made and the requested objects to be transferred.

    Customized views of the information space are a natural extension of the unscalable "Hotlist" feature currently available on most viewers. Tools to automate the process of building views will help make it more scalable with respect to the rate of generation of new data. Agents may be deployed at the user's end to scan the information that has newly become available from a set of given sources and selectively incorporate it into the user's personal view according to a specified criterion.
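
    A user-side agent of this kind can be sketched as a filter over newly available items. The record layout (`url`, `topic`), the example.org addresses, and the function name `update_view` are assumptions for illustration; a real criterion could be far richer than a topic match.

```python
def update_view(view, new_items, criterion):
    """Add to `view` every new item that satisfies `criterion`.

    Items already present in the view (same URL) are skipped, so the agent
    can be run repeatedly over the same sources.
    """
    known = {item["url"] for item in view}
    for item in new_items:
        if item["url"] not in known and criterion(item):
            view.append(item)
            known.add(item["url"])
    return view


view = [{"url": "http://example.org/a", "topic": "databases"}]
new_items = [
    {"url": "http://example.org/a", "topic": "databases"},  # already in the view
    {"url": "http://example.org/b", "topic": "robotics"},   # fails the criterion
    {"url": "http://example.org/c", "topic": "databases"},  # new and relevant
]
update_view(view, new_items, lambda item: item["topic"] == "databases")
```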

    Finally, more research needs to be done on the design of effective search engines that scale well and are comprehensive. Discovery by navigation is impractical for all but the smallest of information spaces. A variable-resolution search mechanism that permits successive refinement and navigation of search results may provide the general search facility. In addition, specialized topic-specific search engines may be provided for the user to attach at appropriate points in the customized view.


    Database and Expert Systems

    Tomasz Imielinski & B.R.Badrinath
    Department of Computer Science
    Rutgers University
    New Brunswick, NJ 08903
    {imielins,badri}@cs.rutgers.edu

    Introduction

    In order to start designing the NSF server it is important to consider a typical cycle in the life of an NSF-funded researcher and structure the server's functionality accordingly. There are three basic phases in the typical cycle of interaction between a researcher and the NSF:

    1. Finding out about NSF programs, new initiatives, deadlines, etc. These announcements may be obtained either by explicitly accessing the NSF page or by becoming a member of a specific multicasting group. Open questions are how many such groups need to be supported and how membership of each group will be established (e.g., manually, or automatically based on profiles).
    2. Submitting a proposal, following up on the reviews, getting information about funding decisions, etc. It would be very useful if the status of a research proposal could be checked on-line. This raises a range of security issues as well as the issue of keeping status information up to date.
    3. Doing the research, publishing results, disseminating to other researchers, submitting the final report, enhancing and extending obtained results.

    We believe that NSF has a unique chance to offer a standard interface (description) for research results (papers, demos) by requiring that all final reports for NSF-sponsored research conform to a standard. The key question is what this standard is going to be. This is an interesting issue which should become a subject of discussion at the workshop. Below, we summarize the concept of "active papers", which we proposed some time ago as a possible way forward:

    Active Research Library

    Active papers are papers which can be visualized, rendered, explained and tested. Active papers are not static; rather, they are snapshots of evolving experimental research which allow the "readers" to follow in the footsteps of the authors by recreating the reported experiments and even modifying them. Active papers use visualization and speech and can be queried through an interface which is specific to the paper's content. The active paper is therefore a time-evolving process, and reading such a paper involves not only querying its current content but also monitoring its future.

    Active papers will provide readers with the possibility of interacting with the content of the paper using the interface provided by the author (in the same way as she now provides the table of contents). In this way, each paper becomes a "database" which can be queried and even, in some cases, updated. As in databases, different views can be defined for the same paper: for example, one can create "presentation views" for talks of different length and depth, including high level (possibly visual) abstracts of what the paper is about.

    Below we present some of the features offered by active papers:

    1. Visualization of the paper's content, including numerical data, the behavior described by equations, etc.
    2. Rerunning simulations in the paper, even for parameters different from those used in the paper itself.
    3. Providing more detailed explanations of the paper's content by following active hyperlinks to remote servers and a remote appendix. The remote appendix would store exact derivations of the formulas in the paper and perhaps additional experimental data (before aggregation) or intermediate simulation results. It may also include a laboratory notebook describing intermediate steps which were not included in the paper itself. Remote servers may include additional tools such as MathLab, simulation tools, and tools for generating images and animations.
    4. Testing the results of the paper on widely available benchmarks, for example by running the included code on the test data.
    5. Following the ongoing research process. The active paper is only a snapshot of that process: active papers will be extensible both by authors (producing new versions) and by other scientists (adding active hyperlinks). Thus reading an active paper will also involve monitoring the follow-up changes made to it.

    Active Paper = data + interface

    Data includes raw experimental data as well as mathematical derivations and text. The interface is provided in the form of a set of predefined methods which can be run on the paper's data. Just as people now use latex to prepare their papers, they will use VITEX ("visual tex") to prepare the interface. In fact, future "journals" may require that the interface be compatible with a particular interface standard. Some current "ftp" sites already offer rudimentary examples of author-defined interfaces (no visualization, just simple directories of what is available).
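
    The equation "Active Paper = data + interface" can be made concrete with a small sketch. Everything named here (`ActivePaper`, `method`, `interface`, `run`, and the sample data) is hypothetical and stands in for whatever standard would eventually be agreed; the point is only that the author registers predefined methods and the reader runs them, possibly with parameters other than those used in the paper.

```python
class ActivePaper:
    """A paper as data plus a set of author-defined methods over that data."""

    def __init__(self, title, data):
        self.title = title
        self.data = data      # raw experimental data, derivations, text
        self.methods = {}     # the interface: method name -> callable

    def method(self, func):
        """Decorator the author uses to add a method to the interface."""
        self.methods[func.__name__] = func
        return func

    def interface(self):
        """List what a reader may run, much like a table of contents."""
        return sorted(self.methods)

    def run(self, name, **params):
        """Run one interface method on the paper's data with reader-chosen parameters."""
        return self.methods[name](self.data, **params)


paper = ActivePaper("Example paper", {"samples": [2.0, 4.0, 6.0]})


@paper.method
def mean(data):
    return sum(data["samples"]) / len(data["samples"])


@paper.method
def scaled(data, factor=1.0):
    # A stand-in for "rerunning a simulation" with a different parameter.
    return [factor * x for x in data["samples"]]


average = paper.run("mean")
rerun = paper.run("scaled", factor=0.5)
```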

    Reading (visualizing) the paper will now be a complex and expensive computational process which may involve accesses to remote servers (remote appendix, mathlab servers, etc.). Papers may be "read" either on the client machine, if it has enough computational power, or on the server's machine through remote access. The process will also require caching and monitoring of future changes.

    It is also important to provide ways for shared access to active papers. Ad hoc interest groups can be formed so that future updates to a paper's content can be disseminated to the interested parties. Active papers are never finished and evolve continuously; interested parties should therefore be able to monitor that evolution.

    NSF has a unique opportunity to lead efforts in this direction.


    Heterogeneous Data Integration on the WWW

    V.S. Subrahmanian
    University of Maryland
    College Park, MD 20742
    vs@cs.umd.edu

    INTRODUCTION:

    Current users of the world-wide web need to personally navigate through a potentially deep, hierarchically structured menu of options. They often need to specify:

    In order to effectively address these issues, advances will need to be made in a number of areas. More importantly, techniques are needed that will enable these advances, each in a specialized area, to be merged together to form a cohesive whole. These areas include:

    HETEROGENEOUS QUERY LANGUAGES:

    A WWW user may wish to ask his/her queries in one or more high level languages. For instance, a user familiar with SQL may wish to treat accesses to the Web as an SQL Query:

    (SELECT AUTHOR,TITLE
    FROM YELLOW_PAGES
    WHERE TOPIC IS_SIMILAR_TO "Allergy")

    Though the web may not be maintained as a set of relations in a relational database, an experienced SQL user may wish to ask queries of this kind. Translation mechanisms that convert such queries into a search along one or more paths on the web are critically needed. This is true not only for SQL, but also for a wider variety of query languages such as natural language, predicate logic, and/or query-by-example.

    Even more importantly, a user may wish to express a conjunctive query where part of the query is expressed, say, in SQL, and part is expressed in a different language. For instance, if the user has, in one window, a picture (containing, amongst others, an individual I who is an author), the user may wish to ask, in some formalism, a query of the form: "Find a copy of a book written by the wife of the person in the picture around whose face I am drawing a box". Here, the user draws, with his/her mouse (say), a box around the face of a person, and expresses the rest of his/her query in, say, an SQL-like language. What constitutes a "kosher" way of merging different querying paradigms and query languages together? How do we denote formally the fact that different parts of the query are expressed in different media (picture, text), yet share a common part? How are such queries to be processed (multimedia join)? These are critical scientific issues that need to be addressed.

    SIMILARITY-BASED SEARCH:

    The SQL query given above contains a special construct called "IS_SIMILAR_TO". With the explosion in the quantity of information available on electronic media, and with the increase in availability of computational resources to ordinary "on the street" Americans, the need to process "vague" queries has increased dramatically. Vagueness implies that "semantic matching" must be done. Given that, as discussed under Heterogeneous Query Languages above, queries may be expressed in multimedia format, this takes on new meaning. "Find me the titles of similar books written by Russian authors who are linked with the person shown in picture P" is an example of an exceedingly complex query that a user may ask when simultaneously viewing, in two sessions, a picture P and a book B, both on the web.
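
    To make the role of "IS_SIMILAR_TO" concrete, the sketch below evaluates the earlier SELECT query over a small in-memory table. It is an illustration under loud assumptions: word-overlap (Jaccard) similarity is a crude stand-in for the semantic matching the text calls for, the threshold of 0.3 is arbitrary, and the table rows and the names `jaccard` and `select_similar` are invented for the example.

```python
def jaccard(a, b):
    """Jaccard similarity of the word sets of two strings, from 0.0 to 1.0."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def select_similar(rows, topic, threshold=0.3):
    """SELECT AUTHOR, TITLE FROM rows WHERE TOPIC IS_SIMILAR_TO topic.

    Rows scoring at least `threshold` are returned, best match first.
    """
    scored = [(jaccard(row["topic"], topic), row) for row in rows]
    matches = [(score, row) for score, row in scored if score >= threshold]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [(row["author"], row["title"]) for _, row in matches]


# Hypothetical contents of the YELLOW_PAGES relation.
YELLOW_PAGES = [
    {"author": "A. Author", "title": "Living with Allergy", "topic": "allergy treatment"},
    {"author": "B. Writer", "title": "Pollen Seasons", "topic": "seasonal pollen allergy"},
    {"author": "C. Scribe", "title": "Garden Design", "topic": "landscape gardening"},
]
results = select_similar(YELLOW_PAGES, "Allergy")
```

    Raising the threshold turns the same query into a stricter, nearer-to-exact match, which is the knob a vague-query processor would have to set.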

    INTELLIGENT AGENTS:

    The preceding example specifies a "similarity-based" query. Personalized agent technology needs to be further developed in order to determine a user's preferences, especially when s/he asks vague queries. For instance, a doctor who wants all relevant papers published since 1991 on topics related to a part of an X-ray he is currently viewing may be looking for a near-exact match. In contrast, a professor who wants all travel brochures on places related to a beach he is currently viewing may be willing to accept any sunny beach. Note that the two queries are syntactically very similar in form; the only real difference is that in one case a beach is being looked at, while in the other an X-ray is. Thus, intelligent agents that determine the similarity of queries based on content and context are critically needed.
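
    The doctor and professor examples suggest that the same similarity scores should be cut off differently depending on context. The sketch below assumes the scores have already been computed by some matcher and that the agent has already classified the query context; the context labels, threshold values, and candidate names are all hypothetical.

```python
# Hypothetical per-context thresholds an agent might learn or be configured with.
THRESHOLDS = {
    "clinical": 0.9,  # the doctor wants a near-exact match to the X-ray region
    "leisure": 0.4,   # the professor will accept any sunny beach
}


def accept(candidates, context):
    """Keep the candidates whose similarity score meets the context's threshold.

    `candidates` is a list of (name, score) pairs with scores in [0.0, 1.0].
    """
    threshold = THRESHOLDS[context]
    return [name for name, score in candidates if score >= threshold]


candidates = [("item-1", 0.95), ("item-2", 0.6), ("item-3", 0.2)]
strict = accept(candidates, "clinical")
loose = accept(candidates, "leisure")
```

    The query and the scores are identical in both calls; only the agent's reading of the context changes what is returned.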
