Beyond the passive process of creating access to grant abstracts, program descriptions and announcements, and workshop reports, I believe that there is an important opportunity to create useful interactions among researchers with common or intersecting interests, and to establish a more active model of information management. For example, there are already discussions and explorations of electronic interactions about papers, putting research data sets on-line for comparative benchmarking, putting computer programs on-line for comparisons of experimental approaches, and even using the networks to link laboratories together to enhance sharability of resources.
I believe we should start with the obvious, and enhance what we already do or should be doing, but that the real payoffs of the new medium will be achieved by designing new modes of access and interaction.
There are currently an estimated 40 million people with access to the world's Internet. As computers become more widely available and used, we can expect up to two orders of magnitude increase in the number of people with access to the world network. Mail traffic to any individual user could be kept low by creating limited and separate networks, by limiting the number of people per mailing list (thus limiting broadcast authority), or by developing more sophisticated message routing methods. The message routing alternative is both desirable and technically feasible. Current methods are all keyword based (either text or image-grounded) and insufficient; in fact, they have not advanced much beyond the vector-matching methods of the 1960s, only run faster.
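To make the limitation concrete, here is a minimal sketch of the kind of keyword vector matching referred to above, applied to routing a message to interested readers. The interest profiles, the `route` function, and the 0.2 threshold are all illustrative assumptions, not part of any existing system.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector for a piece of text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def route(message, profiles, threshold=0.2):
    """Return the users whose interest profile is similar enough to the message."""
    mvec = term_vector(message)
    return sorted(
        name
        for name, profile in profiles.items()
        if cosine(mvec, term_vector(profile)) >= threshold
    )


# Hypothetical interest profiles, for illustration only.
profiles = {
    "alice": "robot vision navigation planning",
    "bob": "information retrieval indexing hypertext",
}
recipients = route("new results on robot navigation and planning", profiles)
```

Here `recipients` contains only alice, because the message shares no terms with bob's profile. A message about "mobile machines" would reach no one, since it shares no literal keyword with either profile; that is exactly the weakness of purely keyword-based routing.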
It would be of general use, both to NSF-funded researchers and to the public, to organize a coherent NSF bibliography database, or databases, based on bibliographies which already exist (and are presumably growing). This would be especially useful in allowing NSF-funded groups to share the results of their literature searches. Provisions would be necessary for ensuring proper translation between various formats for bibliography files (e.g., bibtex, refer, etc.).
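As a rough illustration of such format translation, the sketch below keeps each entry in a neutral record and writes it out in either refer or BibTeX form. The record layout, the function names, and the sample entry are invented for illustration; a real translator would also have to parse existing files and handle many more entry types and fields.

```python
# Mapping from neutral field names to refer tags (a small subset only).
REFER_TAGS = {
    "title": "%T",
    "journal": "%J",
    "year": "%D",
    "volume": "%V",
    "pages": "%P",
}


def to_refer(rec):
    """Render a neutral record as a refer entry, one tagged line per field."""
    lines = [f"%A {author}" for author in rec.get("authors", [])]
    lines += [f"{tag} {rec[field]}" for field, tag in REFER_TAGS.items() if field in rec]
    return "\n".join(lines)


def to_bibtex(rec, key):
    """Render a neutral record as a BibTeX @article entry."""
    fields = {}
    if rec.get("authors"):
        fields["author"] = " and ".join(rec["authors"])
    for field in ("title", "journal", "year", "volume", "pages"):
        if field in rec:
            fields[field] = rec[field]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@article{{{key},\n{body}\n}}"


# A made-up entry, for illustration only.
record = {
    "authors": ["A. Author", "B. Writer"],
    "title": "An Example Title",
    "journal": "Example Journal",
    "year": "1994",
}
refer_text = to_refer(record)
bibtex_text = to_bibtex(record, "example94")
```

Going through a neutral record means that each additional format needs only one reader and one writer, rather than a converter for every pair of formats.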
Since a LaTeX to HTML translator already exists (written by Nikos Drakos at the University of Leeds), translation of LaTeX publications to HTML is fairly easy in principle. Publications which have been translated to HTML are broken down by this translator into a hierarchy of sub-pages, indexed by a table of contents. In addition, the ability to place images (including color) in-line allows for potentially better illustration quality (especially if your printer is out of toner!). It may be ultimately preferable to generate HTML "preprints" first, from which paper publications can later be generated. A similar idea is already being tried in the field of Neuroscience, where on-line electronic <a href="http://salk.edu/NeuroWeb/posters/virtpost.html">"virtual posters"</a> can be used to disseminate research results in advance of their exhibition at neuroscience meetings. It should be noted too that the LaTeX to HTML translator may still have some bugs to be ironed out.
Subject directories are a useful way of obtaining more information about a field, and the kinds of research being undertaken at various institutions. An example of such a directory is the <a href="http://piglet.cs.umass.edu:4321/robotics.html">Robotics Internet Resources Page</a> at UMass. NSF could be instrumental in establishing other subject directories, and collecting lists of such directories through its home pages. This would be very useful for people who are starting out in a field.
In the short term there is little question that the current state of the art allows accessibility in a form and quality, if properly organized, not available before, and we should take advantage of it. It is also clear to me that it is necessary to judiciously monitor and establish the quality of the content of any repository, and that this cannot be a job just added to someone or some group on a part-time or ad hoc basis. So the short-term issues are responsibility, content, and form. However, there is also the need to think about the most effective manner in which whatever is done today and in the near term can be utilized in the next generation(s) of WWW services; in other words, a plan for change and improvements. The availability of Mosaic, Netscape, and others seems but the prelude to an avalanche of browsers and related services whose popularity, needs, and expenses we must anticipate. The questions here are: How are we going to ask people to participate both as users and as providers of information? Who should do it? At what point do we standardize or change standards, and who decides? What should be included, and what rejected? How should it be organized? How do we differentiate and decide between experimental-level services (with proper warnings) and established delivery services? I visualize a collaborative user/provider group assisting in this effort on an ongoing basis. Some groups are already functional or in the making, but the question of what objectives the group has needs to be settled. In our case the objective is clear: to provide access, search, navigation, and any other services on funded research and activities sponsored by the NSF.
In the long term the issues are even more interesting and vital: the WWW is both a delivery and interchange mechanism, and also an experimental platform for the future global information infrastructure. How can NSF tap this resource in the future and encourage its scientific development today in concert with the research community? This is a richer domain which has been addressed well in other position papers. The part that may not have been addressed, and often catches us by surprise because of unanticipated developments, is how to make the transition without throwing away all the previous databases and work, that is, how to optimize the previous investment. I'd like to see that issue addressed, perhaps along with the question of "quality" of content via guidelines or metrics. There is much vacuous info in many home pages that I look at. What are the methodologies for effective construction and efficient human intellectual consumption in this new medium? How do we experiment with this tool with the explicit knowledge (required warning?) of the subjects? What are the models of HCI that may be appropriate for this environment, and how could they help both in efficiency and in design, in capture, organization, representation, navigation and submission of works? Multi-person and multi-machine operation are unavoidable extensions, some of them already in process. My emphasis on integrating speech and natural language discourse and dialogue as "natural" interfaces extends to the WWW and its browsers. The sociology and economics of "CAI on the Web" and what it would do to classic education, both good and bad, need to be considered and cannot be left to chance.
Possibly a hierarchy and taxonomy of issues could be constructed, including such specific topics as:
In summary, I propose three points of view: the tactical (short term), the strategic (long term research goals and directions), and the organizational (taxonomic and hierarchical). I hope these points may be considered in relation to the substantive matters suggested by the attendees.
A starting point for thinking about these possibilities is to consider the scenarios of use that are enabled (and obstructed) by the current NSF gopher. The STIS facility makes it very easy to review current program announcements, assuming the user knows already what directorate is of interest. The STIS gopher also makes it easy to search by keyword for relevant abstracts from recent sponsored projects. Some information can be browsed but not searched for (for example, the list of program announcements); some information can be searched but not browsed (for example, NSF publications). And quite a bit of potentially useful information is just not there (see below).
A plausible future scenario of use in which one investigator seeks to benefit from the work of another involves chasing pointers from the abstract of a sponsored project to the journal papers, chapters, talks and technical reports that were produced under that contract. This scenario is not supported by the current gopher facility. Indeed, making available only the 100-word abstract "views" of sponsored projects (along with their total budgets!) makes them all look far less substantive than they are and ought to appear, and far more expensive to taxpayers. We in the research community understand the role and status of a 100-word abstract, but we are not the only browsers of the WWW. Linking together the program descriptions, abstracts of funded projects, and resultant work products would be more useful for members of the research community and more informative to members of the public.
Such a distributed digital library of NSF research could presumably leverage and at the same time help to focus the considerable NSF effort already underway in digital libraries for Computer Science theses, dissertations and technical reports, and the new set of projects being launched.
In another plausible future scenario, an investigator might want to pose a question or make a suggestion directly to the relevant PI or the overseeing NSF program manager. However, in the current system it is not possible to directly "reply" to a project abstract, for example, by generating an e-mail message to the PI or the program manager while viewing the project abstract. (Some web viewers allow local annotations, and apparently are planning to support group or public annotation.) Creating such direct links would encourage direct collaboration among investigators. Such a system would also help convey to the public (and to the investigators too, where necessary) that to work on an NSF-sponsored project is to take part in an open process of collaboration and communication. Encouraging this view would be a good thing to do with respect to NSF's technical objectives; conveying this to the public would be a good thing to do with respect to NSF's image and the public's understanding of research work.
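One simple way to approximate such a "reply" facility is to attach a mailto link to each abstract page. The sketch below assumes hypothetical award fields (`award_id`, `title`) and a placeholder address; it is not a description of any existing NSF service.

```python
from html import escape
from urllib.parse import quote


def reply_link(address, award_id, title):
    """Build a mailto URL whose subject line identifies the project."""
    subject = f"Re: NSF award {award_id}: {title}"
    return f"mailto:{address}?subject={quote(subject)}"


def reply_anchor(address, award_id, title, label="Send a comment to the PI"):
    """Wrap the mailto URL in an HTML anchor for the abstract page."""
    href = escape(reply_link(address, award_id, title), quote=True)
    return f'<a href="{href}">{escape(label)}</a>'


# Placeholder values, for illustration only.
link = reply_link("pi@example.edu", "0000000", "Example Project")
anchor = reply_anchor("pi@example.edu", "0000000", "Example Project")
```

The same function could generate a second link addressed to the program manager, so that both contacts appear next to every abstract.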
A variant of the scenario is one in which the investigator who authored the original material would like to interact with those who have linked to his or her documents. The author might, for example, want to notify readers of changes to a document previously posted. (The viewer Hyper-G creates such a link database.) Another variant is one in which a reader wishes to access the set of documents that refer to the current document. All of these scenarios would greatly facilitate professional collaboration -- as well as education: it would make documents on the web part of a real network of knowledge and not just one-shot discoveries for readers.
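The scenarios above all rest on keeping links in a database that can be queried in both directions. The following is a minimal sketch of that idea; the class and method names are assumptions made for illustration and do not describe how Hyper-G or any other viewer actually implements it.

```python
from collections import defaultdict


class LinkDatabase:
    """Stores links so they can be followed forwards and backwards."""

    def __init__(self):
        self._outgoing = defaultdict(set)  # source -> set of targets
        self._incoming = defaultdict(set)  # target -> set of sources

    def add_link(self, source, target):
        self._outgoing[source].add(target)
        self._incoming[target].add(source)

    def referrers(self, doc):
        """Documents that link to `doc`, e.g. to notify their authors of a change."""
        return sorted(self._incoming.get(doc, ()))

    def remove_document(self, doc):
        """Drop every link that starts or ends at a deleted document."""
        for target in self._outgoing.pop(doc, set()):
            self._incoming[target].discard(doc)
        for source in self._incoming.pop(doc, set()):
            self._outgoing[source].discard(doc)


# Hypothetical document names, for illustration only.
db = LinkDatabase()
db.add_link("survey.html", "report.html")
db.add_link("notes.html", "report.html")
readers_to_notify = db.referrers("report.html")
```

With such an index, an author who revises `report.html` can look up `readers_to_notify`, and a reader can ask for every document that cites the one currently on screen.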
This collaboration scenario could be elaborated through the implementation of a moderated debate forum, similar to WIT (http://info.cern.ch/wit/hypertext/WWW/Topic1001/Proposal1001). WIT uses the forms capability of http to support debates structured in ways similar to Rittel's classic IBIS (Issue-Based Information System). Users state contrasting positions on various issues and adduce evidence and arguments to make their cases. The current use of WIT has devolved in many cases into just a better interface for general newsgroups (hence the need for a moderator, I think). WIT only supports linking flat text objects. A more sophisticated approach would support linking documents and bibliographies as backing for positions taken. Indeed, this could be a far more efficient framework for technical debate than publication-lagged journal rejoinders or staged (frequently stale) conference panels.
Currently, NSF proposal review practices depend on moving tons of paper around the country: announcements, pre-proposals, full proposals, reviews. These practices are already evolving; pre-proposals are often handled at least partially via e-mail. This shift could be more explicitly supported by the NSF. Submission and review could be carried out via the WWW, and after the review, submitted proposals could be posted for the benefit of the investigators invited to submit full proposals, as well as of the research community more generally. Eventually, this paradigm could be extended to the submission and review of full proposals.
There are obvious questions about how quickly such a transformation could occur. But in this area, the WWW offers an opportunity to the NSF to simultaneously achieve administrative streamlining and public accessibility and accountability for all its processes -- as well as providing a testbed for experimenting with authentication capabilities. (There was an address on the web for something called the NSF Electronic Proposal Submission Project, but it turned out not to exist: gopher://stis.nsf.gov/00/NSF/eps/epsinfo)
Future uses of WWW depend on creating tools and standards to encourage the future scenarios we want to envision and to avoid those we see as unattractive. For example, it is not possible in general to predict where particular kinds of information will be, even if one can determine on what server the information resides. The most popular viewer (Mosaic) incorporates no serious search tools. Many issues in information retrieval need to be raised in this context: filtering, routing, searching collections and their subsets, fusing results from distributed searches, and so forth.
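As one small example of what fusing results from distributed searches involves, the sketch below merges ranked lists from servers whose scores are on different scales. Min-max normalization followed by summing is only one simple strategy, chosen here for illustration; the server result sets are invented.

```python
def normalize(results):
    """Min-max scale one server's {document: score} mapping to the range [0, 1]."""
    if not results:
        return {}
    lo, hi = min(results.values()), max(results.values())
    if hi == lo:
        return {doc: 1.0 for doc in results}
    return {doc: (score - lo) / (hi - lo) for doc, score in results.items()}


def fuse(result_lists):
    """Sum normalized scores across servers and return documents, best first."""
    combined = {}
    for results in result_lists:
        for doc, score in normalize(results).items():
            combined[doc] = combined.get(doc, 0.0) + score
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(combined, key=lambda doc: (-combined[doc], doc))


# Invented results from two servers that score on different scales.
server_a = {"doc1": 12.0, "doc2": 6.0, "doc3": 0.0}
server_b = {"doc2": 0.9, "doc4": 0.1}
merged = fuse([server_a, server_b])
```

In this example `doc2` comes first because both servers returned it, even though neither raw score is comparable with the other.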
Indeed, though one often hears proud quotes about the ever-growing, now multi-terabyte per month volumes of WWW traffic, one has to wonder how much of this is accounted for by scenarios in which overwhelmed users are aimlessly, even hopelessly, browsing for something interesting or useful. Perhaps we should admit that some of these people are just wasting time by any accounting; perhaps they are wasting thousands of hours per WWW terabyte.
Many technical problems and tasks regarding the use of WWW are consuming huge amounts of time and effort. How much time is being wasted today by people authoring WWW pages on the off chance that someone else will find them and read them? How much time is being wasted trying to build attractive pages with HTML? There is a need for document type templates. Research is needed in document models, document translation, document analysis and indexing, representation of interaction in declarative forms, integration of scripting languages with declarative (i.e., SGML-based) documentation languages, and optimization of time for various authoring, annotation, publishing, reading, and reusing tasks with respect to types of document representations.
The popular viewer Mosaic promises to create crippling technical limitations in the near future: for example, the fact that it must download an entire video object to the client before displaying it means that only relatively small video objects are practical. Research is needed regarding server configurations and interactions. For example, students accessing a system for a course should only need to fetch something from outside their local server once; thereafter, accesses should be served by the local server.
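The course example amounts to a cache on the local server. A minimal sketch, assuming a hypothetical `fetch_remote` function that stands in for whatever actually retrieves a document from the distant server:

```python
class LocalCache:
    """Serves repeat requests locally; goes to the remote server only once per URL."""

    def __init__(self, fetch_remote):
        self._fetch_remote = fetch_remote  # assumed callable: url -> document
        self._store = {}
        self.remote_fetches = 0

    def get(self, url):
        if url not in self._store:
            self._store[url] = self._fetch_remote(url)
            self.remote_fetches += 1
        return self._store[url]


def fake_remote(url):
    """Stand-in for a slow fetch from a distant server (illustration only)."""
    return f"contents of {url}"


cache = LocalCache(fake_remote)
lecture = "http://example.edu/course/lecture1.html"  # placeholder URL
for _ in range(30):  # thirty students request the same page
    page = cache.get(lecture)
```

After the loop, `cache.remote_fetches` is 1: the first student's request leaves the local server and the other twenty-nine do not. The sketch ignores expiry, so a real server would also need some rule for refreshing documents that change.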
NSF might encourage a study of the design space for WWW viewers to clarify what tools already exist but are not being exploited. Hyper-G incorporates a general server retrieval tool; Mosaic and Cello do not. Hyper-G, Cello, and Netscape allow video objects to be displayed as they are being downloaded to the client; Mosaic does not.
+ WYSIWYG authoring tools (some coming from private sector)
- Note that authoring includes understanding the link network
- Graphical views for understanding and creating/modifying links
> for instance, grab a link in a node and link view, connect it to a new node to change the destination target of the link
- Support for graphical navigational metaphors
- Import from more of the common word processors, including Mac and PC systems
- Include support for forms and adaptive forms (Ref Pitkow/Recker GVU Survey of WWW Users)
- All data types can be embedded in main page
- Support mutual exclusion, in case of simultaneous editing attempts
+ Additional asynchronous collaboration support.
- Borrow the best features of Lotus Notes
- Dynamic updates, so two collaborators can talk on phone and do updates which are seen at once - cache write-through - and integrate with authoring tools so all changes can be immediate
+ Use as the base for a GLOBAL DESKTOP uniting group of researchers.
- Make WWW be THE new desk top metaphor - but a shared metaphor
> Think of it as a replacement for the Macintosh desktop
- Integrate some or all of local file systems into WWW
> Code and data files for the project
- Integrate email and phone messages and faxes and newsgroups and... into WWW
- Provide modularly-oriented cross-platform integration and unification in ways that UNIX and Mac and Windows have not
+ Use as a browsable program repository - not just for text documents, for programs as well
- Single click down-load and/or execution
> Right, there are security problems - but not unique to WWW
- "Test-drive" of program modules via automatically-generated user interface
- Reference ARPA Image Understanding Program's code sharing tools
+ Synchronous WYSIWIS collaboration support
- All the standard things - video and audio connections, collaboration-aware viewers and editors, multi-way conferences, join and leave conference, etc.
+ Better support for bibliographic searches
- Ranked lists of "hits" much too primitive
- Text retrieval community has various visual representations - let's get them integrated in!
- Citation problem - URL's keep changing, need something else
+ An organized repository structure - "Information Clearinghouses"
- For project archives, courseware archives, etc.
- More structure than pure chaos
- Less structure (no bureaucracy please) than a library
> A `meta system' for creating repositories/archives?
2. Recommendations concerning research which NSF in general and IRIS in particular should consider undertaking with respect to the WWW, its accessibility, and its usability.
+ Everything discussed under topic 1 above
+ Explicit semantic descriptors - meta-information
- Facilitate browsing and searching
- Make easy and/or `cool' and/or valuable, else won't happen
+ More sophisticated browsers
- Information visualization and abstraction tools for the "hyper" network
- Support visually-impaired users with sound
- Speech-recognition using current set of link names as vocabulary
- Sound as an additional `visualization' modality
- Scalable down to PDA's and touch-tone phones and set-top boxes
- Better management of hotlists - get long, need to be organized
- In-line support for all data types and for programs
+ Teaching-oriented browsers
- Pedagogically-informed templates or other guidance
- Support for exploration of `micro-worlds' (Soloway, Guzdial)
- Support algorithm animations (Stasko) and two-way general communications between script and the animation or simulation process
- Special CSCW needs for education (reference Levenson's system at Sun Microsystems).
+ Embedded active objects - copying a link takes the link's target reference with it, etc.
+ Intelligent agents
- Filter mail
- Automatically update email addresses and URL's
- Continually search for items matching my search profile
> Resource discovery agents
+ Multiple "looks" to the same information
- So my pages have the `NSF' look if accessed via the NSF PI directory and have the `GVU' look if accessed via the GVU home page
> A natural for SGML
+ Quickly move away from HTML as "The FORTRAN of Hyper-media Systems"
- The language you love to hate
- The language with which we have to be backwards compatible
- The language which made it all possible
3. Recommendations on NSF information delivery to the public and research communities via WWW. Attention will be given to the potential of the National Information Infrastructure to enhance access to NSF information.
+ Projects - Impose as little burden as possible on PIs
- But, some standards are needed
- Provide a template, filled in automatically based on award information, for each project (Ref ISX System)
> PIs
> abstract
> List of other NSF-funded projects of each PI
> Amount of award
> Other standard stuff
- PI provides URL for
> personal home page
> optional lab home page
> department home page
> institution home page
- Lists of publications and programs
+ Programs - everything on-line
- Proposal submission via WWW just as can now submit via ftp
+ General info for public
- General things
- Art gallery of interesting scientific pictures and animations
- Some sort of math and science education material and repositories
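The per-project template suggested above, filled in automatically from award information, might look roughly like the following sketch. The field names (`title`, `pis`, `amount`, `abstract`), the page layout, and the sample award are all assumptions made for illustration; they do not reflect any actual NSF database schema or the ISX system.

```python
from html import escape
from string import Template

# Page skeleton; every $name is filled in from the award record.
PAGE = Template("""<html><head><title>$title</title></head>
<body>
<h1>$title</h1>
<p>PIs: $pis</p>
<p>Award amount: $amount</p>
<p>$abstract</p>
<ul>
$links
</ul>
</body></html>""")


def project_page(award, urls):
    """Fill the standard page from award data plus the URLs the PI supplies."""
    links = "\n".join(
        f'<li><a href="{escape(url, quote=True)}">{escape(label)}</a></li>'
        for label, url in urls.items()
    )
    return PAGE.substitute(
        title=escape(award["title"]),
        pis=escape(", ".join(award["pis"])),
        amount=escape(award["amount"]),
        abstract=escape(award["abstract"]),
        links=links,
    )


# Invented award record and URLs, for illustration only.
award = {
    "title": "Example & Sample Project",
    "pis": ["A. Author", "B. Writer"],
    "amount": "$100,000",
    "abstract": "A placeholder abstract.",
}
urls = {
    "personal home page": "http://example.edu/~pi/",
    "department home page": "http://example.edu/cs/",
}
page = project_page(award, urls)
```

The only thing the PI contributes here is the `urls` mapping; everything else comes from data the agency already holds, which keeps the burden on PIs low.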
At present, Mosaic/WWW does not support bidirectional linkage to external packages and tools. In their August 1994 Science paper, Bruce Schatz and Joe Hardin discussed the component and the object models. Mosaic/WWW has a component model. Basically, Mosaic/WWW needs to be evolved to an object model for applications. An object model will also be useful to implement various interface agents for application-to-application, appliance-to-application, and application-to-network interfacing on WWW.
- encourage use of caching in the networks
- encourage improved protocols that reduce traffic and delays
- encourage better integration of indexing and search technology
* See that Computer Science research results that relate directly to the WWW (e.g., in AI, digital libraries, electronic publishing, HCI, hypertext, information retrieval, multimedia) can easily find their way into WWW:
- so that current problems of WWW can be corrected in time
- so that research results can more quickly work through the technology transfer cycle, and "prove themselves" in real use
* Use WWW in whatever ways it can be applied (e.g., as suggested in various of the position statements) to save the time of NSF staff as well as of those who prepare, review, and carry out research projects --- to make NSF efforts more efficient (with no loss in quality), e.g.:
- provide "templates" and on-line forms to help simplify and organize all data entry, submissions, reviews and other information handling
- have on-line searchable and browsable versions of all procedures and announcements, databases of PIs/projects/reviewers/panelists,...
- have on-line "frequently asked question" and other explanations to help staff and outsiders more quickly find answers to common questions
* Use WWW to help with rapid dissemination of CS research findings:
- so we have more "reuse" of our science
- so findings impact as wide a variety of areas as possible by:
- having sets of pages for each funded project, with similar structure, so one can easily find the important parts of each
- having all reports and papers related to the project be on-line and directly accessible (possibly inside a TR or publisher's WWW repository, or, at worst, as bibliographic citation plus abstract) --- this calls for liaison with CS technical report, CS digital library, and CS thesis/dissertation projects
- having on-line demonstrations, visualizations, simulations, animations, movies, code archives, documentation, and other related results
- having multiple views, for differing levels of expertise/education, e.g., K-12, undergrad, grad, researcher, layman, patent lawyer
- having indexing, classification, clustering, summarizing, abstracting, and other tools or meta-information to aid access
* Researchers who for decades have worked on the component technologies that make WWW a reality (e.g., EP [electronic publishing], HT [hypertext], IR [information retrieval], MM [multimedia], networking, client/server computing, PCs/workstations,...) are all eager to apply their specialized knowledge and skills to improve it further.
* Researchers who have integrated these technologies into various types of systems, services and environments, are now eager to move us from today's WWW toward a global digital library, of grand scale, in terms of content, audience, and use.
* Today's "Nintendo" generation is finding WWW to be the next target for its pursuit of edutainment, at the same time that teachers are turning to it as the host for new courseware, businesses are looking at it as a way to contact customers "without the middleman", and scholars are looking toward it as a unified and collaborative intellectual workspace.
- Testimonial 1: Tim Berners-Lee spoke on WWW, Rob Akscyn (then chair of ACM SIGLINK) talked about hypertext, and I spoke about our NSF funded work on digital libraries in March 1993 at On-line Publishing `93, Pittsburgh, PA; immediately afterwards I passed around Tim's WWW notes and brought up some of the early Web software.
- Testimonial 2: I was greatly pleased when a large group from NCSA came to the July 1, 1993 Workshop on Information Access and the Networks, IANET'93, held in conjunction with ACM SIGIR `93, in Pittsburgh --- and reported on Mosaic.
- Testimonial 3: Kurt Maly, Alan Selman, Jim French and I have coordinated work since early 1993 on the NSF funded <A HREF="http://www.cs.odu.edu/WATERS/WATERS-GS.html"> WATERS project</A> for CS technical report capture, storage and dissemination, that uses WWW and Mosaic for browsing, query entry, and presentation of search results and reports. The <A HREF="http://cs.indiana.edu/cstr/search">Indiana TR</A> project has shorter term goals. Integration with the ARPA CSTR effort (e.g., <a HREF="http://cs-tr.cs.cornell.edu/TR/CORNELLCS:TR94-1418"> DIENST</A>) has recently been approved by ARPA, and will be enabled by the facilities of WWW.
- Testimonial 4: My Fall 1993 Information Storage and Retrieval course made extensive use of gopher and Mosaic, using servers that have been running continuously since then; in Fall 1994 my department has three "paperless" courses, with all instruction and courseware delivered through WWW, and at least 9 computers running WWW servers.
* However, there is plenty of room for NSF-funded research to help ensure that WWW and Mosaic lead to even better systems, services, and tools:
- The Hyper-G system, funded by the Austrians, has many technical advantages over the commonly used WWW and Mosaic services, e.g.:
+ viewers for text, images, and movies that are sensitive to links and anchors inside "documents";
+ indexing automatically applied when documents are added so that searching is integrated with browsing;
+ hierarchical browsing option for Hyper-G portions of the WWW;
+ caching by local servers that eliminates the need for clients to repeatedly connect to distant servers;
+ "live" mode that gives progressive display of images and movies to facilitate browsing and that allows cancelling of slow "get" operations;
+ multilingual document and interface support;
+ presentation of any SGML document (as opposed to just HTML documents);
+ searching on collections or subcollections, and on content;
+ a distributed OODB for links and metadata;
+ support for personal or group views or "webs" that allow multiple different sets of links "above" a document collection, even links between anchors that are added to read-only document pairs, through the link database; and
+ notification and automatic removal of a link from the link database when either the source or target document is deleted.
- MIME, which was planned as an interim solution to multimedia interchange, should eventually be replaced in WWW and Mosaic use by a more efficient standard with better integrated compression.
- Full support is needed for SGML (for descriptive markup that reduces the cognitive load on authors); DSSSL (for specifications on how to present or print SGML-encoded documents); HyTime (for more comprehensive description of hypertext, hypermedia, and time-based documents); and MHEG (for object-oriented description of multimedia, hypermedia, and interactions) --- to facilitate interchange and electronic (re-)publishing.
- KMS (a hypertext system, provided by Rob Akscyn's company) type support is needed for: fast and efficient collaborative editing of hypertext documents, draw operations, line art, rapid client/server protocol, and easy adding of annotations.
- Multimedia presentation support is needed, and should move to constraint and style-guide based schemes; content-based indexing and analysis of multimedia documents (e.g., images, movies, speech, music) is also of great importance.
- Advanced retrieval techniques should be applied more to the WWW, e.g., with (SGML structure) context-dependent searching, use of extended Boolean schemes, morphological analysis of document and query terms, automatic query expansion using a lexicon or thesaurus, clustering to aid in browsing and retrieval, session-long interactive query improvement through relevance feedback, use of knowledge bases to allow inferencing and conceptual retrieval, etc.
- IR based models of user-intermediary interactions should drive how systems carry out interactive sessions, and could inform the behavior of intelligent retrieval systems.
- Use of general purpose knowledge representation and protocol standards (KQML, KIF) should be made in agent-based systems.
- Tighter integration is needed with other parts of users' environments, such as authoring languages for courseware, scripting languages that specify interactions, mail handlers (that could allow automatic filtering, routing, classification, filing and retrieval), expert systems / decision support systems, help and documentation systems, calendar / tickler systems, agenda and outlining tools, citation and content searching, etc.
* NSF-funded research in this area might follow these strategic guidelines:
- Integration of WWW-related efforts with Digital Library work should be strongly encouraged.
- Cross-fertilization of ideas should be encouraged through support of interdisciplinary workshops and preference given to broadly skilled project teams or coordination between projects.
- Scalability, usability, efficiency, and effectiveness should be required as design criteria for most of the funded efforts, and should be the basis for obligatory evaluation phases of such studies.
- Studies of users, as they learn anew how to write, read, learn, and work in the WWW environment, must be undertaken before we lose the chance to collect data about them --- and results should be fed into design and development efforts.
- CS researchers should be encouraged to work with other NSF-funded investigators to create "environments", building upon the current WWW, for collaborative use, such as for: scientific computation, engineering design and development, learning, courseware development, multimedia programming, or proposal preparation/review --- with on-demand access to needed information, tool integration, multiple views of design/development/use spaces, and as-needed conversions between data/information/knowledge representations.
- While expanding the functionality of our emerging universal interfaces, we must also go back to tailored and crafted specialized interfaces, drawing ideas from them and fitting those back into WWW.
- A long-term perspective should be adopted, so
+ we understand the relation of network and system architectures to the need for scaling up in size of content and number of users;
+ we think not only about the idea of agents but also about other means to unleash the tremendous computational power of the Internet; and
+ our research and experimentation is aimed toward task support that cuts out the middle man and empowers users.
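Among the retrieval techniques listed above, relevance feedback is easy to sketch. The version below uses the classic Rocchio formulation, chosen here purely for illustration; the weights and the toy term vectors are assumptions, and nothing here describes an existing WWW search service.

```python
from collections import Counter


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move a query vector toward documents judged relevant and away from the rest.

    `query` and each document are {term: weight} mappings.
    Terms whose adjusted weight is not positive are dropped.
    """
    adjusted = Counter()
    for term, weight in query.items():
        adjusted[term] += alpha * weight
    for docs, coefficient in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, weight in doc.items():
                # Each group contributes the average of its document vectors.
                adjusted[term] += coefficient * weight / len(docs)
    return {term: weight for term, weight in adjusted.items() if weight > 0}


# Toy vectors: the user marked one hit as useful and one as not.
query = {"robot": 1.0}
relevant = [{"robot": 1.0, "navigation": 1.0}]
nonrelevant = [{"robot": 1.0, "toy": 1.0}]
improved = rocchio(query, relevant, nonrelevant)
```

The improved query gains the term "navigation" from the document the user liked and never picks up "toy". Repeating this step after each round of judgments gives the session-long query improvement mentioned in the list.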
To support laboratory sciences, provide access via WWW to scarce laboratory equipment. E.g., see recent efforts by Sandia and Univ. Penn. to provide access via the WWW to their robotic equipment.
More generally, there is an opportunity to share software, laboratory hardware, data sets, and simulations. But for this to catch on, sharing these must be as simple as getting to a new URL via Mosaic.
NSF should require WWW pages for every NSF-funded project, indexed from the NSF home page. Perhaps NSF should support on-line journals. I don't know how to best do this... Publishers to date have been impediments, not advocates.
A. Support of Computer Science Basic Research
1. Attracting Researchers
2. Growing and Equipping Young Researchers
3. Access and Distribution Between Research Enclaves
4. Access to Network Resources for Research
5. Collaboration on National and Global Projects
B. NSF Information Distribution
1. To Enable the Computer Science Research Communities
2. Among Government Agencies
3. Domestic Industry, Education, Medicine, etc.
4. Public Information Delivery
5. Global Information Access and Delivery
C. R&D with Industry and National Infrastructure
1. Coordinating Research Agendas with Industry Needs
2. Attracting and Building Industry Collaborations
3. Strengthening National Professional Associations
4. Evolving Network Publications
5. Access and Aggregation of Network Databases
6. Technology for NII and GII Applications
D. New Research Agenda Issues
1. Access, Searching and Filtering Issues
2. Dissemination, Quality Issues
3. Strategies for Managing Massive Input Volumes
4. Issues TBD in PI Workshop
II. IRIS Division WWW Research Program Planning
A. PI Workshop - Oct. 31, 1994
B. Possible Future Workshops on Selected Topics
C. Possible Announcements of Research Opportunities
III. "NII Roadmap" - National Research Council
A. Considering Mosaic for Announcements
1. White Paper Inputs from Industry, Academia, Government
2. Public Forum May / June 1994
B. Cannot Handle Mosaic Forms Input Information
1. Too much to collect and to reply to
2. Too varied or inappropriate
3. Few or no classification structures and mechanisms
IV. Distributed WWW Server Project
A. Program Director's Desktop Capability
1. Design and Management of Program Mosaic Pages
2. Dynamic Assembly of Most Recent Information
3. Automatic Enforcement of Policies: Privacy, Publication, Distribution
4. Reduce Program Staff Workload and Improve Timeliness
5. Use for negotiating joint funding of proposals
6. A small demo being developed internally:
Off-the-Shelf Approach:
Mosaic + FoxPro + Microsoft SQL Server
B. Use of "Forms" for PI Inputs to the Program, e.g.:
1. Abstracts (One-liners for lists, Summary of Awards, Detailed for Peers)
2. Reports on Grants
3. Key words for dissemination and access
4. Adding & Changing Links to Grantee WWW Server
C. Support for the Research Community -- Some Topics:
1. Scope and Style of Program Research
a) Program Description and its Elements
b) Elements of a proposal targeted for this Program
c) How to do Science
d) Description of Program Director's Philosophy
2. Announcements
a) "What's New" Pages
b) Proposal Opportunities
c) Summary of Awards from NSF Servers
d) Infrastructure Resource Announcements
3. Links to Resources and Related Research
a) Links to programs and agencies that do joint funding
b) Links to Grantee's WWW Servers
(1) Workshop Reports where Grantee is the PI
(2) Grantee's Detailed Research Descriptions
(3) Bibliographies, Abstracts and Cross-Indices
(4) Network-accessible tools on-line experiences
(5) Curriculum Models and Issues
c) Links to National and Global Resources and Studies
d) Links to Basic Science Bibliographical Resources
D. Suggestions for Content of NSF IRIS Mosaic Pages?
Several basic deficiencies of the present WWW system contribute to the overloading of information servers. Information objects are always retrieved in full before the user gets a chance to peruse them and decide whether he or she really wants them. For large objects such as digitized photographs, this method wastes considerable network bandwidth and forces the user to wait for the entire duration of the object transfer before any useful information about the object is made known. The lack of replication among most information servers further compounds the overloading caused by needless full object retrievals.
Incorporating multiresolution retrieval as the fundamental method of object transfer will be a significant enhancement to the infrastructure of the Web. More research needs to be done on the use of intelligent agents at server sites to manage multiresolution, replication and consistency. Each information object published by a server may be defined at various resolutions according to its data type. For a digitized photograph, a low resolution version may have several adjacent pixels grouped together and assigned an average color value. For a large Postscript document, a low resolution copy may consist of the abstract and the index. Depending on the typical load on the server and the storage requirements, the agent at the server site may decide to store one or more of the multiresolution versions and generate the others on demand. In the default mode of object transfer, a low resolution copy of the object is retrieved and displayed to the user while the full or higher resolution copy is being transferred. The requesting user is thus shown a version of the object promptly and can decide whether to continue with the retrieval or to abort the request.
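The pixel-averaging idea for photographs can be sketched in a few lines. This is only an illustration under stated assumptions: the image is a plain list of grayscale rows, and the names `downsample` and `resolutions` are hypothetical, not part of any existing WWW server or viewer.

```python
from typing import Iterator, List, Tuple

# Assumption: a grayscale image held in memory as rows of integer pixel values.
Image = List[List[int]]


def downsample(image: Image, block: int) -> Image:
    """Return a lower-resolution copy by averaging block x block pixel groups."""
    if block < 1:
        raise ValueError("block must be at least 1")
    height = len(image)
    width = len(image[0]) if image else 0
    result: Image = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            # Collect the adjacent pixels in this group; edge groups may be smaller.
            group = [
                image[y][x]
                for y in range(top, min(top + block, height))
                for x in range(left, min(left + block, width))
            ]
            # Assign the (integer) average value to the whole group.
            row.append(sum(group) // len(group))
        result.append(row)
    return result


def resolutions(image: Image, blocks=(4, 2, 1)) -> Iterator[Tuple[int, Image]]:
    """Yield progressively finer versions of the image, coarsest first."""
    for block in blocks:
        yield block, downsample(image, block)
```

A server agent could send the coarsest version from `resolutions` first and continue with finer ones only if the user does not abort the request. Whether each version is stored or generated on demand is the trade-off described above and is not modeled here.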
The agents may also be programmed to construct customized low resolution versions of objects. For example, instead of a user retrieving an entire collection of documents only to build an index for future reference, the active agent at the site serving the documents may be asked by the user to construct the index on the fly according to the specifications supplied by the user. The index may then be shipped across the network instead of the full set of documents.
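As a rough sketch of such a server-side request, the function below builds an index restricted to the terms a user supplies, so that only the index needs to cross the network. The function name, the dictionary-of-strings document store, and the word-splitting rule are all assumptions made for illustration.

```python
import re
from typing import Dict, Iterable, List


def build_index(documents: Dict[str, str], terms: Iterable[str]) -> Dict[str, List[str]]:
    """Map each requested term to the names of the documents that contain it."""
    wanted = {term.lower() for term in terms}
    index: Dict[str, List[str]] = {term: [] for term in sorted(wanted)}
    for name in sorted(documents):
        # Assumption: a "word" is any run of letters or digits, compared case-insensitively.
        words = set(re.findall(r"[a-z0-9]+", documents[name].lower()))
        for term in wanted & words:
            index[term].append(name)
    return index
```

The returned index holds only term-to-document references, which is the kind of customized low resolution version the agent would ship in place of the documents themselves.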
The ability to retrieve predefined and customized multiresolution versions will result in considerable savings as increasing numbers of large multimedia objects are made available over the Web: the extracted content of such an object is likely to be a few orders of magnitude smaller than the object itself. The positive results of multiresolution support include better response time behavior to users and the potential for reducing load placed on the system by browsing users and automated index generation routines (that are typically used to generate and maintain the index used by search engines).
Automated replication management is another important key to achieving scalability. Agent and mediator technologies may be employed at server sites to monitor the load, manage replication and forward user requests transparently to the replicas. More research work needs to be done on incorporating an effective consistency management scheme under multiresolution retrieval.
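One simple way an agent might forward requests is to send each one to the replica currently serving the fewest requests. The sketch below assumes exactly that policy; the class `ReplicaDispatcher` and its methods are hypothetical, and consistency management among replicas is deliberately left out.

```python
from typing import Dict, Iterable


class ReplicaDispatcher:
    """Tracks outstanding requests per replica and forwards to the least loaded."""

    def __init__(self, replicas: Iterable[str]) -> None:
        self.load: Dict[str, int] = {name: 0 for name in replicas}
        if not self.load:
            raise ValueError("at least one replica is required")

    def forward(self) -> str:
        """Pick the least-loaded replica (ties broken by name) and record the request."""
        target = min(self.load, key=lambda name: (self.load[name], name))
        self.load[target] += 1
        return target

    def finish(self, name: str) -> None:
        """Record that a request served by the named replica has completed."""
        if self.load[name] > 0:
            self.load[name] -= 1
```

From the user's point of view the forwarding is transparent: the request is addressed to one server, and the dispatcher decides which copy answers it.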
Significant improvements are also required in the traversal tools (such as xmosaic) to make the process of information discovery less daunting to the average user. The focus in user interface design should be on supporting multiresolution retrieval and display and on providing tools for the user to construct customized high level views of the information space. The information displayed to the user should be incrementally refined as more data is retrieved and components are received at progressively higher resolutions. This change in the user interface will noticeably reduce the `dead time' spent by users as they wait for network connections to be made and the requested objects to be transferred.
Customized views of the information space are a natural extension of the unscalable `Hotlist' feature currently available on most viewers. Tools to automate the process of building views will help make it more scalable with respect to the rate of generation of new data. Agents may be deployed at the user's end to scan the information that has newly become available from a set of given sources and selectively incorporate it into the user's personal view according to a specified criterion.
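A user-side agent of this kind could be as small as the following sketch, which scans items reported by a set of sources and keeps only those that are new and satisfy the user's criterion. The `Item` and `PersonalView` types are assumptions for illustration, and how the sources are polled is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Item:
    """A newly available information object, identified by its URL."""
    url: str
    title: str


@dataclass
class PersonalView:
    """A user's customized view, extended automatically by a selection criterion."""
    criterion: Callable[[Item], bool]
    items: List[Item] = field(default_factory=list)
    seen: Set[str] = field(default_factory=set)

    def scan(self, source_items: Iterable[Item]) -> List[Item]:
        """Incorporate unseen items that meet the criterion; return what was added."""
        added = []
        for item in source_items:
            if item.url in self.seen:
                continue  # already examined on an earlier scan
            self.seen.add(item.url)
            if self.criterion(item):
                self.items.append(item)
                added.append(item)
        return added
```

For example, a view created with `PersonalView(criterion=lambda item: "robot" in item.title.lower())` would pick up only items whose titles mention robots, and repeated scans of the same source would add nothing further.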
Finally, more research needs to be done in the design of effective search engines that scale well and are comprehensive. Discovery by navigation is impractical for all but the smallest of information spaces. A variable-resolution search mechanism that permits successive refinement and navigation of search results may provide the general search facility. In addition, specialized topic specific search engines may be provided to the user to attach at appropriate points in the customized view.
Active papers will provide readers with the possibility of interacting with the content of the paper using the interface provided by the author (in the same way as she now provides the table of contents). In this way, each paper becomes a "database" which can be queried and even, in some cases, updated. As in databases, different views can be defined for the same paper: for example, one can create "presentation views" for talks of different length and depth, including high level (possibly visual) abstracts of "what the paper is about".
Below we present some of the features offered by active papers:
Reading (visualizing) the paper will now be a complex and expensive computational process which may involve accesses to remote servers (remote appendix, mathlab servers, etc.). Papers may be "read" either on the client machine, if it has enough computational power, or on the server's machine through remote access. The process will also require caching and monitoring of future changes.
It is also important to provide ways for shared access to the active papers. Ad hoc interest groups can be formed so that future updates to the paper's content can be disseminated to the interested parties. Active papers are never finished and continuously evolve; therefore interested parties should be able to monitor such evolution.
NSF has a unique opportunity to lead efforts in this direction.
Though the web may not be maintained as a set of relations in a relational database, an experienced SQL user may wish to ask queries of this kind. Translation mechanisms that convert such queries into a search for a path or paths on the web are critically needed. This is true not only for SQL, but also for a wider variety of query languages such as natural language, predicate logic, and/or query-by-example.
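To make the idea of translation concrete, the sketch below treats a simple selection condition (roughly what `SELECT * FROM pages WHERE type = 'author'` would express over a hypothetical `pages` relation) as a breadth-first search for a path through linked pages. The link table, the page attributes, and the function name `find_path` are all assumptions; a real translation mechanism would have to handle far richer queries.

```python
from collections import deque
from typing import Dict, List, Optional


def find_path(
    links: Dict[str, List[str]],
    pages: Dict[str, Dict[str, str]],
    start: str,
    condition: Dict[str, str],
) -> Optional[List[str]]:
    """Search breadth-first from start for a page whose attributes match condition.

    Returns the path of page names leading to the first match, or None.
    """
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        attributes = pages.get(path[-1], {})
        # The selection condition: every requested attribute must have the given value.
        if all(attributes.get(key) == value for key, value in condition.items()):
            return path
        for target in links.get(path[-1], []):
            if target not in visited:
                visited.add(target)
                queue.append(path + [target])
    return None
```

Because the search is breadth-first, the path returned is a shortest one in number of links followed, which is one plausible reading of what the user of such a query would want.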
Even more importantly, a user may wish to express a conjunctive query where part of the query is expressed in, say, SQL, and part in a different language. For instance, if the user has, in one window, a picture (containing, amongst others, an individual I who is an author), the user may wish to ask, in some formalism, a query of the form: "Find a copy of a book written by the wife of the person in the picture around whose face I am drawing a box". Here, the user draws a box around the face of a person with, say, his/her mouse, and expresses the rest of the query in, say, an SQL-like language. What constitutes a "kosher" way of merging different querying paradigms and query languages together? How do we formally denote the fact that different parts of the query are expressed in different media (picture, text), yet share a common part? How are such queries to be processed (multimedia join)? These are critical scientific issues that need to be addressed.