
Appendix E - External Position Papers

Table of Contributors

Dr. Robert A. Amsler Part One

I see NSF's potential involvement with Mosaic/Web access as dividing into several areas.

First, as with all Web sites, "Chamber of Commerce" information will be converted into HTML. From now on, NSF policy should dictate that all public information be produced using SGML/HTML-compatible software so that it can be posted to the Web when "published". This could both better serve the communities to whom the information is directed and perhaps even save money through eventual reductions in printing and mailing costs.

Apart from the existing information about NSF which currently inhabits the NSF Web site, I believe longer documents such as NSF and National Science Council reports should be considered for online access.

Second, the review process can benefit from allowing documents to be submitted in HTML, giving reviewers the capability to access both the original proposal and the original accompanying supporting materials (e.g., papers published by the proposal submitters). Some special concerns for privacy and confidentiality may need to be addressed so that, if materials are located at the submitters' home sites, access to them doesn't reveal the reviewers' identities. Perhaps materials could be temporarily relocated to NSF for the duration of the review. This seems largely to be a logistics problem, since it is clear what materials are involved: the original proposal, supporting published papers, biographical data on the proposers, and associated previous documents connected via the biographical data on the proposers or relating to previous work on the project.

Third, once proposals are funded, there is an opportunity for NSF to greatly increase the potential for interactions within the research community and technology transfer by encouraging the grantees to create homepage access to their project and its work. Thus, instead of merely receiving word back from the funded projects in the form of a periodic summary of their results, NSF can provide a jumping off point to each project.

Fourth, the issue now turns to how NSF can use the new capabilities to promote Science and Scientific Literacy (the new goal of the administration). The Web as it currently exists is powerful but unstructured. Unlike other information collections whose access is mediated by the efforts of the library and information science communities, the Web has just grown. This is good and bad. It provides for maximal experimentation, with low start-up costs for individuals everywhere, but it creates a hodge-podge of data, duplicated across many sites, and seems to encourage a geometrically growing effort to index the materials via the lowest common denominator of weighted keyword access.

Two generic factors could greatly improve the situation. One would be to encourage the creation of a universal protocol for access to publications, particularly journal articles. The URL gives us the ability to include a pointer in documents--but one has to discover a path to the pointer to use it. Within the scientific and scholarly literature there are somewhat common means of referring to authors' publications within papers, such as SMITH&JONES-94 for a relatively unique identification of a paper by two authors, Smith and Jones, published in 1994. This scheme could be expanded to include site information and in effect create a virtual identifier for a paper which, when entered into Mosaic, could literally whisk the reader away to the paper online. So, if Smith and Jones worked at The University of Texas at Austin in the Computation Center, the reference might be something as simple as "cc.utexas.edu/docs/SMITH&JONES-94".

This is essentially a call to devise an extension to the Web addressing scheme specifically for publications, in a manner which would allow anyone seeking a paper to construct the hypothetical identifier for any publication worldwide. This identifier could then be included in published papers, and from its logical components readers could hypothesize additional papers' identifiers and request them.
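The identifier scheme described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the fixed "docs" path component, and the two-digit year convention are hypothetical choices made here, not part of any existing Web protocol.

```python
def citation_key(surnames, year):
    """Build a key such as SMITH&JONES-94 from author surnames and a year."""
    names = "&".join(name.upper() for name in surnames)
    return f"{names}-{year % 100:02d}"


def publication_address(site, surnames, year, docs_path="docs"):
    """Join a site name, an assumed documents path, and the citation key."""
    return f"{site}/{docs_path}/{citation_key(surnames, year)}"


# The example from the text: Smith and Jones, 1994, UT Austin Computation Center.
address = publication_address("cc.utexas.edu", ["Smith", "Jones"], 1994)
print(address)

# A reader who knows the convention can guess the address of another paper.
guess = publication_address("cc.utexas.edu", ["Smith"], 1993)
print(guess)
```

Because every component is derived from information a reader already has (site, authors, year), the address can be guessed without first discovering a link to it. A real scheme would also need a rule for two papers by the same authors in the same year, which this sketch ignores.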

The second generic factor I see a need for is to promote the assignment of the equivalent to CIP (Cataloguing in Publication) information for all pages of information on the Web. Some time back the library community recognized that distributing cataloguing information for books and journals was getting excessively expensive because every library had to explicitly list the documents they had received in a request to a common supplier of cataloguing information. Eventually it was converted into an electronic interaction, but still the requests had to be submitted. To eliminate this, the contents of the library catalog were generated by the Library of Congress upon receipt of a preliminary copy of the published materials and then printed in the book/journal itself when it was sent out. Today virtually all books contain a catalog card on their copyright page which gives the call numbers in the LC and DDC classifications, subject descriptors, and correct author/title/publisher details for the book. Anyone can tell from a given copy of a book where the subject matter of the book would be found in any library--thus facilitating access to the content matter in a larger context.

This is what the Web lacks. It has neither a global subject scheme in place (and the full classification scheme of the Library of Congress would probably be needed to encompass it today) nor any means for someone to know where a given node lies in the global context of nodes. Efforts to provide index access to text descriptions of sites and their offerings are ever falling behind the contents online.
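As a rough illustration of what CIP-style information attached to a Web page might look like, the sketch below defines a small record and renders it as a text header. The record type, field names, header layout, and sample values are all assumptions made for this example; no such standard is described in the text.

```python
from dataclasses import dataclass, field


@dataclass
class PageCatalogRecord:
    """Hypothetical cataloguing record carried by a Web page itself."""
    url: str
    title: str
    author: str
    lc_class: str      # Library of Congress classification
    ddc_class: str     # Dewey Decimal classification
    subjects: list = field(default_factory=list)

    def as_header(self):
        """Render the record as plain 'Name: value' lines."""
        lines = [
            f"Title: {self.title}",
            f"Author: {self.author}",
            f"LC-Class: {self.lc_class}",
            f"DDC-Class: {self.ddc_class}",
        ]
        lines += [f"Subject: {subject}" for subject in self.subjects]
        return "\n".join(lines)


# Illustrative values only; the page, author, and classes are placeholders.
record = PageCatalogRecord(
    url="<page URL>",
    title="<page title>",
    author="<page author>",
    lc_class="QA76",
    ddc_class="004",
    subjects=["Computer science"],
)
print(record.as_header())
```

An indexing program that finds such a header on a page could place the page in a global subject scheme without guessing from keywords.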

Dr. Robert A. Amsler Part Two

The prior section outlined more or less ordinary steps that could be taken to exploit the Mosaic/Web connection by NSF. This message intends to discuss some much more advanced approaches, attempting to enlist the support of the research community itself to explore what can be done.

First, the Web as a structure seems to embody many of the properties of a Semantic Network, as long used in the computational linguistics field. Many properties of the Web are similar to those of semantic nets and hence many of the procedures applied in computational linguistics to semantic networks might be applicable to the Web itself.

Some differences do exist. For one, the Web doesn't employ "arc labels" to describe the nature of the node being connected to via a URL. An exploration of what such arc labels might be, and of how they could be placed into an expanded protocol, would be worthwhile. Minimally, for example, arcs ought to indicate the type of information at the node, such as the medium. Maximally, they should deal with the semantic relationship of the information to the current node. Additionally, semantic networks usually assume arc labels to have reverse interpretations and to be reverse accessible; thus the HAS-PART relationship is the reverse of the ISPART relationship (e.g. VIRGINIA HAS-PART ARLINGTON, ARLINGTON ISPART VIRGINIA).

In computational linguistics the relationships between nodes relate to linguistic properties of events and entities. ISA/HASTYPE (is a type of/has type), ISPART/HASPART (is a part of/has as a part), CAUSES/CAUSED-BY (causes/is caused by), and the case arguments of event description (AGENT=is the human agent of the event, THEME=is the main event/entity upon which the event operates, INSTRUMENT=is the means by which the event operates on the theme, SOURCE=is the thing from which the event initiates its action, GOAL=is the thing to which the event transforms or progresses its action, LOCATION=is the physical (geographic) location of the activity of the event, etc.)

On the Web these are at present just an atypical subset of the set of relationships between nodes. Also, the events are obscured because the current arc labels would be composed of an event and an entity, such as "WRITTEN-BY". Strictly speaking this should be a link such as THEME-OF to a WRITING event whose AGENT arc would be the author and which would itself have possible other links.
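The idea of labeled arcs that are reverse accessible can be sketched as a small data structure. The class and method names are hypothetical; the labels and the Virginia/Arlington example come from the discussion above. Storing the reverse arc whenever a forward arc is added is one possible design, not the only one.

```python
from collections import defaultdict

# Each label maps to its reverse reading; pairs are taken from the text.
INVERSE = {
    "HAS-PART": "ISPART",
    "ISA": "HASTYPE",
    "CAUSES": "CAUSED-BY",
}
# Make the table symmetric so either label of a pair can be looked up.
INVERSE.update({rev: fwd for fwd, rev in list(INVERSE.items())})


class LabeledWeb:
    """Nodes joined by labeled arcs that can be followed in both directions."""

    def __init__(self):
        self.arcs = defaultdict(set)  # node -> {(label, target), ...}

    def link(self, source, label, target):
        # Record the forward arc and, automatically, its reverse.
        self.arcs[source].add((label, target))
        self.arcs[target].add((INVERSE[label], source))

    def neighbors(self, node, label):
        return sorted(t for arc_label, t in self.arcs[node] if arc_label == label)


web = LabeledWeb()
web.link("VIRGINIA", "HAS-PART", "ARLINGTON")
print(web.neighbors("ARLINGTON", "ISPART"))
print(web.neighbors("VIRGINIA", "HAS-PART"))
```

Today's URL links correspond to calling `link` with no label and no reverse arc, which is exactly the information the text argues is missing.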

Among the tasks to which semantic networks are put is that of natural language generation, i.e. the generation of text based on starting at some node in the network and accessing the set of nodes connected to that point and interpreting what is found by levels of connection and a precedence of relationships, e.g., "This is the Home Page of the National Science Foundation created by Author X on January 19xx and directs the reader to other pages describing the major divisions of NSF and a staff phone/email directory, etc."

It would seem that being able to produce such natural language descriptions of the Web for any node could be useful. Note also that such descriptions vary dependent upon where one starts and that one can start at ANY node.
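A very small sketch of such description generation follows. It looks only one level out from the starting node and uses fixed phrase templates; the arc labels CREATED-BY and POINTS-TO, the precedence numbers, and the sample arcs are assumptions for illustration, not labels that exist on the Web.

```python
# Hypothetical labeled arcs leaving each node: (label, target) pairs.
ARCS = {
    "NSF Home Page": [
        ("CREATED-BY", "Author X"),
        ("POINTS-TO", "pages describing the major divisions of NSF"),
        ("POINTS-TO", "a staff phone/email directory"),
    ],
}

# Precedence of relationships: lower numbers are mentioned first.
TEMPLATES = {
    "CREATED-BY": (0, "created by {}"),
    "POINTS-TO": (1, "directs the reader to {}"),
}


def describe(node, arcs=ARCS):
    """Generate a one-sentence description of a node from its arcs."""
    grouped = {}
    for label, target in arcs.get(node, []):
        grouped.setdefault(label, []).append(target)
    clauses = []
    for label in sorted(grouped, key=lambda name: TEMPLATES[name][0]):
        phrase = TEMPLATES[label][1]
        clauses.append(phrase.format(" and ".join(grouped[label])))
    return f"This is the {node}, " + " and ".join(clauses) + "."


print(describe("NSF Home Page"))
```

Since `describe` accepts any node, the same procedure can start anywhere in the network; a fuller version would recurse into neighboring nodes to report deeper levels of connection.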

Next... One of the more powerful techniques for creating new sub-networks was exploited by Doug Lenat in the CYC Project at MCC. This was "copy and edit". It involved making an analogy between two arbitrarily complex entities which provided the computer with guidance as to what sub-structures to create and then permitting the reader to EDIT these sub-structures to correct them for actual differences from the existing nodes. This technique would also work on the Web, such that one could, using such an interface creation tool, literally say something as complex as "The NIH is a government agency like the NSF" and then expect the system to generate a copy of ALL the information and structures existing for the NSF and then allow the creator to populate it with corrected values for NIH.

Such a tool could greatly speed data entry for NSF-funded projects.
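The "copy and edit" idea can be illustrated with nested dictionaries standing in for a sub-network of pages. This is a sketch under assumptions: the structure of the NSF entry, the placeholder values, and the helper name are invented here, and it does not reproduce how the CYC Project implemented the technique.

```python
import copy

# Hypothetical structure describing the NSF site; values are placeholders.
nsf = {
    "name": "National Science Foundation",
    "type": "government agency",
    "pages": {
        "divisions": ["<NSF division 1>", "<NSF division 2>"],
        "directory": {"kind": "staff phone/email directory", "entries": []},
    },
}


def copy_and_edit(template, corrections):
    """Deep-copy the template, then overwrite the fields named in corrections.

    Each key in corrections is a tuple giving a path into the nested dict.
    """
    clone = copy.deepcopy(template)
    for path, value in corrections.items():
        target = clone
        for key in path[:-1]:
            target = target[key]
        target[path[-1]] = value
    return clone


# "The NIH is a government agency like the NSF": copy everything, fix the differences.
nih = copy_and_edit(nsf, {
    ("name",): "National Institutes of Health",
    ("pages", "divisions"): ["<NIH institute 1>"],
})
print(nih["name"], "/", nih["type"])
```

Everything not listed in the corrections (here the agency type and the directory structure) is inherited from the analogy, and the deep copy ensures later edits to the NIH structure do not alter the NSF original.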

The other task which the interpretation of the Web as a semantic network of nodes with semantic arcs between them would facilitate is the intelligent use of the Web by software information agents seeking information. It would allow the programming of search strategies as rules and scripts to be followed, e.g. "Find an email address for a Dr. Kathleen Fisher, a psychologist at the University of California at Davis" could translate into: "Find the name FISHER, KATHLEEN in the faculty/staff phone/email directory attached to the homepage for the University of California at Davis." Alternative search strategies might be programmed employing knowledge of geography, etc.
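A search strategy of this kind might be written as a short script over labeled arcs. In the sketch below the toy web, the HAS-DIRECTORY label, the node names, and the function name are all hypothetical, and the email address is deliberately left as a placeholder.

```python
# A toy labeled web: node -> properties and labeled arcs (all hypothetical).
WEB = {
    "University of California at Davis homepage": {
        "HAS-DIRECTORY": "UC Davis faculty/staff directory",
    },
    "UC Davis faculty/staff directory": {
        "entries": {"FISHER, KATHLEEN": "<email address>"},
    },
}


def find_email(web, organization_home, surname, given_name):
    """Script: homepage -> attached directory -> look up SURNAME, GIVEN."""
    home = web.get(organization_home, {})
    directory = web.get(home.get("HAS-DIRECTORY"), {})
    key = f"{surname.upper()}, {given_name.upper()}"
    return directory.get("entries", {}).get(key)  # None if any step fails


result = find_email(WEB, "University of California at Davis homepage",
                    "Fisher", "Kathleen")
print(result)
```

The script only works because the arc from the homepage to the directory carries a label saying what it leads to; with unlabeled links an agent would have to guess which link is the directory.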

The Web as an entity lacks world knowledge for intelligent agents to use in searching it. This is partly a matter of knowing more about the world in terms of geography and organization than currently exists on the Web itself. I.e., WE know what states contain what major cities, or how to use the ZIP code directory or the telephone area code listings to locate things. WE know that there are major branches of the USA federal government and that the Army is a part of the DoD. The question is how and where should such information exist in the Web for the use of intelligent agents? Facilitating access to the Web could be done if simplified function call interfaces to some of these resources could be created and in effect a listing of available functions which Web scripts could contain could be generated. If CITY-OF(ZIP-CODE) was a function, then calling it with a ZIP code would yield the city--handy for a program or a person. Since the data is online, all that is missing is the interface components to make these readily usable by people or programs.
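A function such as CITY-OF(ZIP-CODE), together with a listing of available functions, could look roughly like the following. The two-entry table stands in for a real online ZIP code directory, and the registry and helper names are assumptions made for this sketch.

```python
from typing import Optional

# Tiny stand-in for an online ZIP code directory (illustrative sample only).
_ZIP_TABLE = {
    "78712": "Austin",
    "95616": "Davis",
}


def city_of(zip_code: str) -> Optional[str]:
    """Return the city for a ZIP code, or None if it is not in the table."""
    return _ZIP_TABLE.get(zip_code.strip()[:5])


# A listing of available functions that a Web script could consult by name.
FUNCTIONS = {"CITY-OF": city_of}


def call(name, *args):
    """Look up a function in the listing and apply it to the arguments."""
    return FUNCTIONS[name](*args)


print(call("CITY-OF", "78712"))
```

The same `call` interface serves a program and, through a simple form, a person; adding area codes or state-to-city knowledge would mean adding entries to the listing rather than changing the scripts that use it.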

Jim Davis

I would suggest that we do not yet have a problem of information overload with regard to electronic communication of work from NSF sponsored programs. It's true that anyone who relies on the electronic media quickly becomes overloaded, but that's because of the conversational, divergent, and undisciplined nature of email. Very little of high quality results of the kind produced by NSF research is accessible in electronic form.

In my view the problem is to get the research reports onto the net, where they will be immediately reachable by the global community, and such that information discovery and filtration systems can be brought to bear on them. I would suggest that it would be useful for the community to adopt some standard for electronic publishing of research reports. There are a number of suitable systems working now, among them WATERS (from Old Dominion University and Virginia Tech) and the Dienst system (developed as part of the ARPA CSTR project).

More information about WATERS can be obtained from this site. Dienst is available. I will also be presenting a paper on it at the upcoming WWW 94 conference.

Judith Klavans

I taught a course called Computing and the Humanities last spring at Columbia. Students (mostly undergrad) used WWW for their projects, linked all around the world for data and tools. It was a big success, very popular, and a challenge to teach (both engineering and humanities students).

John C. Mallery

Click here for the current version of the position paper.

Gregg Vanderheiden

Here are a couple of ideas:

Terry Winograd

Question: How might the research community supported by IRIS leverage on methods of access, search, and navigation in the WWW to improve communications amongst themselves, and with NSF?

There are two questions here: how to best make use of existing capabilities, and what new ideas could be developed to improve them. On the first, my general feeling is to be inclusive rather than restrictive -- every project should have a home page (which is listed on the IRIS-maintained directory page) and then should attach any materials they would normally put on line, preferably in HTML form, or at least linked to a page.

As to new capabilities, I think one of the most important is a systematic way of dealing with different levels of privacy. Every project has some working materials they will be happy to make fully public, some that are for trusted colleagues but not general publication, and some that are for internal use only. Good simple standards and ways of managing this will be of help to all.
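The three levels of privacy mentioned above could be modeled with an ordered enumeration, as in this minimal sketch. The level names, the sample materials, and the rule that a viewer sees everything at or below their own level are assumptions for illustration, not a proposed standard.

```python
from enum import IntEnum


class Audience(IntEnum):
    """Who may see an item; higher values are more restricted."""
    PUBLIC = 0       # fully public
    COLLEAGUES = 1   # trusted colleagues, not general publication
    INTERNAL = 2     # internal use only


def visible_items(materials, viewer_level):
    """Return the names of materials a viewer at the given level may see."""
    return [name for name, level in materials if level <= viewer_level]


# Hypothetical project materials tagged with their audience.
materials = [
    ("technical report", Audience.PUBLIC),
    ("draft paper", Audience.COLLEAGUES),
    ("meeting notes", Audience.INTERNAL),
]

print(visible_items(materials, Audience.PUBLIC))
print(visible_items(materials, Audience.COLLEAGUES))
```

Keeping the levels ordered means a single comparison decides access, which is the kind of simplicity that would make a shared convention easy for projects to adopt.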

Question: What should the IRIS research programs consider for NSF page content that would most help grow their investigative communities and be most useful in dissemination and retrieval of NSF-related science and technology information?

Easy access to publications and technical reports, pages that identify individual people and their interests so you can contact them, shared bibliographies.

Question: What issues are there regarding coordinating with information sources from the professional societies of the investigative communities that could be addressed via mosaic-like mechanisms, and what role should NSF play in such an activity?

The main issues are around intellectual property. The ACM, for example, counts on income from publications and does not want to make them available without payment. The NSF could pay enough for the privilege of putting them online that the publisher would be willing to forego the income that would come from individual purchases. This is like the current practice of charging libraries more than individuals for journals, but on a larger scale.

Question: What format, linkage, or content guidelines, if any, should be provided PIs to whose pages NSF pages point in order to enhance the usefulness of the pages?

We need to develop ways of storing and managing multiple representations of documents. This is part of the work we will be doing in our digital libraries project. In the meantime it could be useful to have some "preferred" formats, but they should not be the only ones allowed.

Question: Are there any special "viewers" that might be of particular help to the CISE and IRIS investigative communities, and how should they be developed and supported?

I don't know of any that are special to those communities. It will be important to follow the development of viewers in academic and commercial areas.

Question: How can we deal with potential information overload in the research communities (e.g., intelligent agents to search/filter information for specific investigative communities) and what experimental research, if any, should be considered on these topics?

Again, this is part of the research being done in the digital libraries initiative. Additional funding would be helpful and should be coordinated with that effort.

Question: What research should be conducted in the area of user interfaces (viewers) to the WWW, in order to enhance the accessibility of the information? What is the role of information visualization techniques? Natural language? Speech? User adaptability? Handicapped users?

Same as above. This list looks like the general "Information superhighway" list of desires. There may be special needs associated with CISE and IRIS, but mostly it's a matter of trying to facilitate the larger research picture.

Question: How can we leverage on WWW (and NSF) for education and training of future researchers and PIs?

Encourage people to put teaching materials as well as research materials onto their web sites.