In a digital library, users will be able to access any part of a primary source. For retrieval and the user interface, it will be necessary to handle various relationships between multimedia data and to let users easily and freely identify and use those relationships. ``Information strolling'' means that users walk around in the information space with knowledge of and control over the relationships between data.
In order to realize ``information strolling,'' we have developed three interface systems. (1) The ``virtual stacks and CG librarian'' interface visually handles spatial relationships and guides users through them using computer graphics. (2) The ``associated data retrieval'' interface handles conceptual relationships with multimodal dialogue understanding. It is implemented with natural language understanding, voice recognition, context understanding, and a hierarchical structured data (HSD) model. (3) The ``WWW based interface'' handles other miscellaneous relationships with global linking navigation and retrieval for relational databases of multimedia data through the CGI of the WWW architecture.
In this paper, we summarize the technical issues of user interfaces for a digital library, describe details of these three interfaces, and discuss the issues of retrieval and user interface methods for a digital library.
This interface, shown in Fig. 1, has two features: (1) a three-dimensional interface space called virtual stacks [4], where a user can freely walk around through realtime computer graphics generation; and (2) a personified interface called the ``CG librarian'' [5], which guides users through the virtual stacks and assists them in data retrieval. The CG librarian is generated by a CG engine and can simulate body language and synthesize mouth motion together with voice output.
Fig. 2 shows the system structure. The system consists of a graphics workstation (SGI IRIS Crimson/RE) to generate virtual stacks and CG librarian images and a PC (NEC PC-9821) for voice recognition and synthesis. An HDTV monitor is used to display CG images.
Even novice computer users can easily use a visual interface like this to retrieve information in a digital library. The interface supports not only a person with a clear goal, such as finding a book with a specific title or searching a card catalog or OPAC (Online Public Access Catalog) system, but also a person without a clear goal who is wandering about a library or a bookstore looking for new or interesting information.
Users can freely walk around, choose a book or a video, check its catalog data, and view the contents of a book or watch MPEG video data.
Additionally, users can have their own virtual stacks, in which they can rearrange objects to suit their taste. For example, books that the user previously checked out can be stacked on a personal bookshelf, or books of a specific kind, found in advance by keyword search, can be moved to a specific bookshelf.
In the future, users may have the system in an office and consult information gathered by others by checking a colleague's virtual bookshelf. In this way the system can use social filtering [19,20] to obtain reliable and useful information from the information space.
When using an interface that allows users to wander freely about an information space, users often lose track of their position; this is called the ``disorientation problem.'' To deal with this problem, the system provides voice recognition and synthesis through the CG librarian everywhere in the library, and the user can ask the system for guidance to the shelf he or she wants to visit.
The system consists of three parts: a dialogue control unit; a human motion generator on the GWS (graphics workstation); and a speech recognition and synthesis unit on the PC. Speech recognition and synthesis are implemented with a commercial software package and communicate with the dialogue control unit over a serial line. The dialogue control unit controls the timing of the CG librarian's motion and speech and sends recognized text to the main control unit of the information strolling system. The human motion generator generates human motion by a key frame method. It has a motion pattern table; when the control unit indicates a motion pattern such as nodding or waving a hand, the generator produces key frames for the body, hands, and mouth by consulting the table, and the image sequences between key frames are then produced automatically.
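The key frame method can be illustrated with a small sketch. It is not the system's implementation: the contents of the motion pattern table, the joint name `head_pitch`, and the choice of linear interpolation between key frames are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# A pose maps a joint name to an angle in degrees (names and values are illustrative).
Pose = Dict[str, float]
KeyFrame = Tuple[float, Pose]  # (time in seconds, pose)

# Hypothetical motion pattern table: pattern name -> key frames.
MOTION_PATTERNS: Dict[str, List[KeyFrame]] = {
    "nod": [
        (0.0, {"head_pitch": 0.0}),
        (0.5, {"head_pitch": 20.0}),
        (1.0, {"head_pitch": 0.0}),
    ],
}


def interpolate(a: Pose, b: Pose, t: float) -> Pose:
    """Linearly blend two poses; t=0 gives a, t=1 gives b."""
    return {joint: a[joint] + (b[joint] - a[joint]) * t for joint in a}


def in_between_frames(pattern: str, fps: int) -> List[Pose]:
    """Look up a pattern's key frames and generate the frames between them."""
    keys = MOTION_PATTERNS[pattern]
    frames: List[Pose] = []
    for (t0, p0), (t1, p1) in zip(keys, keys[1:]):
        steps = round((t1 - t0) * fps)
        for i in range(steps):
            frames.append(interpolate(p0, p1, i / steps))
    frames.append(dict(keys[-1][1]))  # end exactly on the last key frame
    return frames
```

Calling `in_between_frames("nod", 4)` yields five poses whose head pitch rises from 0 to 20 degrees and returns to 0. A production animation system would more likely use spline interpolation for smoother motion.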
The CG librarian provides a system help function by talking with the user, navigating through the shelves, and assisting in the search for a shelf. When a user walking around comes near a shelf, the CG librarian can verbally give the user information about that shelf. When the user indicates the classification of the books of interest, the CG librarian can direct the user to the proper shelf.
Fig. 3 shows the system structure. The system consists of an engineering workstation (NEC EWS-4800/360) to manage the multimedia database system and synergistically understand the multimodal input, a PC (NEC PC-9821) for voice recognition, and a laser disk player (SONY LVR-3000N) for storage of dynamic image data.
A voice input containing an anaphor, such as ``this'' or ``it'' referring to an object, is analyzed and converted to a semantic structure. Based on both the structure adjacent to the anaphor and the HSD, the system determines the object that the user wants to point to and resolves the anaphora.
For example, Fig. 4 shows a scene of the associated data retrieval.
First, we describe the interaction sequence prior to this situation.
A user who wants to know about Kakegawa City inputs a natural language
query such as ``掛川について知りたい''(I want to know about Kakegawa).
The system displays a list of titles obtained through analyzing the
input. The user selects 掛川市(Kakegawa City). The system displays
the book cover and index information. The user can read the content of
the book on the screen.
At this time the user may want to read something new about the same subject. Pointing to a title with a finger, the user can say ``その中で最近の本は?'' (Which books were recently published?). The HSD for all objects displayed on screen is the domain surrounded by a dotted line in Fig. 5.
First, to determine the logical objects indicated by the user's
finger, the system narrows the domain from all HSD objects to the HSD
objects included within the shadowed zone in Fig. 5. Secondly, by
analyzing the voice input, the system can determine that the target
objects are the book titles in the current HSD. Since the meaning of
the word ``recently'' is still not clear, the system asks the
question, ``Recently means since 1990, OK?'' In this way, the user can
specify a conceptual relationship such as ``recently published'' from the
object that was indicated.
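The two-step resolution described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `HSDNode` class, its fields, and the function `resolve_anaphora` are hypothetical names, and the real HSD model is richer than a simple tree with a publication year.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HSDNode:
    """One node of a hypothetical HSD tree for objects shown on screen."""
    kind: str                      # e.g. "window", "list", "title"
    label: str
    year: Optional[int] = None     # publication year, set only for titles
    children: List["HSDNode"] = field(default_factory=list)

    def descendants(self) -> List["HSDNode"]:
        """Return this node and every node below it, depth first."""
        found = [self]
        for child in self.children:
            found.extend(child.descendants())
        return found


def resolve_anaphora(pointed: HSDNode, target_kind: str, since: int) -> List[HSDNode]:
    """Resolve "among those" to nodes of target_kind under the pointed-at node."""
    # Step 1: the pointing gesture narrows the domain to one subtree.
    domain = pointed.descendants()
    # Step 2: the spoken query selects the object kind and the confirmed condition.
    return [n for n in domain
            if n.kind == target_kind and n.year is not None and n.year >= since]
```

Here `since` stands for the threshold the system confirms with the user (1990 in the dialogue above). Pointing at a smaller subtree returns fewer candidates, which is how the gesture disambiguates the spoken request.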
(1) The WWW architecture has the following three features suitable for a digital library. (a) Without installing a new browser, anyone can use the service with a WWW browser, which has become popular throughout the world in recent years. (b) The WWW server can distribute various kinds of multimedia data, and the WWW browser can display them. (c) It is very easy to construct an original database service with an individual retrieval engine through CGI.
(2) Scoring the result and retrieval through feedback
This function handles information strolling through similarity between documents and is based on relevance feedback (see Fig. 6). WAIS provides a simple and quick scoring and feedback retrieval method, but such simplicity often fails to retrieve the proper data for requests on Japanese documents. To raise accuracy, the length of each text in which keywords exist is used when calculating the relevance of each document to the input keywords. Whenever text is added to the target relational database, the sentences or phrases in the individual columns are analyzed, filtered, and converted to an index word table. The index word table consists of five values: word, record identification, table name, column name, and word weight. The word weight (W) is calculated by the following equation: W = (F * R * T) / (L * N), where R and T are constants depending on the column containing the word and the thesaurus used, respectively; F is the word frequency; L is the length of the text containing the word; and N is the number of records included in the column. By this equation, the significance of a term increases as the text containing it becomes shorter (smaller L), as its frequency in that text increases (larger F), and as its frequency in the target database decreases. Using the index word table, the score of each record that satisfies the request is calculated, and the records whose scores surpass the threshold are displayed. Likewise, using the table and the results checked by the user, the feedback score is calculated and displayed by the algorithm described in [6].
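The weighting and thresholding steps can be sketched as follows. The formula is the one given in the text; everything else is an assumption for illustration, in particular the names `IndexEntry`, `word_weight`, and `score_records`, and the choice to combine weights by simple summation, which the text does not specify. The feedback step of [6] is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class IndexEntry(NamedTuple):
    """One row of the index word table (the five values named in the text)."""
    word: str
    record_id: int
    table: str
    column: str
    weight: float


def word_weight(f: int, r: float, t: float, length: int, n: int) -> float:
    """W = (F * R * T) / (L * N), as given in the text."""
    return (f * r * t) / (length * n)


def score_records(index: Iterable[IndexEntry], keywords: List[str],
                  threshold: float) -> Dict[int, float]:
    """Sum matching word weights per record; keep scores above the threshold.

    Summation is an assumption; the text does not state how weights combine.
    """
    wanted = set(keywords)
    scores: Dict[int, float] = defaultdict(float)
    for entry in index:
        if entry.word in wanted:
            scores[entry.record_id] += entry.weight
    return {rid: s for rid, s in scores.items() if s > threshold}
```

Note that doubling the text length L halves the weight, so a keyword that appears in a short title counts for more than the same keyword in a long abstract.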
(3) Visualization as a graph with automatically selected axes
This system can visualize the retrieved data as a graph with GIF icons of various lengths. To visualize data, a system has to select the axes that map each retrieved item to a position on the display. In this system, the user can freely and explicitly select those axes. If the user does not, the system selects the axes automatically by applying a rule suited to the retrieval situation.
(4) Understanding ambiguous retrieval input using a thesaurus
Keywords in a request are so varied that they are not always included in the target database. To solve this problem, a conventional library prepares a thesaurus that comprises the full set of keywords used in the library. This interface uses a large common thesaurus (the Conceptual Dictionaries of EDR [8,9]). It expands the keywords, confirms them with the user, and retrieves data under these broadened conditions.
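Keyword expansion of this kind can be sketched as below. The thesaurus entries are invented for the example and do not come from the EDR dictionaries; the function names `expand_keywords` and `retrieve` are hypothetical, retrieval is reduced to substring matching, and the step of confirming the expanded keywords with the user is omitted.

```python
from typing import Dict, List, Set

# Hypothetical thesaurus fragment: keyword -> related terms (invented entries).
THESAURUS: Dict[str, List[str]] = {
    "castle": ["fortress", "citadel"],
    "tea": ["green tea"],
}


def expand_keywords(keywords: List[str], thesaurus: Dict[str, List[str]]) -> List[str]:
    """Return the keywords followed by their related terms, without duplicates."""
    seen: Set[str] = set()
    expanded: List[str] = []
    for word in keywords:
        for candidate in [word, *thesaurus.get(word, [])]:
            if candidate not in seen:
                seen.add(candidate)
                expanded.append(candidate)
    return expanded


def retrieve(documents: Dict[int, str], terms: List[str]) -> List[int]:
    """Return ids of documents whose text contains any of the terms."""
    return sorted(doc_id for doc_id, text in documents.items()
                  if any(term in text for term in terms))
```

With a document containing only the word "fortress", a request for "castle" finds nothing on its own but succeeds after expansion, which is the situation the thesaurus is meant to handle.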
(5) Useful retrieval methods
This interface also has three other useful retrieval methods: (a) Japanese request retrieval; (b) full-text search for text data; and (c) native language retrieval by translating both input keywords and retrieved results using a commercial machine translation system, NEC's PIVOT-JE/EJ [10].
During research and development of the CG librarian, we classified body motion using the emblems that [14] proposed. [15] also concentrated on the relationship between dialogue content and body motion. That work focuses on how to illustrate direction and position and represent shape by hand gestures in a dialogue: the roles of motion are emblem and illustration. Our CG librarian focuses on emphasizing verbal language by body motion, the relation between the dialogue situation and the respective motion pattern, and the artistic aspect of motion.
For associated data retrieval, we proposed a method of resolving anaphora with multimodal context and HSD models. [21] proposed a method that builds hierarchical structured data from image data through an iterative cycle of separating images, hierarchically organizing their domains, and naming the domain components of the images by human recognition ability. This data construction method is very useful. Although our HSD model is not limited to image objects, we will perhaps use closely related methods.
For the WWW based interface, work on relevance feedback interfaces [22,23,24] has been proposed in which the effective relevance feedback algorithm is based on a vector model. We extended the method to calculate the weight of each keyword using value length parameters and column type parameters. ``Information Visualizer'' [17,18] proposed various visualization methods for hierarchical structure information, reference information, and complex information. Our WWW based interface proposes a method that automatically selects the axes for visualization on a graph. [19] and [20] proposed a method in which persons with similar preferences exchange information; it is based on filling out a questionnaire. In our virtual stacks, a user can freely arrange selected books on a virtual personal bookshelf. Another user with similar preferences can obtain desired documents by using that bookshelf. In this way, we can implement one of the social filtering methods.
[1] E. A. Fox, R. M. Akscyn, R. K. Furuta, and J. J. Leggett. Digital Libraries. Communications of the ACM, Vol. 38, No. 4 (April 1995), pp. 23-28.
[2] R. Rao, J. O. Pedersen, et al. Rich Interaction in the Digital Library. Communications of the ACM, Vol. 38, No. 4 (April 1995), pp. 29-39.
[3] S. Gauch, R. Aust, et al. The Digital Video Library System: Vision and Design. Proceedings of the 1st Annual Conference on the Theory and Practice of Digital Libraries: Digital Libraries '94 (Texas A&M Univ., College Station, TX, June 1994).
[4] T. Kamiya, S. Lu, M. Hara, and H. Miyai. Development of Electronic Library Interface with 3D Walk-through and CG Librarian (in Japanese). IPSJ SIG Notes, Vol. 95, No. 1, pp. 27-34, 1995.
[5] S. Lu, S. Yoshizaka, et al. A Human Computer Dialogue Agent with Body Gesture, Hand Motion, and Speech. HCI International '95, Tokyo, 1995. (in press)
[6] S. E. Robertson and K. Sparck Jones. Relevance Weighting of Search Terms. Journal of the American Society for Information Science, May-June 1976.
[7] M. Tani, H. Shiba, and S. Ichiyama. Voice Operation for Multi Window Systems (in Japanese). IPSJ SIG Notes, Vol. 94, No. 98, pp. 143-150, 1994.
[8] A. Koizumi, M. Arioka, et al. Noun Phrasal Entries in the EDR English Word Dictionary. Proceedings of COLING '94, August 1994.
[9] Japan Electronic Dictionary Research Institute, Ltd. Electronic Dictionary Technical Guide. TR-042, Ch. 5, 1994.
[10] M. Miura, M. Hirata, and N. Hoshino. Learning Mechanism in Machine Translation System ``PIVOT''. Proceedings of COLING '92, August 1992.
[11] M. Sato. Electronic Libraries ``SongWoKung'' which support retrievals of books using presentation of CG pictures of libraries (in Japanese). IPSJ SIG Notes, Fi-24-2, Nov. 1991.
[12] M. Sato. Open-shelf Inspection Service on Telematic Libraries. Fourth IWT, May 1988, Caen, France.
[13] M. Sato. Visual Environment for Telematic Library Users. Fifth IWT, Sept. 1989, Denver, CO.
[14] P. Ekman and Friesen. ``Three classes of nonverbal behavior, Aspects of Nonverbal Communication.'' Swets and Zeitlinger, 1980.
[15] J. Cassell, C. Pelachaud, et al. ``ANIMATED CONVERSATION: Rule-based Generation of Facial Expression, Gesture & Spoken Intonation for Multiple Conversational Agents.'' Proc. of SIGGRAPH '94, pp. 413-420, 1994.
[16] R. Rao, J. O. Pedersen, M. A. Hearst, J. D. Mackinlay, S. K. Card, L. Masinter, P. K. Halvorsen, and G. G. Robertson. ``Rich Interaction in the Digital Library.'' Communications of the ACM, Vol. 38, No. 4, pp. 29-39, Apr. 1995.
[17] J. D. Mackinlay, R. Rao, and S. K. Card. ``An Organic User Interface for Searching Citation Links.'' CHI '95 Conference Proceedings, pp. 67-73, May 1995.
[18] J. Lamping, R. Rao, and P. Pirolli. ``A Focus+Context Technique Based on Hyperbolic Geometry for Visualizing Large Hierarchies.'' CHI '95 Conference Proceedings, pp. 401-408, May 1995.
[19] U. Shardanand and P. Maes. ``Social Information Filtering: Algorithms for Automating `Word of Mouth'.'' CHI '95 Conference Proceedings, pp. 210-217, May 1995.
[20] W. Hill, L. Stead, M. Rosenstein, and G. Furnas. ``Recommending and Evaluating Choices in a Virtual Community of Use.'' CHI '95 Conference Proceedings, pp. 194-201, May 1995.
[21] H. S. Shakir. A Universal Image Database System for Pictorial Information and Meta-Information Management. PhD Thesis, Kyoto University, Nov. 1994.
[22] E. A. Fox, D. Hix, L. Nowell, D. Brueni, W. Wake, L. Heath, and D. Rao. ``Users, user interfaces, and objects: Envision, a digital library.'' J. Amer. Soc. Info. Sci., Vol. 44, No. 8, Sept. 1993, pp. 480-491.
[23] E. A. Fox, R. K. France, E. Sahle, A. Daoud, and B. E. Cline. ``Development of a modern OPAC: From REVTOLC to MARIAN.'' In Proceedings of the 16th Annual International ACM SIGIR Conference on R&D in Information Retrieval (ACM, New York, 1993), pp. 248-259.
[24] G. Salton, E. A. Fox, and H. Wu. ``Extended Boolean information retrieval.'' Communications of the ACM, 26(11):1022-1036, Nov. 1983.