User interfaces for information strolling on a digital library

Mikiya Tani, Toshiyuki Kamiya, and Shunji Ichiyama
Kansai C&C Labs. NEC Corp.
4-24, 1-chome, Shiromi, Chuou-ku, Osaka, 540, JAPAN
Phone: +81-6-945-3217
Fax: +81-6-945-3096
E-mail: {m-tani, kamiya, ichiyama}@obp.cl.nec.co.jp

Abstract

This paper describes the issues involved in developing a digital library user interface and how to realize information strolling. It further reports on our three user interface systems: virtual open stacks and a CG librarian to simulate a conventional library; associated data retrieval to understand multimodal dialogue; and a WWW based interface. With a digital library, users will be able to access any part of primary resources. In the user interface and retrieval areas, it will be necessary to handle various relationships between multimedia data and to easily and freely identify and utilize those relationships. Information strolling is designed to allow users to walk around in the information space with full knowledge of and control over the relationships between the data.

Keywords: virtual stacks, CG librarian, associated data retrieval, multimodal interface, WWW, user interface, digital library

1. Introduction

This paper describes the issues involved in developing user interfaces for a digital library and how to realize information strolling by using our three user interface systems currently under development: ``virtual stacks and CG librarian'', which simulates a conventional library; ``associated data retrieval'', which understands multimodal dialogue; and the ``WWW based interface''.

With a digital library, users will be able to access any part of a primary source. In retrieval and user interfaces, it will be necessary to handle various relationships between multimedia data and to easily and freely identify and utilize those relationships. ``Information strolling'' means that users walk around in the information space with knowledge of and control over the relationships between data.

In order to realize ``information strolling,'' we have developed three interface systems. (1) The ``virtual stacks and CG librarian'' interface can visually handle and guide spatial relationships using computer graphics. (2) The ``associated data retrieval'' interface can handle conceptual relationships with multimodal dialogue understanding. It is implemented with natural language understanding, voice recognition, context understanding, and a hierarchical structured data (HSD) model. (3) The ``WWW based interface'' can handle other miscellaneous relationships with global linking navigation and retrieval for relational databases of multimedia data through the CGI of the WWW architecture.

In this paper, we summarize the technical issues of user interfaces for a digital library, describe details of these three interfaces, and discuss the issues of retrieval and user interface methods for a digital library.

2. Technical issues of retrieval and user interfaces for a digital library

We divided the concepts of our digital library into six areas: data input, processing and storage, retrieval, user interface, method of use, and circulation. Each area has many technical problems that need to be overcome [1]. However, we focus on two primary areas, retrieval and user interface, because the supply of information is increasing more rapidly than our ability to effectively search through this tremendous resource. To solve this dilemma before the Internet implodes, we must make fundamental advances to capture, store, search, filter, and display this information [2,3].

2.1. Retrieval method

A digital library is not only for researchers or educators who have definite subjects to search, but also for users who don't have clear search goals; that is, those who prefer to stroll in the library. Users with clear search goals expect a digital library to provide efficient search methods to find information using various relationships, ambiguous keywords, or native language words. On the other hand, users who don't have clear search goals may expect to retrieve data merely through simple selection.

2.2. User interface

In order to obtain the desired data, a user may expect to simply use a GUI or a metaphor that replicates a physical library through computer graphics technologies. However, a digital library provides services to so many kinds of users that it requires a specific user interface for each user class. This variety is the most important problem to resolve in developing user interfaces.

3. How to realize information strolling

We have developed three user interfaces for a digital library to realize ``Information Strolling''.

3.1. Navigation interface using CG technologies to handle spatial relationships

The goal of this interface is to realize information strolling through spatial relationships. This interface is intended for novice computer and library users. Therefore, it has to provide a familiar and simple user interface. Accordingly, the input methods of our interface are limited to mouse input and voice recognition, thus avoiding the need for keyboard input. (In Japan, keyboard input is not as common as in other countries that have a ``typing culture.'')

This interface, shown in Fig. 1, has two features: (1) a three-dimensional interface space called virtual stacks [4], where a user can freely walk around through realtime computer graphics generation; (2) a personified interface called ``CG librarian'' [5], which guides users through the virtual stacks and assists them in data retrieval. It is generated by a CG engine and can simulate body language and synthesize mouth motion including voice output.

Fig. 2 shows the system structure. The system consists of a graphics workstation (SGI IRIS Crimson/RE) to generate virtual stacks and CG librarian images and a PC (NEC PC-9821) for voice recognition and synthesis. An HDTV monitor is used to display CG images.

(1) virtual stacks

This interface can build a virtual three-dimensional space that simulates a library room and bookshelves. Not only are the shape, color, and position of a shelf represented in the 3-D space, but the books and videotapes on the shelves are rendered in the same shapes as the actual objects. A user can walk around through the virtual stacks by simply operating a mouse, as if in a conventional library.

Even novice computer users can easily use a visual interface like this to retrieve information in a digital library. This interface supports not only a person who has a clear goal, such as finding a book with a specific title or searching a card catalog or OPAC (Online Public Access Catalog) system, but also a person who does not have a clear goal and is wandering about in a library or a book store looking for some new or interesting information.

Users can freely walk around, choose a book or a video, check catalog data, and see the contents of a book or watch MPEG video data.

Additionally, users can possess virtual stacks, in which they can change the arrangement of objects to suit their taste. For example, books that the user previously checked out can be stacked on a personal bookshelf, or books of a specific kind found in advance by keyword search can be moved to a specific bookshelf.

In the future, users may have the system in an office, where they can consult information gathered by others by checking a colleague's virtual bookshelf. This system can use a social filtering method [19,20] to get reliable and useful information from the information space.

When using an interface that allows users to freely wander about an information space, users often lose track of their position; this is called the ``disorientation problem.'' To deal with this problem, the system supports voice recognition and synthesis through the CG librarian everywhere in the library, and the user can direct the system to guide him/her to a desired shelf.

(2) CG librarian

To realize a familiar interface, the CG librarian is implemented as an anthropomorphic dialogue agent with human-like motion, enabling us to communicate in both verbal and nonverbal ways. Various body, head, and hand motions are controlled by a database extracted through analyzing actual human behavior.

The system consists of three parts: a dialogue control unit; a human motion generator on the graphics workstation (GWS); and a speech recognition and synthesis unit on the PC. Speech recognition and synthesis are implemented using a commercial software package and communicate with the dialogue control unit over a serial line. The dialogue control unit controls the timing of the CG librarian's motion and speech and sends recognized text to a main control unit in the information strolling system. The human motion generator generates human motion by a key frame method. It has a motion pattern table, and when the control unit indicates a motion pattern such as nodding or waving a hand, it produces key frames for the body, hands, and mouth by consulting the table; the image sequences between key frames are then automatically produced.
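The key frame method can be illustrated with a small sketch. This is not the actual motion generator: the pattern names, joint names, angle values, and the use of plain linear interpolation between key frames are all assumptions made for the example.

```python
from typing import Dict, List

# A key frame maps a joint name to an angle in degrees (illustrative values).
KeyFrame = Dict[str, float]

# Hypothetical motion pattern table: pattern name -> ordered key frames.
MOTION_PATTERNS: Dict[str, List[KeyFrame]] = {
    "nod": [
        {"head_pitch": 0.0},
        {"head_pitch": 20.0},
        {"head_pitch": 0.0},
    ],
    "wave_hand": [
        {"arm_raise": 0.0, "wrist_yaw": 0.0},
        {"arm_raise": 90.0, "wrist_yaw": -30.0},
        {"arm_raise": 90.0, "wrist_yaw": 30.0},
        {"arm_raise": 0.0, "wrist_yaw": 0.0},
    ],
}


def interpolate(a: KeyFrame, b: KeyFrame, t: float) -> KeyFrame:
    """Linearly blend two key frames; t=0 gives a, t=1 gives b."""
    return {joint: a[joint] + (b[joint] - a[joint]) * t for joint in a}


def generate_motion(pattern: str, steps_between: int) -> List[KeyFrame]:
    """Look up a pattern and fill in frames between consecutive key frames."""
    keys = MOTION_PATTERNS[pattern]
    frames: List[KeyFrame] = []
    for start, end in zip(keys, keys[1:]):
        # Emit the start key frame plus `steps_between` in-between frames.
        for i in range(steps_between + 1):
            frames.append(interpolate(start, end, i / (steps_between + 1)))
    frames.append(dict(keys[-1]))  # finish exactly on the last key frame
    return frames
```

For instance, `generate_motion("nod", 1)` yields five frames in which the head pitch rises to the middle key frame and returns to rest.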

The CG librarian provides a system help function by talking with the user, navigating through the shelves, and assisting in the search of a shelf. When a user is walking around and comes near a shelf, the CG librarian can verbally give the user information about that shelf. When the user indicates the classification of the books of interest, the CG librarian can direct the user to the proper shelf.
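The two guidance behaviors (describing a shelf the user approaches, and directing the user to a shelf by classification) might be sketched as below. The floor coordinates, the announcement radius, and the classification labels are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Shelf:
    classification: str             # subject label (hypothetical)
    position: Tuple[float, float]   # floor coordinates in the virtual stacks


def nearby_shelf(user: Tuple[float, float], shelves: List[Shelf],
                 radius: float) -> Optional[Shelf]:
    """Return the closest shelf within `radius` of the user, if any."""
    best = min(shelves, key=lambda s: math.dist(user, s.position), default=None)
    if best is not None and math.dist(user, best.position) <= radius:
        return best
    return None


def shelf_for(classification: str, shelves: List[Shelf]) -> Optional[Shelf]:
    """Find the shelf that holds the requested classification."""
    for shelf in shelves:
        if shelf.classification == classification:
            return shelf
    return None


shelves = [
    Shelf("history", (0.0, 0.0)),
    Shelf("geography", (10.0, 0.0)),
    Shelf("computing", (10.0, 8.0)),
]
```

A caller would pass the user's current position to `nearby_shelf` on each movement step, and pass a recognized classification word to `shelf_for` to obtain the destination for guidance.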

3.2. Retrieval interface through multimodal dialogue to handle conceptual relationships

The goal of the interface is to realize associated data retrieval that understands natural language and multimodal dialogue [7]. It is intended not only for the novice computer and library user, but also for expert users. Toward this goal, this interface has three features: (1) Japanese query understanding, (2) multimodal input understanding, (3) associated data retrieval using various relationships.

Fig. 3 shows the system structure. The system consists of an engineering workstation (NEC EWS-4800/360) to manage the multimedia database system and synergistically understand the multimodal input, a PC (NEC PC-9821) for voice recognition, and a laser disk player (SONY LVR-3000N) for storage of dynamic image data.

(1) Japanese query understanding

This interface morphologically, syntactically, and semantically analyzes a Japanese query and converts it into a semantic structure with domain-specific knowledge. An SQL query is generated from this structure. Anaphora and ellipses included in a Japanese request can be resolved by three kinds of information: the structure adjacent to those representations, subject and noun phrases stored as candidates for resolution, and domain-specific knowledge.

(2) Multimodal input understanding

A hierarchical structured data (HSD) model corresponding to the multimedia data displayed on the screen can map the area indicated by a user to the logical objects that the user wants to access. An HSD consists of hierarchical objects and hierarchical relationships between those objects. Each object also has space to store its location on the screen. For example, when displaying a page of a book, the HSD consists of the objects that indicate the whole book, pages, sections, sentences, phrases, and words. Each object stores its location on the display so that when the user points to an area of a page, the hierarchical objects can be determined by matching the area against the HSD.
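As a rough illustration of how an HSD can map a pointed location to logical objects, the sketch below stores a screen rectangle in every object and walks the hierarchy from the outermost object inward. The class name, field names, and pixel coordinates are assumptions for the example, not the data model of our system.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom) in screen pixels


@dataclass
class HSDObject:
    kind: str    # e.g. "page", "section", "sentence", "word"
    label: str
    rect: Rect   # where the object is drawn on the screen
    children: List["HSDObject"] = field(default_factory=list)


def contains(rect: Rect, x: int, y: int) -> bool:
    left, top, right, bottom = rect
    return left <= x <= right and top <= y <= bottom


def resolve(node: HSDObject, x: int, y: int) -> List[HSDObject]:
    """Return the chain of objects, outermost first, that contain (x, y)."""
    if not contains(node.rect, x, y):
        return []
    for child in node.children:
        chain = resolve(child, x, y)
        if chain:
            return [node] + chain
    return [node]


page = HSDObject("page", "p. 12", (0, 0, 600, 800), [
    HSDObject("section", "1.1", (20, 20, 580, 400), [
        HSDObject("sentence", "s1", (20, 20, 580, 60), [
            HSDObject("word", "Kakegawa", (20, 20, 120, 60)),
        ]),
    ]),
])
```

Pointing at a word returns the whole chain from page down to word, so a later step can pick the level of the hierarchy that the spoken request refers to.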

A voice input including anaphora that expresses objects using ``this'' or ``it'' is analyzed and converted to a semantic structure. According to both the structure adjacent to the anaphora and the HSD, the object that the user wants to point to can be determined and the anaphora can be resolved.

(3) Associated data retrieval using various relationships

The objects that a user wants to point to are determined by synergistically analyzing multimodal inputs. The relationship between an object and its associated data is inferred from the semantic structure. In this way, associated data can be retrieved through various relationships.

For example, Fig. 4 shows a scene of the associated data retrieval. First, we describe the interaction sequence prior to this situation. A user who wants to know about Kakegawa City inputs a natural language query such as ``掛川について知りたい'' (I want to know about Kakegawa). The system displays a list of titles obtained through analyzing the input. The user selects 掛川市 (Kakegawa City). The system displays the book cover and index information. The user can read the content of the book on the screen.

At this time the user may want to read something new about the same subject. Pointing to a title with a finger, the user can say ``その中で最近の本は？'' (Which of these books were recently published?). The HSD for all objects displayed on screen is the domain surrounded by a dotted line in Fig. 5.

First, to determine the logical objects indicated by the user's finger, the system narrows the domain from all HSD objects to the HSD objects included within the shadowed zone in Fig. 5. Second, by analyzing the voice input, the system can determine that the target objects are the book titles in the current HSD. Since the meaning of the word ``recently'' is still not clear, the system asks the question, ``Recently means since 1990, OK?'' In this way, the user can specify a conceptual relationship such as ``recently published'' from the object that was indicated.
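The three steps of this example (narrowing the domain by pointing, selecting the target objects from the voice input, and confirming the vague word) can be sketched as follows. The flat object list, the `year` attribute, the placeholder labels, and the `confirm` callback are hypothetical simplifications; the actual system works on the HSD and a semantic structure.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom)


@dataclass
class DisplayedObject:
    kind: str    # e.g. "title", "cover"
    label: str   # placeholder label, not a real book title
    year: int    # publication year (hypothetical attribute)
    rect: Rect


def overlaps(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def resolve_request(objects: List[DisplayedObject], pointed: Rect,
                    target_kind: str, since_year: int,
                    confirm: Callable[[str], bool]) -> List[DisplayedObject]:
    # Step 1: narrow the domain to objects inside the pointed region.
    domain = [o for o in objects if overlaps(o.rect, pointed)]
    # Step 2: the spoken request selects the kind of object in that domain.
    targets = [o for o in domain if o.kind == target_kind]
    # Step 3: ask the user to confirm the interpretation of the vague word.
    if not confirm(f"Recently means since {since_year}, OK?"):
        return []
    return [o for o in targets if o.year >= since_year]


objects = [
    DisplayedObject("title", "Book A", 1985, (0, 0, 200, 20)),
    DisplayedObject("title", "Book B", 1992, (0, 30, 200, 50)),
    DisplayedObject("cover", "Cover B", 1992, (210, 30, 260, 90)),
    DisplayedObject("title", "Book C", 1994, (0, 300, 200, 320)),
]
```

With a pointed region covering only the first two titles, the recent book outside the region is not returned, which mirrors how pointing restricts the scope of the spoken request.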

3.3. WWW based interface to handle other global linked relationships

This interface realizes access to relational databases such as ORACLE on a remote host by using the CGI (Common Gateway Interface) of the WWW architecture. It has five major features: (1) adopting the WWW architecture; (2) scoring the result and retrieval through feedback of the user's evaluations; (3) visualization as a graph with automatically selected axes; (4) understanding ambiguous retrieval input using thesauruses; and (5) useful retrieval methods.

(1) Adopting WWW architecture

The WWW architecture has the following three features suitable for a digital library. (a) Without installing a new browser, anyone can use the service with a WWW browser, which has become popular throughout the world in recent years. (b) The WWW server can distribute various kinds of multimedia data and the WWW browser can display them. (c) It is very easy to construct an original database service with an individual retrieval engine through CGI.
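To make point (c) concrete, here is a minimal sketch of a CGI-style retrieval program. It is only an illustration: SQLite (from the Python standard library) stands in for the remote relational database, and the table name `documents`, the column `title`, the query parameter `q`, and the sample rows are all assumptions. In a real CGI deployment the query string would be read from the `QUERY_STRING` environment variable and the response written to standard output.

```python
import sqlite3
from html import escape
from urllib.parse import parse_qs


def handle_request(query_string: str, db: sqlite3.Connection) -> str:
    """Build a CGI-style response for a keyword search."""
    params = parse_qs(query_string)
    keyword = params.get("q", [""])[0]
    # A parameterized LIKE query keeps user input out of the SQL text.
    rows = db.execute(
        "SELECT title FROM documents WHERE title LIKE ? ORDER BY title",
        (f"%{keyword}%",),
    ).fetchall()
    items = "".join(f"<li>{escape(title)}</li>" for (title,) in rows)
    body = f"<html><body><ul>{items}</ul></body></html>"
    # A CGI program writes headers, a blank line, then the document.
    return "Content-Type: text/html\r\n\r\n" + body


# In-memory stand-in database with illustrative rows.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE documents (title TEXT)")
db.executemany(
    "INSERT INTO documents VALUES (?)",
    [("Kakegawa City guide",), ("Osaka history",)],
)
```

Because the browser only sees ordinary HTML, the retrieval engine behind the program can be replaced without any change on the client side.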

(2) Scoring the result and retrieval through feedback

This function handles information strolling through similarity between documents and is based on relevance feedback (see Fig. 6). WAIS provides a simple and quick scoring and feedback retrieval method, but such simplicity often fails to retrieve proper data for requests on Japanese documents. In order to raise accuracy, the length of each text in which key words exist is used when calculating the appropriateness of each document for the input key words. Whenever text is added to the target relational database, the sentences or phrases included in individual columns are analyzed, filtered, and converted to an index word table. The index word table consists of five values: word, record identification, table name, column name, and word weight. The word weight (W) is calculated by the following equation: W = ( F * R * T ) / ( L * N ). (R, T: constants depending on the column including the word and the thesaurus used, respectively. F: word frequency. L: the length of the text including the word. N: the number of records included in the column.) Thus the significance of each term increases as the length (L) of the text including the term becomes shorter, as the term frequency in each text increases, and as the term frequency in the target database decreases. Using the index word table, the score of each record that satisfies the request is calculated, and the records whose scores surpass the threshold are displayed. Likewise, using the table and the results checked by the user, the feedback score is calculated and displayed by the algorithm described in [6].
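The word weight and the thresholded scoring can be sketched as follows. The weight function follows the equation W = ( F * R * T ) / ( L * N ) directly. How the weights are combined into a record score is not spelled out here, so summing the weights of the matching index words per record is an assumption of this sketch, as are the sample values.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class IndexEntry(NamedTuple):
    """One row of the index word table (the five values named in the text)."""
    word: str
    record_id: int
    table: str
    column: str
    weight: float


def word_weight(F: float, R: float, T: float, L: float, N: float) -> float:
    """W = (F * R * T) / (L * N), using the symbols defined in the text."""
    return (F * R * T) / (L * N)


def score_records(index: List[IndexEntry], keywords: List[str],
                  threshold: float) -> Dict[int, float]:
    """Assumed scoring: sum matching word weights per record, then threshold."""
    scores: Dict[int, float] = defaultdict(float)
    for entry in index:
        if entry.word in keywords:
            scores[entry.record_id] += entry.weight
    return {rid: s for rid, s in scores.items() if s > threshold}


# Illustrative index: the same word weighs more in a short title than in
# a long abstract, because L appears in the denominator.
index = [
    IndexEntry("kakegawa", 1, "books", "title", word_weight(2, 1.0, 1.0, 10, 4)),
    IndexEntry("kakegawa", 2, "books", "abstract", word_weight(2, 1.0, 1.0, 40, 4)),
    IndexEntry("city", 1, "books", "title", word_weight(1, 1.0, 1.0, 10, 4)),
]
```

With these values, a request for both words and a threshold of 0.02 keeps record 1 and drops record 2, whose only match sits in a much longer text.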

(3) Visualization as a graph with automatically selected axes

This system can visualize the retrieved data as a graph with GIF icons of various lengths. In order to visualize data, a system has to select the axes that map the retrieved data to positions on the display. In this system, the user can freely and explicitly select those axes. If the user does not explicitly select the axes, the system can automatically select them by applying the rule suitable for the retrieval situation.
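The priority between explicit and automatic axis selection might look like the sketch below. The rules themselves are not given in this paper, so the rule table, the field names (`year`, `subject`, `title`, `score`), and the first-match policy are purely hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rule table: the first rule whose fields are all present wins.
AXIS_RULES: List[Tuple[str, str]] = [
    ("year", "score"),     # chronological view of ranked results
    ("subject", "score"),  # ranked results grouped by subject
    ("title", "score"),    # fallback: one icon per title
]


def select_axes(records: List[Dict[str, object]],
                user_axes: Optional[Tuple[str, str]] = None) -> Tuple[str, str]:
    """Return (x_axis, y_axis); an explicit user choice takes priority."""
    if user_axes is not None:
        return user_axes
    # Only fields that every retrieved record has can serve as an axis.
    available = set.intersection(*(set(r) for r in records)) if records else set()
    for x_axis, y_axis in AXIS_RULES:
        if x_axis in available and y_axis in available:
            return (x_axis, y_axis)
    raise ValueError("no rule matches the retrieved fields")
```

Keeping the rules in a table rather than in code makes it easy to add a rule for a new retrieval situation without touching the selection logic.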

(4) Understanding ambiguous retrieval input using a thesaurus

Key words in a request are so varied that they are not always included in the target database. To solve this problem, each conventional library prepares thesauruses that comprise a full set of the key words used in the library. This interface uses a large common thesaurus, the Conceptual Dictionaries of EDR [8,9]. It expands keywords, confirms them with the user, and retrieves data under these broadened conditions.
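The expand, confirm, and retrieve cycle can be illustrated with a toy thesaurus. The dictionary below is a small stand-in with made-up entries and does not reflect the structure or content of the EDR dictionaries; the `confirm` callback is a hypothetical hook for the confirmation dialogue with the user.

```python
from typing import Callable, Dict, List

# Toy thesaurus standing in for a large concept dictionary (illustrative only).
THESAURUS: Dict[str, List[str]] = {
    "book": ["volume", "publication"],
    "city": ["town", "municipality"],
}


def expand_keywords(keywords: List[str],
                    confirm: Callable[[str, List[str]], bool]) -> List[str]:
    """Add related terms for each keyword when the user accepts them."""
    expanded: List[str] = []
    for word in keywords:
        expanded.append(word)
        related = THESAURUS.get(word, [])
        if related and confirm(word, related):
            expanded.extend(t for t in related if t not in expanded)
    return expanded
```

The expanded list would then be passed to the retrieval step as the broadened condition; a keyword with no thesaurus entry is simply kept as it is.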

(5) Useful retrieval methods

This interface also has three other useful retrieval methods: (a) Japanese request retrieval; (b) full-text search for text data; and (c) native language retrieval by translating both input keywords and retrieved results using a commercial machine translation system, NEC's PIVOT-JE/EJ [10].

4. Discussion

With a digital library in which users can access an immense amount of distributed, primary, and multimedia data through networks, what kind of retrieval methods and user interfaces will allow anyone to retrieve the most appropriate data? We discuss the requirements for retrieval methods and user interfaces from five points of view that reflect the features of digital libraries.

(1) Multimedia and primary data

For multimedia and primary databases, we have to develop display and retrieval methods suitable for individual media, such as sketch retrieval for image data and humming retrieval for music data. For structured texts, displaying and retrieving smaller segments of data is more appropriate. Because displayed multimedia data often reminds the user of associated ideas, methods will be required that can easily specify the association or relationship. To realize this, multimedia data structuring technology such as indexing, tagging, and filtering is necessary. We have realized an associated data retrieval interface with an HSD model. We will research user interfaces that handle more varied relationships.

(2) Immense amount of searchable data

With the progress in network technology, reachable and searchable information resources are rapidly increasing. Small-segment retrieval, which suits structured texts, increases the number of units to be retrieved. We will therefore have to search for target data from among a tremendous amount of data, and the technology of data filtering and resource selection becomes important. We have implemented a new method of scoring and feedback in the WWW based interface and use it as a filtering method. We are also researching selection methods that use an agent architecture to collect the features of individual databases.

(3) Various kinds of users

Using a digital library, various kinds of users will be able to access networks without the physical restrictions of conventional libraries. For novice users, familiar, simple, and intuitive interfaces have to be prepared; these may include menu selection, computer graphics, metaphors, and simulation of conventional libraries. For expert users, efficient, exact, sensitive, and quick-response user interfaces are needed to handle the more varied relationships between displayed data and associated data. We have realized the virtual stacks interface for novice users and the associated data retrieval interface for expert users.

(4) Information visualization

When a user directly selects graphic objects displayed on a screen, e.g., in the virtual stacks interface, how to visualize tremendous amounts of data becomes a very difficult problem. The amount of information selected by subjects, keywords, and so on is usually tremendous. The user may want to select the information through various aspects: time span, organization, subject, reference, and so on. We proposed one effective visualization method that uses spatial information in a conventional library metaphor in the virtual stacks and CG librarian interface. We also proposed a method to display the retrieved results as a graph through automatic selection of axes and parameters.

(5) Filtering

When a user wants to search several subjects, if a friend whom the user trusts has already gathered a collection of information, the user may well refer to that collection first. The collection is valued according to the relationship between the user and the owner of that collection. In this way, a user can get reliable and useful information using a collection that reflects another person's subjectivity or retrieval history. As the accessible information space widens, social filtering methods become more important and useful. The virtual stacks and CG librarian interface can provide such a method through a virtual personal bookshelf.
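One simple way to picture social filtering over personal bookshelves is sketched below. It is an illustration rather than the method of our system: the Jaccard overlap used to decide whose bookshelf to consult is a technique swapped in for the example, and the user names, book identifiers, and similarity threshold are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two bookshelves: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user: str, shelves: Dict[str, Set[str]],
              min_similarity: float) -> List[str]:
    """Suggest books from similar users' shelves that `user` does not have."""
    mine = shelves[user]
    suggestions: Set[str] = set()
    for other, theirs in shelves.items():
        if other != user and jaccard(mine, theirs) >= min_similarity:
            suggestions |= theirs - mine
    return sorted(suggestions)


# Placeholder bookshelves; identifiers are not real titles.
shelves = {
    "user_a": {"book1", "book2", "book3"},
    "user_b": {"book2", "book3", "book4"},
    "user_c": {"book9"},
}
```

Here `user_a` and `user_b` share two of four distinct books, so with a threshold of 0.5 the one book that only `user_b` holds is suggested to `user_a`, while the unrelated shelf of `user_c` is ignored.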

5. Related work

Work on virtual open stacks by CG [11,12,13] has strongly inspired our virtual stacks and CG librarian interface. However, that work concentrates on how to visualize books as anchors on the virtual open shelves. As the number and capacity of accessible digital libraries increase, the space in which a user has to walk around to reach a desired book becomes wider. Thus, a system that guides the user to the shelves closest to the desired one is necessary. The CG librarian system that we have developed can guide users through the virtual stacks and show a position within a tremendous information space without causing disorientation.

During research and development of the CG librarian, we classified body motion using the emblems proposed in [14]. The work in [15] also concentrated on the relationship between dialogue content and body motion. That work focuses on how to illustrate direction and position and represent shape by hand gestures in a dialogue: the roles of motion are emblem and illustration. Our CG librarian focuses on emphasizing verbal language by body motion, the relation between the dialogue situation and the respective motion pattern, and the artistic aspect of motion.

For associated data retrieval, we proposed a method of resolving anaphora with multimodal context and HSD models. [21] proposed a method that builds hierarchical structured data from image data through an iterative cycle of separating images, hierarchically organizing their domains, and naming the domain components of images by human recognition ability. That data construction method is very useful. Although our HSD model is not limited to image objects, we will perhaps use closely related methods.

For the WWW based interface, work on relevance feedback interfaces [22,23,24] has been proposed in which the relevance feedback algorithm is based on a vector model. We extended the method to calculate the weight of each keyword using value length parameters and column type parameters. ``Information Visualizer'' [17,18] proposed various visualization methods for hierarchical structure information, reference information, and complex information. Our WWW based interface proposes a method that automatically selects the axes when visualizing results as a graph. [19] and [20] proposed a method in which persons with similar preferences exchange information; it is based on filling out a questionnaire. In our virtual stacks, a user can freely arrange selected books on a virtual personal bookshelf. Another user with similar preferences can get the desired documents using that user's virtual personal bookshelf. In this way, we can implement one of the social filtering methods.

6. Conclusion

We described three systems to realize information strolling for the user interfaces of a digital library: (1) the virtual stacks and CG librarian interface, which displays a 3D-space environment that simulates a conventional library and creates a sensation of walking around in the virtual stacks with voice guidance by the CG librarian; (2) the associated data retrieval system, featuring multimodal understanding using an HSD model; and (3) the WWW based interface, featuring a high-precision relevance feedback method and graphical visualization of retrieved data with automatic selection of axes. We think these are effective methods for handling various levels of users. From now on, we will do research not only on synthesizing these methods, but also on data processing methods to automatically construct multimedia data with global links and hierarchical structures. Future work will also include circulation systems with encryption technology.

Acknowledgements

We would like to acknowledge the continued support, advice, and direction provided by Masano Managaki and Hitoshi Miyai. We would also like to give special thanks to our group members who assisted in developing our system and gave many hints: Lu Shan, Masaki Hara, Kanako Kubo, Kazuo Ishida, Itaru Hosomi, and Haruko Shiba.

References

[1] E. A. Fox, R. M. Akscyn, R. K. Furuta, and J. J. Leggett.
    Digital Libraries.
    Communications of the ACM, Vol. 38, No. 4, (April 1995),
    pp. 23-28.

[2] R. Rao, J. O. Pedersen, et al.
    Rich Interaction in the Digital Library.
    Communications of the ACM 38, 4 (April 1995),
    pp. 29-39.

[3] S. Gauch, R. Aust, et al.
    The Digital Video Library System: Vision and Design. 
    Proceedings of the 1st Annual Conference on the Theory and Practice of Digital Libraries: Digital Libraries '94,
    (Texas A&M Univ., College Station, TX, June 1994).

[4] T. Kamiya, S. Lu, M. Hara, and H. Miyai.
    Development of Electronic Library Interface with 3D Walk-through and CG Librarian (in Japanese).
    IPSJ SIG Notes Vol. 95, No. 1, pp. 27-34, 1995.

[5] S. Lu, S. Yoshizaka, et al.
    A Human Computer Dialogue Agent with Body Gesture, Hand Motion, and Speech.
    HCI International '95,
    Tokyo, 1995. (in press)

[6] S. E. Robertson, K. Sparck Jones. 
    Relevance Weighting of Search Terms. 
    Journal of the American Society for Information Science,
    May-June (1976).

[7] M. Tani, H. Shiba, and S. Ichiyama.
    Voice Operation for Multi Window Systems (in Japanese). 
    IPSJ SIG Notes,
    Vol. 94, No. 98, pp. 143-150, 1994.

[8] A. Koizumi, M. Arioka, et al. 
    Noun Phrasal Entries in the EDR English Word Dictionary. 
    Proceedings of COLING '94, August 1994.

[9] Japan Electronic Dictionary Research Institute, Ltd.
    Electronic Dictionary Technical Guide.
    TR-042. Ch. 5. 1994.

[10] M. Miura, M. Hirata, and N. Hoshino. 
    Learning Mechanism in Machine Translation System ``PIVOT''.
    Proceedings of COLING '92,
    August 1992.

[11] M. Sato.
     Electronic Libraries ``SongWoKung'' which support retrievals of books using presentation of CG pictures of libraries (in Japanese).
     IPSJ SIG Notes, Fi-24-2, Nov. 1991.

[12]	M. Sato.
	Open-shelf Inspection Service on Telematic Libraries.
	Fourth IWT, May 1988, Caen France.

[13]	M. Sato.
	Visual Environment for Telematic Library Users.
	Fifth IWT, Sept. 1989, Denver, CO.

[14]	P. Ekman and Friesen.
	``Three classes of nonverbal behavior, Aspects of Nonverbal Communication.''
	Swets and Zeitlinger, 1980.

[15]	J. Cassell, C. Pelachaud, et al.
	``ANIMATED CONVERSATION: Rule-based Generation of Facial Expression, Gesture & Spoken Intonation for Multiple Conversational Agents.''
	Proc. of SIGGRAPH '94, pp. 413-420, 1994.

[16]	R. Rao, J. O. Pedersen, M. A. Hearst, J. D. Mackinlay, S. K. Card, L. Masinter, P. K. Halvorsen, and G. G. Robertson.
	``Rich Interaction in the Digital Library'',
	Communications of the ACM, vol. 38, No. 4, pp. 29-39, Apr. 1995.

[17]	J.D. Mackinlay, R. Rao, and S. K. Card,
	``An Organic User Interface for Searching Citation Links''.
	CHI '95 Conference Proceedings, pp. 202-209, May 1995.

[18]	J. Lamping, R. Rao, and P. Pirolli.
	``A Focus+Context Technique Based on Hyperbolic Geometry for Visualizing Large Hierarchies''.
	CHI '95 Conference Proceedings, pp. 67-73, May 1995.

[19]	U. Shardanand, and P. Maes.
	``Social Information Filtering: Algorithms for Automating `Word of Mouth'.''
	CHI '95 Conference Proceedings, pp. 210-217, May 1995.

[20]	W. Hill, L. Stead, M. Rosenstein, and G. Furnas.
	``Recommending and Evaluating Choices in a Virtual Community of Use''.
	CHI '95 Conference Proceedings, pp. 194-201, May 1995.

[21]	H. S. Shakir.
	A Universal Image Database System for Pictorial Information and Meta-Information Management, PhD Thesis, Kyoto University, Nov. 1994.

[22]	E. A. Fox, D. Hix, L. Nowell, D. Brueni, W. Wake, L. Heath, and D. Rao.
	``Users, user interfaces, and objects: Envision, a digital library.''
	J. Amer. Soc. Info. Sci. Vol. 44, No. 8, Sept. 1993, pp. 480-491.

[23]	E. A. Fox, R. K. France, E. Sahle, A. Daoud, and B. E. Cline.
	``Development of a modern OPAC: From REVTOLC to MARIAN.''
	In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM, New York, 1993), pp. 248-259.

[24]	G. Salton, E. A. Fox, and H. Wu.
	``Extended Boolean information retrieval.''
	Communications of the ACM, 26(11):1022-1036, Nov. 1983.

Aug 22 1996 - mik@cc.gatech.edu