HIPED


Heterogeneous Intelligent Processing for Engineering Design

Future progress in intelligent systems (e.g., engineering design and planning) requires improvement in Knowledge Base / Data Base (KB/DB) access. We believe two of the most important issues in KB/DB access are scalability (how much information is available to the intelligent system) and usability (how easily the intelligent system can access ALL the relevant information). To explore usability in interfacing intelligent systems to large-scale information sources, we have experimented with supplying knowledge from an external information source to an interactive knowledge-based design environment, Interactive Kritik, using IDI as the interface to an Oracle database.

There are two approaches to scaling the available knowledge: constructing a large, monolithic system that contains all the information, or integrating access to a distributed set of information sources. We chose the latter because it lends itself to truly large-scale integration of information and more easily supports incremental integration (collecting the knowledge over time rather than all at once). To this end, we have developed an integration paradigm that allows the administrator of an information source to describe its data in the form of an Augmented Export Schema (AES). The AES can then be used to extend the current global schema so that the new information becomes accessible. We have developed a package called JOIN that takes an AES and the current global schema and derives a new global schema representing all the information available to global users.
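The incremental integration step can be sketched as a pure function from (AES, current global schema) to a new global schema. This is a minimal illustration, not JOIN itself: it assumes a hypothetical AES format in which the administrator has already stated, for each exported local attribute, the global concept it corresponds to. JOIN's own inference over schema correspondences is not reproduced here, and the class names, source names, and attributes are made up for illustration.

```python
from dataclasses import dataclass, field


# Hypothetical AES format: each exported local attribute is mapped to the
# name of the global concept it corresponds to.
@dataclass
class AugmentedExportSchema:
    source: str
    correspondences: dict[str, str]  # local attribute -> global concept


@dataclass
class GlobalSchema:
    # global concept -> set of (source, local attribute) pairs that supply it
    concepts: dict[str, set[tuple[str, str]]] = field(default_factory=dict)


def join(aes: AugmentedExportSchema, current: GlobalSchema) -> GlobalSchema:
    """Derive a new global schema that also covers the data described by aes.

    The current schema is copied, not mutated, so each integration step
    yields a new schema (incremental integration).
    """
    merged = {concept: set(srcs) for concept, srcs in current.concepts.items()}
    for local_attr, concept in aes.correspondences.items():
        # Reuse an existing global concept when one matches; otherwise the
        # new source introduces a new concept into the global schema.
        merged.setdefault(concept, set()).add((aes.source, local_attr))
    return GlobalSchema(merged)


# Usage: integrate two (hypothetical) sources one after the other.
g0 = GlobalSchema()
g1 = join(AugmentedExportSchema("oracle_parts", {"PART_NO": "part_id", "MATL": "material"}), g0)
g2 = join(AugmentedExportSchema("vendor_db", {"pid": "part_id", "price": "unit_cost"}), g1)
print(sorted(g2.concepts))  # ['material', 'part_id', 'unit_cost']
```

Returning a fresh schema rather than updating in place keeps earlier global schemas valid, which matches the idea of adding sources over time.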


Current Research

We are trying to address various aspects of query processing in distributed heterogeneous environments with special emphasis on incorporating knowledge at different levels. The knowledge relates to information about the sources of data, their structure, their content, and their overall relevance to the problem at hand.

Our present approach to incorporating knowledge into query formulation and processing can be broken down into three areas:

  1. Query Formulation
  2. Semantic and Multiple Query Optimization
  3. Incorporation of learning into the mediation task
We discuss each area briefly below.

Query Formulation

In the query formulation area, we are investigating the issues involved in integrating multiple sources of semi-structured data, such as text documents. We are studying user interface and visualization techniques that let users discover how the data is organized. This allows users to judge how meaningful the underlying information sources are and to discover the potentially useful ones. Some of the methods being used are:

  1. Use of thesaurus during the query formulation process to prompt the user with additional words related to the query.
  2. Techniques to visualize the query results and compare them with query words.
  3. Feedback from the user at different levels of granularity (e.g., clusters of documents, individual documents, parts of a document, phrases, and words).
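The first method, thesaurus-based prompting, can be sketched as follows. This is a minimal illustration assuming a hypothetical in-memory thesaurus that maps a term to related terms; a real system would draw on a curated thesaurus, and the entries and function name below are made up for the example.

```python
# Hypothetical toy thesaurus: term -> related terms (illustrative only).
THESAURUS = {
    "gear": ["cog", "sprocket"],
    "pump": ["impeller", "compressor"],
    "valve": ["faucet", "stopcock"],
}


def suggest_terms(query: str, thesaurus: dict[str, list[str]]) -> list[str]:
    """Return related words to prompt the user with, in query order.

    Words already in the query, and duplicate suggestions, are skipped.
    """
    words = query.lower().split()
    seen = set(words)
    suggestions = []
    for word in words:
        for related in thesaurus.get(word, []):
            if related not in seen:
                seen.add(related)
                suggestions.append(related)
    return suggestions


print(suggest_terms("centrifugal pump valve", THESAURUS))
# ['impeller', 'compressor', 'faucet', 'stopcock']
```

The suggestions are only offered to the user, who decides which ones to add to the query.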
The preliminary implementation shows that the performance of Information Retrieval systems can be improved by providing the right set of interaction techniques and visualization schemes.

Semantic and Multiple Query Optimization

Query optimization is a decision process that selects the best query evaluation strategy from a set of execution plans. The performance of this process can be improved by providing better information about the contents of the database (i.e., meta-data). Furthermore, the process itself can be improved by incorporating the semantics of the database and by considering global plans which optimize execution over a set of queries.
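How meta-data drives plan selection can be sketched with a deliberately simple cost model. The model is an assumption made for illustration: a plan is an ordering of selection predicates applied as a pipeline, its cost is the number of tuples examined, and predicates are independent. The predicate strings and selectivity factors are hypothetical.

```python
from itertools import permutations

# Hypothetical meta-data: selectivity factor of each predicate, i.e. the
# fraction of tuples that satisfy it.
SELECTIVITY = {"material = 'steel'": 0.5, "part_id < 100": 0.01, "weight > 10": 0.2}


def plan_cost(order, n_rows, selectivity):
    """Tuples examined when predicates are applied in the given order.

    Assumes independent predicates, so each one shrinks the remaining
    tuple count by its selectivity factor.
    """
    cost, rows = 0.0, float(n_rows)
    for pred in order:
        cost += rows  # this predicate scans every tuple still remaining
        rows *= selectivity[pred]
    return cost


def best_plan(predicates, n_rows, selectivity):
    """Pick the cheapest evaluation order among all candidate plans."""
    return min(permutations(predicates), key=lambda p: plan_cost(p, n_rows, selectivity))


best = best_plan(list(SELECTIVITY), 10_000, SELECTIVITY)
print(best, plan_cost(best, 10_000, SELECTIVITY))  # most selective predicate first
```

Better selectivity estimates change which plan wins, which is why more accurate meta-data improves the decision. Exhaustive enumeration is only practical for a handful of predicates.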

A Meta-Data View Graph (MVG) is a network for organizing and managing information about a database. The nodes of the network represent logical views of the database and contain information specific to the corresponding data set. Statistical information (e.g., selectivity factors) is used by the query optimizer to generate more accurate estimates of execution cost. Semantic information (e.g., integrity constraints) is used to transform a query into a set of semantically equivalent queries giving the selection process more plans to choose from. Finally, when given a set of queries, the MVG network can identify common subexpressions, the results of which can be computed once and shared among the set of queries.
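The last step, finding subexpressions shared by a set of queries, can be sketched as follows. This is not the MVG itself: it assumes a hypothetical query representation as nested tuples forming an operator tree (`("scan", relation)`, `("select", predicate, child)`, `("join", condition, left, right)`) and simply reports non-trivial subtrees that occur in at least two queries. The relation and predicate names are made up.

```python
from collections import Counter


def subexpressions(expr):
    """Yield every operator subtree of expr, including expr itself."""
    yield expr
    if expr[0] == "select":
        yield from subexpressions(expr[2])
    elif expr[0] == "join":
        yield from subexpressions(expr[2])
        yield from subexpressions(expr[3])


def common_subexpressions(queries):
    """Return subtrees (other than bare scans) shared by two or more queries."""
    counts = Counter()
    for query in queries:
        counts.update(set(subexpressions(query)))  # count each subtree once per query
    return {expr for expr, n in counts.items() if n >= 2 and expr[0] != "scan"}


# Usage: two queries that both need the steel parts.
steel = ("select", "material = 'steel'", ("scan", "parts"))
q1 = ("join", "parts.vid = vendors.vid", steel, ("scan", "vendors"))
q2 = ("select", "weight > 10", steel)
print(common_subexpressions([q1, q2]) == {steel})  # True
```

A multiple-query optimizer could then evaluate `steel` once and feed the cached result to both queries.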

Incorporation of Learning into Heterogeneous Database Mediation

Large-scale integrated knowledge systems can be, and often are, opaque to their users. If the knowledge organization and information processing in these systems are not transparent, users may be uncomfortable using the system or lack confidence in the results it produces. We consider three issues in designing transparent knowledge systems: how to explain and illustrate the system's reasoning to a user, how to explain and justify its results, and how to enable the user to explore and navigate its knowledge base.

Endowing knowledge systems with explicit models of their own reasoning processes may provide useful answers to these questions. The Interactive Kritik system was developed from earlier systems in the Kritik series of autonomous device design programs. It presents to the user a model of its reasoning based on a hierarchy of tasks and methods, which we refer to as a Task-Method-Knowledge (TMK) model.
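A task-method hierarchy of this kind can be represented as a small tree and walked to produce a user-facing explanation. The sketch below is only an illustration of the data structure: the classes, the rendering format, and the task, method, and knowledge names are hypothetical and are not taken from Interactive Kritik's actual TMK model.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    knowledge: list[str] = field(default_factory=list)  # knowledge the task uses
    methods: list["Method"] = field(default_factory=list)  # ways to accomplish it


@dataclass
class Method:
    name: str
    subtasks: list[Task] = field(default_factory=list)  # decomposition of the task


def explain(task: Task, depth: int = 0) -> list[str]:
    """Render the task-method hierarchy as indented lines for the user."""
    pad = "  " * depth
    line = f"{pad}Task: {task.name}"
    if task.knowledge:
        line += f" (uses: {', '.join(task.knowledge)})"
    lines = [line]
    for method in task.methods:
        lines.append(f"{pad}  Method: {method.name}")
        for sub in method.subtasks:
            lines.extend(explain(sub, depth + 2))
    return lines


# Usage with made-up task and method names.
design = Task("design device", methods=[Method("case-based design", [
    Task("retrieve similar design", ["design case library"]),
    Task("adapt design", ["device models"]),
    Task("verify design"),
])])
print("\n".join(explain(design)))
```

Because the hierarchy is explicit data rather than buried in control flow, the same structure can drive explanations of reasoning and navigation of the knowledge it references.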

We plan to use the above systems together with real large-scale databases so that designers can intelligently access heterogeneous information when solving their design problems.

We need to expand our current set of integration tools to accomplish large-scale integration. Future work includes:

  • Construction of Augmented Export Schema development tools that allow information source administrators to represent their data completely and accurately.
  • Extension of the reasoning capabilities of JOIN to allow more types of inference on schema correspondences and to take user feedback into account.
  • Development of query processing facilities which support users searching a very large information space. Such facilities include explanation of query results, suggestion of future query directions, and visualization of information relevancy.

Webslinger: Wai Gen Yee
Last modified: Fri Sep 25 09:44:11 PDT 1998