CS 8803B Artificial Intelligence

Fall 2002

Project 2

Semantic Networks and Spreading Activation

Due Monday, October 7, 2002, 3:00 PM

Introduction

In this project you will be building a program which reads in a sequence of facts and queries and produces a sequence of answers to the queries based on the facts provided. The facts and queries will be about the composition of simple physical devices (e.g., a flashlight). You are required to store and organize facts in the form of a semantic network, i.e., parts of the device will be represented as nodes in a network and relationships among those parts will be represented as links among those nodes. Some of the queries will require that you retrieve a specific item in memory. If a variety of items are available, you will need to choose the one which is most relevant to those matters which are currently under discussion. To accomplish this, you will need to use spreading activation to identify the relevant nodes.

For this project, you are welcome to use any modern language on a machine of your choice, as long as its possible for the TA to easily run the program for validation. Common Lisp and Scheme are particularly well suited to this project, especially since the input information uses a syntax which is particularly convenient for processing by these languages. It is possible to do this project in C, C++, or Java, but you must write the input routines yourself, which could add as much as 10 to 20 hours of work to the assignment. For that reason, Common Lisp and Scheme are highly recommended. However, you are free to choose your language. Keep in mind that the TA, although an accomplished computer scientist, is not familiar with every language and every platform, and thus may not be able to assist you as much on certain platforms. Do check with the TA if you plan to use an unusual language or operating system. Also note that your program MUST COMPILE AND RUN ON A CoC SYSTEM (since only CoC systems are available to the TA for grading). If you do not have a CoC account and would like one for this class you can request one from CNS by printing out and filling out the account request form and following the instructions (the instructions say how to turn it in).

The class web site contains links to several supplemental files for this project. In addition to this document,there are files called stool.in, stool.out, flashlight.in, and flashlight.out. The two .in files are example inputs to the program. The two .out files demonstrate what the proper outputs for this program should look like. If you run the program on one of the .in files, it should produce output which matches the corresponding .out file. The stool example is described in detail in the following section. The flashlight example is somewhat larger and demonstrates the desired behavior more thoroughly.

Example

Note that this example and the requirements below describe the input assuming you are not using Lisp or Scheme. Using these languages will require changing the input slightly (in particlar, all symbols must be quoted).

Your program should read in a file in which each line contains either a fact or a query. It should output (to the screen) one line for each query in the input file; each line of output should contain an answer to the associated query. The simple example below illustrates the desired behavior of the program (the requirements are explained more generally and precisely in the next section). The following is an example input file which provides facts and queries relating to a simple three-legged stool.

(FACT (IS-A SEAT1 SEAT))
(FACT (IS-A L1 LEG))
(FACT (IS-A L2 LEG))
(FACT (IS-A L3 LEG))
(FACT (CONNECTED L1 SEAT1))
(FACT (CONNECTED L2 SEAT1))
(FACT (CONNECTED L3 SEAT1))
(QUERY (CONNECTED L1 L2))
(QUERY (CONNECTED L1 SEAT1))
(QUERY VALUE (IS-A VALUE SEAT))
(QUERY VALUE (AND (IS-A VALUE LEG)(CONNECTED VALUE L1)(CONNECTED VALUE SEAT1)))
(QUERY VALUE (AND (IS-A VALUE LEG)(CONNECTED VALUE SEAT1)))
Given this file as input, the program should produce the following output:
No
Yes
SEAT1
None
L1
The first seven lines of the input file contain facts about the stool. The stool consists of a seat named SEAT1 and three legs named L1, L2, and L3. The remaining five lines of the input file contain queries about the stool. The five lines of output are answers to these five queries in order. The first query asks if L1 is connected to L2. Since there is nothing in the facts given that states that they are connected, the line No is printed to the screen. The next query asks if L1 is connected to SEAT1; it is, so Yes is printed. The third query asks for a value; in particular it asks for a value which is a seat. Since SEAT1 is the only seat, SEAT1 is printed. The next query is a little more complicated. It asks for a value that is a leg and is connected to both L1 and SEAT1. Since no such value exists, None is printed. The last query is the most complicated one of all to answer. It asks for a value which is a leg and is connected to SEAT1. Since three such values exist, the program needs to choose one. In general, you want to return the value which is most closely related to the topics which have been recently addressed in the input. Specifically here, two of the last three queries specifically asked about L1 and none of the other legs have been asked about or provided as answers for several lines. Thus L1 is the most relevant node which correctly satisfies the query and should be chosen here.

To determine the most relevant node, you should use spreading activation. Each node that you have in memory should have an activation level, i.e., a number which indicates how relevant it is to the current context. Each time a fact or a query refers to an node in memory, you should increase the activation level of the node. Also, when the activation level of one node is increased, the activation levels of other nodes that relate to it should also increase (but not by as much). Furthermore, the activation levels of nodes in memory should decay over time; if a node is referred to many times in a row, it should be highly activated for queries that occur immediately after those references but should be less highly activated if it is accessed later on. In the example, L1 is obviously the most activated node in memory which satisfies the query since it is the only one which has been mentioned in recent queries. The other two legs should also be somewhat activated. In particular, since SEAT1 was mentioned in the previous query, it should have received some activation at that time and some of this activation should have spread to those nodes which are connected to it (including L2 and L3). Furthermore, these legs may also retain some activation from the previous times in which they were explicitly referred to in earlier facts and queries. However, the combination of these smaller activation levels should definitely not be enough to outweigh the relatively high activation that L1 has from its recent explicit references.

Note that this description of how spreading activation works is deliberately vague. Many questions are left open. How quickly do activation levels decay? How much activation should spread from a node to a related node? If activation spreads to an node, should that additional activation spread to other nodes adjacent to it? If so, how much, and when should it stop? You need to devise your own answers to questions of this sort.

Requirements

Your program should be able to process the following sorts of facts:
(FACT (IS-A node type))
(FACT (CONNECTED node node))
You may assume that any node which is referred to in a CONNECTED fact has first been defined in a IS-A fact. You may also assume that there are no nodes which have more than one type (however, there may be nodes which are connected to more than one other node).

Your program should be able to process the following sorts of queries:

(QUERY (IS-A node type))
(QUERY (CONNECTED node node))
(QUERY VALUE (IS-A VALUE type))
(QUERY VALUE (CONNECTED node VALUE))
(QUERY VALUE (CONNECTED VALUE node))
(QUERY VALUE (AND ...))
The remainder of a query with AND should be a sequence of two or more terms which have the same form as the individual value queries. Note that you should treat the CONNECTED relation as symmetric; i.e., if A is connected to B then B is connected to A. The flashlight example provided on the class web site demonstrates these desired features of your project. You should read that example over carefully and make sure you understand why the program gives the answers that it does for each query before you begin programming. If it simplifies matters for you, you may assume that all inputs appear in all capital letters and that all facts appear before any queries do. You may assume that all symbols (and node names in particular) are assembled entirely from characters, numbers, and the dash character ("-"). You may also assume that there are no extraneous spaces or blank lines. You will not be graded on the basis of obscure "trick" problems such as ways to abuse the syntax and/or provide meaningless input information. As long as your program is able to handle inputs of the form that the test files provided, it should be fine. As discussed above, the answers to "value" queries should be determined by spreading activation. That is, the answer should be the node (if any) with the highest activation, according to these rules: Note that there is an issue of what to set the activation level at if, at some time step, two nodes can spread their activation to the same node. You must decide what to do, here. A common decision is, when spreading the activation from one node to the next, to always set the activation to the larger of the current activation level and the computed activation level due to spreading activation. There are other possible approaches.

Deliverables

Your submission should include the following items: You should turn your project write-up in at the beginning of class on Monday, September 30, 2002. The source code and additional test files should be turned in to the TA by beginning of class (3:00PM) that same day by e-mail. The e-mail address to use is yaner@cc.gatech.edu. Do not e-mail the code to one of the professors of the class. It is recommended that you combine the files for this project into one file using tar or zip and submit this file as an e-mail attachment using pine or a similar modern e-mail program. If you have any trouble doing this, just put each file that you are submitting by e-mail directly into the main text of your e-mail with clear separations to indicate where one file begins and the next one ends.

Grading

This assignment will be graded out of 100 points. These points will be divided up as follows: