CS 6660 Intelligent Agents

Fall 2001

 

Project 2

Semantic Networks and Spreading Activation

Due 10/29/01 3:00pm


Introduction

In this project you will be building a program which reads in a sequence of facts and queries and produces a sequence of answers to the queries based on the facts provided.  The facts and queries will be about the composition of simple physical devices (e.g., a flashlight).  You are required to store and organize facts in the form of a semantic network, i.e., parts of the
device will be represented as nodes in a network and relationships among those parts will be represented as links among those nodes.  Some of the queries will require that you retrieve a specific item in memory.  If a variety of items are available, you will need to choose the one which is most relevant to those matters which are currently under discussion.  To accomplish this, you will need to use spreading activation to identify the relevant nodes.

For this project, you are welcome to use any modern language on a machine of your choice, as long as its possible for the TA to easily run the program for validation.  Common Lisp and Scheme are particularly well suited to this project, especially since the input information uses a syntax which is particularly convenient for processing by these languages.  However, C, C++, Java, Tcl, are also fine languages to use for this project, and reading in the input information should be fairly simple in these languages, too.  The only restriction is that you must turnin a program that runs on one of the platforms available at the College of Computing and/or on Acme such as Solaris, Linux or Windows.  Because the TA is particularly Windows inclined, you may submit Windows 95/98/ME programs under development environments not necessarily at the CoC as long as you clear it with him ahead of time and provide instructions for running.  Keep in mind that the TA, although an accomplished computer scientist, is not familiar with every language and every platform, and thus may not be able to assist you as much on certain platforms.  Do check with the TA if you plan to use an unusual language or operating system.

The class website contains links to several supplemental files for this project.  In addition to this document,there are files called stool.in, stool.out, flashlight.in, and flashlight.out .  The two .in files are example inputs to the program.  The two .out files demonstrate what the proper outputs for this program should look like.  If you run the program on one of the .in files, it should produce output which matches the corresponding .out file.  The stool example is described in detail in the following section.  The flashlight example is somewhat larger and demonstrates the desired behavior more thoroughly.
 

Example

Your program should read in a file in which each line contains either a fact or a query.  It should output (to the screen) one line for each query in the input file; each line of output should contain an answer to the associated query.  The simple example below illustrates the desired behavior of the program (the requirements are explained more generally and precisely in the next section).  The following is an example input file which provides facts and queries relating to a simple three-legged stool.
(FACT (IS-A SEAT1 SEAT))
(FACT (IS-A L1 LEG))
(FACT (IS-A L2 LEG))
(FACT (IS-A L3 LEG))
(FACT (CONNECTED L1 SEAT1))
(FACT (CONNECTED L2 SEAT1))
(FACT (CONNECTED L3 SEAT1))
(QUERY (CONNECTED L1 L2))
(QUERY (CONNECTED L1 SEAT1))
(QUERY VALUE (IS-A VALUE SEAT))
(QUERY VALUE (AND (IS-A VALUE LEG)(CONNECTED VALUE L1)(CONNECTED VALUE SEAT1)))
(QUERY VALUE (AND (IS-A VALUE LEG)(CONNECTED VALUE SEAT1)))
Given this file as input, the program should produce the following output:

No
Yes
SEAT1
None
L1

The first seven lines of the input file contain facts about the stool.  The stool consists of a seat named SEAT1 and three legs named L1, L2, and L3 .The remaining five lines of the input file contain queries about the stool.  The five l nes of output are answers to these five queries in order.  The first query asks if L1 is connected to L2.  Since there is nothing in the facts given that states that they are connected, the line No is printed to the screen.  The next query asks if L1 is connected to SEAT1; it is, so Yes is printed.  The third query asks for a value; in particular it asks for a value which is a seat. Since SEAT1 is the only seat, SEAT1 is printed.  The next query is a little more complicated.  It asks for a value that is a leg and is connected to both L1 and SEAT1.  Since no such value exists, None is printed.  The last query is the most complicated one of all to answer.  It asks for a value which is a leg and is connected to SEAT1.  Since three such values exist, the program needs to choose one.  In general, you want to return the value which is most closely related to the topics which have been recently addressed in the input.  Specifically here, two of the last three queries specifically asked about L1 and none of the other legs have been asked about or provided as answers for several lines.  Thus L1 is the most relevant node which correctly satisfies the query and should be chosen here.

To determine the most relevant node, you should use spreading activation.  Each node that you have in memory should have an activation level, i.e., a number which indicates how relevant it is to the current context.  Each time a fact or a query refers to an node in memory, you should increase the activation level of the node.  Also, when the activation level of one node is increased, the activation levels of other nodes that relate to it should also increase (but not by as much).  Furthermore, the activation levels of nodes in memory should decay over time; if a node is referred to many times in a row, it should be highly activated for queries that occur immediately after those references but should be less highly activated if it is accessed later on.  In the example, L1 is obviously the most activated node in memory which satisfies the query since it is the only one which has been mentioned in recent queroes.  The other two legs should also be somewhat activated. In particular, since SEAT1 was mentioned in the previous query, it should have received some activation at that time and some of this activation should have spread to those nodes which are connected to it (including L2 and L3).  Furthermore, these legs may also retain some activation from the previous times in which they were explicitly referred to in earlier facts and queries.  However, the combination of these smaller activation levels should definitely not be enough to outweigh the relatively high activation that L1 has from its recent explicit references.

Note that this description of how spreading activation works is deliberately vague.  Many questions are left open.  How quickly do activation levels decay?  How much activation should spread from a node to a related node?  If activation spreads to an node, should that additional activation spread to other nodes adjacent to it?  If so, how much, and when should it stop?  You need to devise your own answers to questions of this sort.
 

Requirements

Your program should be able to process the following sorts of facts:

(FACT (IS-A node type))
(FACT (CONNECTED node node))

You may assume that any node which is referred to in a CONNECTED fact has first been defined in a IS-A fact. You may also assume that there are no nodes which have more than one type (however, there may be nodes which are connected to more than one other node).

Your program should be able to process the following sorts of queries:

(QUERY (IS-A node type))
(QUERY (CONNECTED node node))
(QUERY VALUE (IS-A node VALUE))
(QUERY VALUE (CONNECTED node VALUE))
(QUERY VALUE (CONNECTED VALUE node))
(QUERY VALUE (AND ...))

The remainder of a query with AND should be a sequence of two or more terms which have the same form as the individual value queries.  Note that you should treat the CONNECTED relation as symmetric; i.e., if A is connected to B then B is connected to A.  The flashlight example provided on the class website demonstrates these desired features of your project.
You should read that example over carefully and make sure you understand why the program gives the answers that it does for each query before you begin programming.  If it simplies matters for you, you may assume that all inputs appear in all capital letters and that all facts appear before any queries do.  You may also assume that there are no extraneous spaces or blank lines.  You will not be graded on the basis of obscure “trick ”problems such as ways to abuse the syntax and/or provide meaningless input information.  As long as your program is able to handle inputs of the form that the test files provided, it should be fine.
 

Deliverables

Your submission should include the following tems: You should turn your project write-up in at the begining of class on 10/29/01.  The source code and additional test files should be turned in to the TA by begining of class (3:00pm) on 10/29/01 by email.  The email address to use is “megak@cc.gatech.edu ”.  Do not email the code to the professor of the class.  It is recommended that you combine the
files for this project into one file using tar or zip and submit this file as an email attachment using pine or a similar modern email program.  If you have any trouble doing this, just put each file that you are submitting by email directly into the main text of your email with clear separations to indicate where one file begins and the next one ends.
 

Grading

This assignment will be graded out of 100 points.  These points will be divided up as follows: