CS 8803B Artificial Intelligence
Fall 2002
Project 2
Semantic Networks and Spreading Activation
Due Monday, October 7, 2002, 3:00 PM
Introduction
In this project you will be building a program which reads in a sequence
of facts and queries and produces a sequence of answers to the queries
based on the facts provided. The facts and queries will be about
the composition of simple physical devices (e.g., a flashlight).
You are required to store and organize facts in the form of a semantic
network, i.e., parts of the device will be represented as nodes in a
network and relationships among those parts will be represented as links
among those nodes. Some of the queries will require that you retrieve a
specific item in memory. If a variety of items are available, you will
need to choose the one which is most relevant to those matters which
are currently under discussion. To accomplish this, you will need to
use spreading activation to identify the relevant nodes.
For this project, you are welcome to use any modern language on a machine
of your choice, as long as its possible for the TA to easily run the program
for validation. Common Lisp and Scheme are particularly well suited
to this project, especially since the input information uses a syntax which
is particularly convenient for processing by these languages. It is possible
to do this project in C, C++, or Java, but you must write the input
routines yourself, which could add as much as 10 to 20 hours of work
to the assignment. For that reason, Common Lisp and Scheme are highly
recommended. However, you are free to choose your language. Keep in mind
that the TA, although an accomplished computer scientist, is not familiar
with every language and every platform, and thus may not be able to
assist you as much on certain platforms. Do check with the TA if you
plan to use an unusual language or operating system. Also note that
your program MUST COMPILE AND RUN ON A CoC SYSTEM (since only
CoC systems are available to the TA for grading). If you do not have
a CoC account and would like one for this class you can request one
from CNS by printing out and filling out the
account
request form and following the instructions (the instructions
say how to turn it in).
The class web site contains links to several supplemental files for this
project. In addition to this document,there are files called
stool.in, stool.out, flashlight.in, and
flashlight.out. The two .in files are example inputs
to the program. The two .out files demonstrate what the proper
outputs for this program should look like. If you run the program on one
of the .in files, it should produce output which matches the
corresponding .out file. The stool example is described
in detail in the following section. The flashlight example is somewhat
larger and demonstrates the desired behavior more thoroughly.
Example
Note that this example and the requirements below describe the input
assuming you are not using Lisp or Scheme. Using these languages
will require changing the input slightly (in particlar, all symbols must
be quoted).
Your program should read in a file in which each line contains either a
fact or a query. It should output (to the screen) one line for each
query in the input file; each line of output should contain an answer to
the associated query. The simple example below illustrates the desired
behavior of the program (the requirements are explained more generally
and precisely in the next section). The following is an example input
file which provides facts and queries relating to a simple three-legged
stool.
- (FACT (IS-A SEAT1 SEAT))
- (FACT (IS-A L1 LEG))
- (FACT (IS-A L2 LEG))
- (FACT (IS-A L3 LEG))
- (FACT (CONNECTED L1 SEAT1))
- (FACT (CONNECTED L2 SEAT1))
- (FACT (CONNECTED L3 SEAT1))
- (QUERY (CONNECTED L1 L2))
- (QUERY (CONNECTED L1 SEAT1))
- (QUERY VALUE (IS-A VALUE SEAT))
- (QUERY VALUE (AND (IS-A VALUE LEG)(CONNECTED VALUE L1)(CONNECTED
VALUE SEAT1)))
- (QUERY VALUE (AND (IS-A VALUE LEG)(CONNECTED VALUE SEAT1)))
Given this file as input, the program should produce the following output:
- No
- Yes
- SEAT1
- None
- L1
The first seven lines of the input file contain facts about the stool.
The stool consists of a seat named SEAT1 and three legs named
L1, L2, and L3. The remaining five lines of
the input file contain queries about the stool. The five lines of output
are answers to these five queries in order. The first query asks if
L1 is connected to L2. Since there is nothing in the
facts given that states that they are connected, the line No
is printed to the screen. The next query asks if L1 is connected
to SEAT1; it is, so Yes is printed. The third query asks
for a value; in particular it asks for a value which is a seat. Since
SEAT1 is the only seat, SEAT1 is printed. The next
query is a little more complicated. It asks for a value that is a leg
and is connected to both L1 and SEAT1. Since no such
value exists, None is printed. The last query is the most
complicated one of all to answer. It asks for a value which is a leg and
is connected to SEAT1. Since three such values exist, the program
needs to choose one. In general, you want to return the value which is
most closely related to the topics which have been recently addressed
in the input. Specifically here, two of the last three queries
specifically asked about L1 and none of the other legs have been
asked about or provided as answers for several lines. Thus L1 is
the most relevant node which correctly satisfies the query and
should be chosen here.
To determine the most relevant node, you should use spreading activation.
Each node that you have in memory should have an activation level, i.e.,
a number which indicates how relevant it is to the current context.
Each time a fact or a query refers to an node in memory, you should increase
the activation level of the node. Also, when the activation level
of one node is increased, the activation levels of other nodes that relate
to it should also increase (but not by as much). Furthermore, the
activation levels of nodes in memory should decay over time; if a node
is referred to many times in a row, it should be highly activated for queries
that occur immediately after those references but should be less highly
activated if it is accessed later on. In the example, L1 is obviously
the most activated node in memory which satisfies the query since it is
the only one which has been mentioned in recent queries. The other
two legs should also be somewhat activated. In particular, since
SEAT1 was mentioned in the previous query, it should have
received some activation at that time and some of this activation
should have spread to those nodes which are connected to it
(including L2 and L3). Furthermore, these legs may
also retain some activation from the previous times in which they
were explicitly referred to in earlier facts and queries. However,
the combination of these smaller activation levels should definitely not
be enough to outweigh the relatively high activation that L1 has
from its recent explicit references.
Note that this description of how spreading activation works is deliberately
vague. Many questions are left open. How quickly do activation
levels decay? How much activation should spread from a node to a
related node? If activation spreads to an node, should that additional
activation spread to other nodes adjacent to it? If so, how much,
and when should it stop? You need to devise your own answers to questions
of this sort.
Requirements
Your program should be able to process the following sorts of facts:
- (FACT (IS-A node type))
- (FACT (CONNECTED node node))
You may assume that any node which is referred to in a CONNECTED
fact has first been defined in a IS-A fact. You may also assume
that there are no nodes which have more than one type (however, there
may be nodes which are connected to more than one other node).
Your program should be able to process the following sorts of queries:
- (QUERY (IS-A node type))
- (QUERY (CONNECTED node node))
- (QUERY VALUE (IS-A VALUE type))
- (QUERY VALUE (CONNECTED node VALUE))
- (QUERY VALUE (CONNECTED VALUE node))
- (QUERY VALUE (AND ...))
The remainder of a query with AND should be a sequence of two or more
terms which have the same form as the individual value queries. Note
that you should treat the CONNECTED relation as symmetric; i.e.,
if A is connected to B then B is connected to
A. The flashlight example provided on the class web site
demonstrates these desired features of your project. You should read
that example over carefully and make sure you understand why the program
gives the answers that it does for each query before you begin programming.
If it simplifies matters for you, you may assume that all inputs appear in
all capital letters and that all facts appear before any queries do. You
may assume that all symbols (and node names in particular) are assembled
entirely from characters, numbers, and the dash character ("-"). You may
also assume that there are no extraneous spaces or blank lines.
You will not be graded on the basis of obscure "trick" problems such as
ways to abuse the syntax and/or provide meaningless input information.
As long as your program is able to handle inputs of the form that the
test files provided, it should be fine.
As discussed above, the answers to "value" queries should be determined
by spreading activation. That is, the answer should be the node (if any)
with the highest activation, according to these rules:
- Activation level is a single number
- Any fact or query that refers to a node causes its activation
level to be raised
- At each time step (meaning each line of input), activation
spreads from node to node
- The activation of a node due to spreading activation should be
less than that due to initial reference
- The spreading of activation should cut off at some point (e.g.
the activation to which a node would be set is less than some
threshold, or else the spreading has reached a maximum radius
from the original source of activation)
- In each time step, the activation of all nodes decays by
some amount
Note that there is an issue of what to set the activation level at if,
at some time step, two nodes can spread their activation to the same
node. You must decide what to do, here. A common decision is, when
spreading the activation from one node to the next, to always set the
activation to the larger of the current activation level and the computed
activation level due to spreading activation. There are other possible
approaches.
Deliverables
Your submission should include the following items:
- A brief (2-3 page) project write-up. This should focus on describing
the key technical and theoretical issues that you encountered while
working on the project. Of particular interest is the issue of
spreading activation. You must describe the rules your program
uses for spreading activation, and in particular you must address
these issues:
- How does activation spread? Also, when a node's activation may
be raised by more than one neighbor, how do you determine
the activation level?
- What determines when activation stops spreading?
- How does activation decay? How is the decay computed?
- What are the advantages and disadvantages of your scheme?
Your write-up should also describe how to compile and run your
program in CoC environments.
- The source code for your program.
- At least two additional test files which demonstrate important
characteristics of your program. These test files should involve
objects/devices other than the ones presented in the examples
provided, and should illustrate how your spreading activation
mechanisms provide relevant answers to queries with more than one
possible result.
You should turn your project write-up in at the beginning of class on
Monday, September 30, 2002. The source code and additional test files
should be turned in to the TA by beginning of class (3:00PM) that same day
by e-mail. The e-mail address to use is
yaner@cc.gatech.edu. Do not e-mail
the code to one of the professors of the class. It is recommended that
you combine the files for this project into one file using tar or zip
and submit this file as an e-mail attachment using pine or a similar
modern e-mail program. If you have any trouble doing this, just put
each file that you are submitting by e-mail directly into the main
text of your e-mail with clear separations to indicate where one file
begins and the next one ends.
Grading
This assignment will be graded out of 100 points. These points will
be divided up as follows:
- Project write-up: 20 points
- Extra test files:10 points
- Program style (comments, organization, clarity, etc.): 10 points
- Program quality (efficiency, generality, scalability, etc.):
10 points
- Program correctness: 50 points