Interface Agent Markup Language 

Specification for Interface Agent Framework

by Jun Xiao

Content

1. Design Criteria

The overall goal of Interface Agent Markup Language is to free users and designers from the constraints of the various low-level animated agent toolkits and to hide the platform-specific details of those toolkits from them. The key design criteria are as follows:

2. Overview of the System 

A rendering system that supports Interface Agent Markup Language uses the linguistic and emotional information contained in the markup text (an XML data structure) to render the visual and audio output of an interface agent. An overview of the process follows:

3. Language Definition

The following sections detail the individual elements defined in this XML language. (In the examples, lines beginning with "<!--" are comments.)

3.1 Speech and Audio Elements


speak

Description: the contained text is going to be spoken by the agent. 
Attributes: none.
Properties: can contain all other elements, except itself.
Example: 

<speak>
    Hello, world!
</speak>


say_as

Description: element that gives more control than the speak element over how the contained text is spoken.
Properties: can contain all other elements, except itself.
Attributes: 

Example:

<say_as pitch = "+90%" range = "low" rate = "medium" volume = "+100%">
    It is not my fault!
</say_as>
<!-- set the pitch 90% higher than the default value, with low variability, and double the volume; in effect, scream out "it is not my fault!" -->
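The relative values in the example above ("+90%", "+100%") can be read as signed percentages applied to a default. The following is a minimal sketch of that reading, not part of the specification: the function name apply_relative and the default values passed to it are hypothetical.

```python
def apply_relative(base, spec):
    """Apply a signed percentage such as "+90%" or "-50%" to a base value.

    Assumption: relative attribute values are percentages of the default,
    so "+100%" doubles the default and "-50%" halves it.
    """
    spec = spec.strip()
    if not spec.endswith("%"):
        raise ValueError("expected a percentage such as '+90%', got %r" % spec)
    percent = float(spec[:-1])  # float() accepts a leading sign
    return base * (1.0 + percent / 100.0)


# Hypothetical default volume of 1.0; "+100%" doubles it.
print(apply_relative(1.0, "+100%"))
```

Descriptive values such as "low" or "medium" would need a separate lookup table, which this sketch leaves out.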


b

Description: element used to request that the contained text be spoken with emphasis.
Attributes: none.
Properties: cannot contain any other speech elements or itself.
Example: 

<speak>
    This is <b>fantastic</b>!
</speak>


i

Description: element used to request that the contained text be spoken softly.
Attributes: none.
Properties: cannot contain any other elements or itself.
Example: 

<speak>
    <i>It is my fault.</i>
</speak>


audio

Description: element used to insert recorded audio files.
Attributes: The required attribute is source, which is the URI of a file. 
Properties: cannot contain any other speech elements or itself.
Example:

<speak>
    Today is John's birthday. 
</speak>
<audio source = "happy_birthday.wav"/>
<!-- play back the song "happy birthday" -->


pause

Description: element used to insert a pause in the utterance.
Attributes: duration specifies the length of the pause, either as a value in seconds or as one of the descriptive values "long", "short", or "medium". If omitted, the default setting is used.
Properties: cannot contain any other elements or itself.
Example:

<speak>
    I see. <pause duration="short" /> That is the problem.
</speak>
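A renderer has to turn the duration attribute into a concrete length. The following is a minimal sketch of that step under stated assumptions: the specification gives no numeric meaning for "short", "medium", and "long" and no default, so the values in NAMED_PAUSES and DEFAULT_PAUSE are hypothetical, as is the function name pause_seconds.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping; the specification does not define these lengths.
NAMED_PAUSES = {"short": 0.25, "medium": 0.5, "long": 1.0}
DEFAULT_PAUSE = NAMED_PAUSES["medium"]  # assumed default setting


def pause_seconds(element):
    """Return the pause length in seconds for a <pause> element."""
    raw = element.get("duration")
    if raw is None:
        return DEFAULT_PAUSE  # attribute omitted: fall back to the default
    raw = raw.strip().lower()
    if raw in NAMED_PAUSES:
        return NAMED_PAUSES[raw]
    return float(raw)  # a value in seconds; raises ValueError otherwise


speak = ET.fromstring(
    '<speak>I see. <pause duration="short" /> That is the problem.</speak>'
)
print(pause_seconds(speak.find("pause")))
```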


listen

Description: element used to change the turn of interaction. The script pauses until it receives a continue signal from the main program.
Attributes: none.
Properties: cannot contain any other elements or itself.
Example:

<listen/>


3.2 Emotion, Expression and Gesture Elements


emotion

Description: element used to define the current emotional state, which affects both the voice and the face. The combined effect of emotion and mood is illustrated in Figure 1.
Properties: can wrap speech elements.
Attributes: 

Example: 

<emotion type = "happy" duration = "10" strength="+100%" in = "fast" out = "slow">
    <speak>
        This is <b>fantastic</b>!
    </speak>
</emotion>
<!-- A fast-in-slow-out emotion of very "happy", which lasts 10 seconds. -->


mood

Description: While an emotion is short and intense, a mood lasts longer and has a lower intensity. It works as a background emotional state when no strong emotion is occurring. Mood combines with emotion as follows. When no emotion is occurring, the parameters defined for the mood are used, with a slight added perturbation. When an emotion occurs, the state switches to the emotion according to the emotion's parameters and then gradually returns to the mood parameters (see Figure 1).


Figure 1

Properties: usually used outside of sequences of actions.
Attributes: 

Example:

<mood type = "sad" strength = "low">
    <emotion type = "happy" duration = "5" strength="high">
        <speak>
            This is <b>great</b>.
        </speak>
    </emotion>
    ...
</mood>
<!-- see figure 1 as the result -->
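The way an emotion rises out of the background mood and then returns to it can be sketched as a time-varying blend weight. This is an illustration only, under several assumptions: the piecewise-linear shape, the numeric fade_in and fade_out times in seconds, and the function names are all hypothetical, and the slight perturbation applied to the mood is left out.

```python
def emotion_weight(t, duration, fade_in, fade_out):
    """Weight of the emotion at time t (seconds since its onset), in [0, 1].

    Assumed shape: linear ramp up over fade_in, hold at 1, then linear
    ramp down over fade_out so the weight reaches 0 at t == duration.
    """
    if t < 0 or t >= duration:
        return 0.0  # outside the emotion: only the mood applies
    if t < fade_in:
        return t / fade_in
    if t > duration - fade_out:
        return (duration - t) / fade_out
    return 1.0


def blended_strength(t, mood, emotion, duration, fade_in, fade_out):
    """Blend a mood strength and an emotion strength at time t."""
    w = emotion_weight(t, duration, fade_in, fade_out)
    return (1.0 - w) * mood + w * emotion


# A 10-second emotion with a fast 1-second onset and a slow 4-second decay.
for t in (0.0, 0.5, 5.0, 8.0, 10.0):
    print(t, emotion_weight(t, 10.0, 1.0, 4.0))
```

With a weight of 0 the output is the mood strength alone; with a weight of 1 it is the emotion strength alone.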


expression

Description: element used to generate a particular expression. For example, use shrug to express aloofness, indifference, or uncertainty.
Attributes: 

Properties: can contain only speech elements, or can be empty.
Example:

<speak>
    You are absolutely right!
    <expression type = "agree" duration = "3" strength = "high"/>
</speak>
<!-- agree with nod -->

...

<expression type = "shrug" strength = "+50%" >
    <speak>
        I don't know.
    </speak>
</expression>
<!-- really means I don't know -->


movement

Description: element used to directly control eye, eyebrow, mouth, and face movement.
Attributes: 

Properties: can contain other elements and itself, in order to synchronize different movements.
Example:

<movement object = "face" speed = "medium" x = "-30">
    <movement object = "eyes" speed = "medium" x = "-40" stay = "5" />
    <speak> Maybe it is my left side. </speak>
</movement>
<!-- turn and look left -->


3.3 Other Elements


iaml

Description: Root element that encapsulates all other XML elements.
Attributes: kb (optional), the URI of the knowledge base file.
Properties: Root node, can only occur once.
Example:

<iaml>
    <!-- the body here-->
</iaml>


setactiveagent

Description: element that switches the output of the XML stream to a different agent.
Attributes: name (required), the name of the agent.
Properties: Can occur anywhere in the document if there are multiple agents available. Cannot contain any other elements or itself.
Example:

<setactiveagent name = "Genie" />
<speak>I'm Genie</speak>
<setactiveagent name = "Robby" />
<speak>I'm Robby, the robot</speak>
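The switching behavior shown above can be sketched as a single pass over the document that remembers the most recently named agent. This is a minimal sketch, assuming the fragment is wrapped in the iaml root element; the function name route_speech is hypothetical and only speak elements are handled.

```python
import xml.etree.ElementTree as ET

SCRIPT = """
<iaml>
    <setactiveagent name="Genie" />
    <speak>I'm Genie</speak>
    <setactiveagent name="Robby" />
    <speak>I'm Robby, the robot</speak>
</iaml>
"""


def route_speech(root):
    """Return (agent name, spoken text) pairs in document order."""
    active = None  # no agent selected until the first setactiveagent
    routed = []
    for element in root:
        if element.tag == "setactiveagent":
            active = element.get("name")
        elif element.tag == "speak":
            routed.append((active, "".join(element.itertext()).strip()))
    return routed


for agent, text in route_speech(ET.fromstring(SCRIPT)):
    print(agent, "says:", text)
```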


setagent

Description: element used to change properties of the agent, such as visibility.
Attributes: 

Properties: Can occur anywhere in the document. Cannot contain any other elements or itself.
Example:

<setagent name = "Genie" property = "visibility" value = "false" />
 <!-- hide the Genie character on the screen -->


act

Description: element used to refer to a macro in the knowledge base.
Attributes: None
Properties: Can occur anywhere in the document as text.
Example:

<act>
    GREETINGS
</act>
<!--greeting the user, calling his name according to the knowledge base-->


mark

Description: element used to place a marker into the output stream for notification of an event from the main program.
Attributes: 

Properties: Can occur anywhere in the document. Cannot contain any other elements or itself.
Example:

<speak>
    Take this <mark name = "this" /> and drop it <mark name= "here" /> here.
</speak>
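One way to picture markers in the output stream is as events interleaved with the spoken text. The following minimal sketch flattens a speak element into such a sequence; the function name speech_events and the tuple format are hypothetical, and how events are delivered to the main program is not covered.

```python
import xml.etree.ElementTree as ET


def speech_events(speak):
    """Yield ("text", str) and ("mark", name) tuples in document order."""
    if speak.text and speak.text.strip():
        yield ("text", speak.text.strip())
    for child in speak:
        if child.tag == "mark":
            yield ("mark", child.get("name"))
        # Text that follows a child element is stored in its tail.
        if child.tail and child.tail.strip():
            yield ("text", child.tail.strip())


speak = ET.fromstring(
    '<speak>Take this <mark name="this" /> and drop it '
    '<mark name="here" /> here.</speak>'
)
for event in speech_events(speak):
    print(event)
```

A renderer could speak each text event in turn and raise a notification whenever it reaches a mark event.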


getvar

Description: element used to get values of variables in the main program.
Attributes: name (required), the name of the variable.
Properties: Can occur anywhere in the document as text.
Example:

<speak>
    Hi, <getvar name = "UserName" />!
</speak>
<!--greeting the user, calling his name-->
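Variable substitution can be sketched as replacing each getvar element with a value looked up by name. This is a minimal sketch in which a plain dictionary stands in for the main program's variables; the function name resolve_text and the sample value are hypothetical.

```python
import xml.etree.ElementTree as ET


def resolve_text(speak, variables):
    """Return the text of a <speak> element with <getvar> values filled in."""
    parts = [speak.text or ""]
    for child in speak:
        if child.tag == "getvar":
            # KeyError here means the main program has no such variable.
            parts.append(str(variables[child.get("name")]))
        parts.append(child.tail or "")
    # Collapse the whitespace introduced by indentation in the markup.
    return " ".join("".join(parts).split())


speak = ET.fromstring('<speak>\n    Hi, <getvar name="UserName" />!\n</speak>')
print(resolve_text(speak, {"UserName": "Ann"}))
```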


setdef

Description: element used to set default values for elements' parameters.
Attributes: 

Properties: Can occur anywhere in the document. Cannot contain any other elements or itself.
Example:

<setdef name = "expression" parameter = "in" value = "0.2" />
<setdef name = "expression" parameter = "out" value = "0.2" />
<!-- set the animation of any facial expression to be slow-in and slow-out by default -->
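Defaults set this way can be pictured as a table keyed by element name and parameter, consulted whenever an element omits an attribute. The following is a minimal sketch of that idea; the function names and the rule that explicit attributes override defaults are assumptions, not statements from the specification.

```python
import xml.etree.ElementTree as ET


def apply_setdef(setdef, defaults):
    """Record one <setdef> in a {element name: {parameter: value}} table."""
    per_element = defaults.setdefault(setdef.get("name"), {})
    per_element[setdef.get("parameter")] = setdef.get("value")


def effective_attributes(element, defaults):
    """Merge stored defaults with the attributes written on the element."""
    merged = dict(defaults.get(element.tag, {}))
    merged.update(element.attrib)  # assumed: explicit attributes win
    return merged


defaults = {}
apply_setdef(ET.fromstring('<setdef name="expression" parameter="in" value="0.2" />'), defaults)
apply_setdef(ET.fromstring('<setdef name="expression" parameter="out" value="0.2" />'), defaults)

shrug = ET.fromstring('<expression type="shrug" out="0.5" />')
print(effective_attributes(shrug, defaults))
```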


4. Integration with Programming Languages

Example:

...
Dim Genie as Object
Set Genie = CreateAgent (MSAgentControl, "msagent.cfg")
ExecuteScript ("sample.iaml")
InsertScript ("<speak> I'm Genie! </speak>")
...


Last modified on 9/12/2002