A key goal was to minimize the number of modifications required to the released ns source code and to ensure that all existing ns simulations would still run properly when used with our modified ns. Minimizing the number of changes to ns allows the parallel simulator to readily take advantage of new, improved versions of ns as they become available. Any new or revised ns syntax is directly related to the parallelization of the simulation, and does not affect ns users who are not using PDNS.
The PADS research group at Georgia Tech has previously developed an extensive library of support software for implementing parallel and distributed simulations, known as RTIKIT. The RTIKIT software provides global virtual time management, group data communications, and message buffer management; it supports both Myrinet and TCP/IP networks and runs on a variety of platforms. By using RTIKIT for the parallelization of ns, we were able to rapidly modify the main event processing loop of ns to support the distributed time management functions needed to ensure that no unsafe event is ever processed by any federate.
The modifications needed to ns can be broadly classified into two major categories: modifications to the ns event processing infrastructure, and extensions to the ns TCL script syntax for describing simulations. Each of these categories is described in detail below.
$ns use-scheduler Heap
which specifies that the heap-based scheduler should be used. We developed a new event scheduler, known as Scheduler/RTI, which is specified by the ns user as follows:
$ns use-scheduler RTI
The Scheduler/RTI uses the time management functions of the RTIKIT to ensure that local simulation time advances never allow the processing of an unsafe event. The Scheduler/RTI also contains code to process events received by a federate that were generated by another federate, placing those new events in the proper location in the event list.
Also related to event scheduling is the transmission of events from one federate to another in the distributed simulation. This is handled by a newly created ns agent called Agent/RTI. The Agent/RTI is responsible for detecting that an event is destined for a remote federate, preparing a message containing the complete event information, and forwarding that message to the remote federate using the RTIKIT MCAST functions. RTI Agents are automatically inserted on any ns node that has a link to a remote simulator.
set r0 [$ns node]
set r2 [$ns node]
$ns duplex-link $r0 $r2 1.5mb 10ms DropTail
But when we decide to run the simulation in a distributed fashion on simulators A and B as shown below, the definition of the duplex-link is problematic: simulator A has no notion of node r2, and simulator B has no notion of node r0. We solve this problem by extending the ns syntax to include the specification of a remote link, called an rlink. With our extended syntax, a simulated link that crosses federates is defined with an rlink command, specifying only the local endpoint and identifying it with an IP address. The other end of the simulated link is defined in simulator B, and is also assigned an IP address. At runtime, remote links with matching network addresses are logically connected, and simulated packets that leave simulator A on the rlink are delivered to the corresponding rlink in simulator B. Details on how this is done can be seen in the example script linked below.
Data flows in ns are defined similarly, by naming the two endpoints of the flow. The same problem arises when the remote endpoint is located on another federate, and we solve it in a similar fashion by allowing a remote connection, called an rconnect. With an rconnect command, the remote endpoint of a data connection is specified by IP address and port number, rather than by ns node and agent name.
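As a sketch, a cross-federate TCP connection on simulator A might be set up as follows. The agent creation and attachment lines are standard ns; the exact argument order of the rconnect command (agent, then remote IP address and port) is an assumption here, since this excerpt does not give its syntax.

```tcl
# Standard ns agent setup on the local federate (simulator A).
set tcp0 [new Agent/TCP]
$ns attach-agent $r0 $tcp0

# Hypothetical rconnect usage: connect to the remote endpoint by
# IP address and port number rather than by ns node and agent name.
# The argument order here is assumed, not taken from the pdns docs.
$ns rconnect $tcp0 10.1.1.2 100
```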
$linkname set-ipaddr address mask
Where: linkname is the ns object for the link being assigned an address.
       address is the address being assigned, either in dotted notation
       or as a 32-bit value.
       mask is the corresponding netmask, either in dotted notation or
       as a 32-bit value. The netmask value is not used in the current
       pdns implementation, but is included for completeness.
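For example, the link between two local nodes could be assigned an address as follows; the address and mask values are illustrative only.

```tcl
# Look up the link object connecting r0 and r2 (standard ns), then
# assign it an IP address and netmask using the pdns syntax above.
set link0 [$ns link $r0 $r2]
$link0 set-ipaddr 10.1.1.1 255.255.255.0
```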
$nodename bind $agent portnum
Where: nodename is the ns object for the node where the agent is
       being bound.
       agent is the ns object for the agent being bound.
       portnum is the port number desired. If zero is specified,
       the next available port number is assigned.
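For example, the following binds a TCP agent to a specific port on a node; the port value is illustrative.

```tcl
# Bind agent tcp0 to port 100 on node r0. Specifying 0 instead
# would assign the next available port number.
set tcp0 [new Agent/TCP]
$r0 bind $tcp0 100
```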
$nodename rlink linkspeed linkdelay qtype ipaddr addrmask
Where: nodename is the ns object for the node where the remote
       link is being defined.
       linkspeed is the speed of the link.
       linkdelay is the propagation delay for the link.
       qtype is the associated queueing discipline for the link.
       ipaddr is the IP address of the local end of the link. The
       other end of the link must have an IP address with an identical
       network address (the IP address AND'ed with the network mask).
       Also, one end of the rlink must have a host value of ".1".
       addrmask is the mask value for the network portion of the
       IP address.
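Using the r0/r2 example from above, the cross-federate link could be declared on each simulator as follows; the addresses are illustrative, but follow the matching-network-address and ".1" rules just described.

```tcl
# On simulator A: declare the local end of the remote link on r0,
# giving it the required ".1" host value.
$r0 rlink 1.5mb 10ms DropTail 10.1.1.1 255.255.255.0

# On simulator B, the matching end would be declared on its node:
#   $r2 rlink 1.5mb 10ms DropTail 10.1.1.2 255.255.255.0
```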
$ns rqueue $node ipaddr
Where: $ns is the Simulator object.
       $node is the node containing the remote link.
       $ipaddr is the IP address of the remote link.
$node get-rqueue ipaddr
Where: $node is the node containing the remote link.
       $ipaddr is the IP address of the remote link.
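Taken together, the two commands above might be used as in the sketch below; the precise semantics of the queue object are not spelled out in this excerpt, so the comments state only what the command names suggest.

```tcl
# Associate a queue with the remote link on r0, identified by the
# link's IP address (semantics assumed from the command name).
$ns rqueue $r0 10.1.1.1

# Retrieve the queue object for that remote link, e.g. for later
# configuration of the queueing behavior.
set rq [$r0 get-rqueue 10.1.1.1]
```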
First, un-zip and un-tar the RTIKIT software:
gunzip -c ~/fdk1.0d.tar.gz | tar -xvf -
Then you need to un-zip and un-tar the ns-added-2.1b5.tar.gz file. Do this as follows:
cd ~/ns-allinone-2.1b5
gunzip -c ~/ns-added-2.1b5.tar.gz | tar -xvf -
Next you need to run the rtidiffs script, which will apply some differences to the released ns code:
cd ns-2.1b5
./rtidiffs
Next, edit ~/ns-allinone-2.1b5/ns-2.1b5/Makefile.in, and set the value of the RTIKIT macro to point to the directory where you installed the RTIKIT. Then configure and install ns normally. The resulting binary for the Parallel/Distributed ns will be ~/ns-allinone-2.1b5/ns-2.1b5/pdns.
The pdns software uses the RTIKIT Session Manager to coordinate the startup synchronization of the distributed simulation. The session manager uses two environment variables, NODEINFO and SESSIONNAME, to coordinate the creation of the inter-process communication sockets. These environment variables are described in detail in the RTIKIT usage documentation. Use the sample scripts example1a.tcl and example1b.tcl, together with the runsimp script, to test your installation. The runsimp script will need to be edited to specify the correct hostnames and paths to the scripts and binaries. The example scripts implement the simple distributed simulation shown in the figure below.
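A manual launch on one host might look like the sketch below; the runsimp script normally automates these steps, and both environment variable values are placeholders whose actual format is defined in the RTIKIT usage documentation.

```shell
# Placeholder values -- consult the RTIKIT documentation for the
# real SESSIONNAME and NODEINFO formats.
export SESSIONNAME=pdns-example
export NODEINFO=nodeinfo        # placeholder
cd ~/ns-allinone-2.1b5/ns-2.1b5
./pdns example1a.tcl            # on the first host
# ./pdns example1b.tcl          # on the second host
```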
Please email George Riley with any questions or comments.