     NAME
          cthread_intro - introduction to programming with cthreads
          library

     LIBRARY
          C Threads Library (libcthreads.a)

     SYNOPSIS
          #include <cthread.h>

     DESCRIPTION
          Many experimental and commercial operating systems provide
          support for concurrent programming.  The most popular
          mechanism for this is some provision for allowing multiple
          lightweight threads within a single address space, used
          from within a single program.  Cthreads is a library that
          provides such facilities for C programs.  Cthreads is
          designed to be used on multiprocessor architectures, but it
          can be used on a single processor.  On uniprocessors, the
          use of multiple processes simulates the concurrent
          execution that might be achieved on a multiprocessor.

          A thread is defined as a single sequential flow of control.
          In a high-level language, users program a thread using
          procedures, where procedure calls follow the traditional
          stack discipline.  Within a single thread, there is at any
          instant a single point of execution.  Having multiple
          threads in a program means that at any instant the program
          has multiple points of execution, one in each of its
          threads.  The programmer can mostly view the threads as
          executing simultaneously, as if the computer had as many
          processors as there are threads.  The programmer is
          required to decide when and where to create multiple
          threads.

          The term "lightweight" in this context means that thread
          scheduling within Cthreads takes place independently of
          thread or process scheduling in the underlying operating
          system.  Lightweight threads are relatively cheap in terms
          of overhead because they can perform operations without
          requiring the involvement of the operating system.

          Having the threads execute within a "single address space"
          means that the threads are able to read and write (at least
          some of) the same memory locations. In a high-level
          language, this usually corresponds to the fact that some
          variables are shared among all the threads (see the section
          on memory model) of the program. Each thread executes on a
          separate call stack with its own separate local variables.
          The programmer is responsible for using the synchronization
          mechanisms of the thread facility to ensure that the shared
          memory is accessed in a manner that will keep it
          consistent.

     PROGRAMMING USING CTHREADS:
          The Cthreads library is a run-time library that provides a C
          language interface for manipulating threads of control. The
          following paragraphs provide a brief introduction to using
          the library to develop concurrent programs.

        Initialization:
          The Cthreads library must be initialized before any cthreads
          calls (except for cthread_configure() and
          cthread_parse_args()) can be made.  A call to cthread_init()
          with appropriate parameters (see cthread_init(3)) initial-
          izes the library for a specified number of logical proces-
          sors. On multiprocessor targets like the KSR, these logical
          processors correspond to physical processors.  On uniproces-
          sors like the SPARC, processors are simulated with Unix
          processes. Among other things, cthread_init() initializes
          the logical processors, binds them to available physical
          processors, creates per-processor (logical processor) data
          structures (such as run queues, free lists, etc.),
          pre-allocates a number of threads, creates a shared memory
          arena to be used by the memory manager for dynamic memory
          allocation (see memory model below), and starts the first
          thread to execute the initial function (specified as an
          argument to cthread_init()).  Parameters such as the number of
          threads created, the size of the shared memory region, etc.
          can be configured before the call to cthread_init() with
          cthread_configure() and cthread_parse_args().

          The Cthreads library assigns a logical processor number from
          0..n-1 to each logical processor. The initial thread runs in
          logical processor 0.  The variable "current_processor"
          specifies the logical processor number of the executing
          thread. The variable "num_of_procs" points to the total
          number of logical processors.

          Operations:
          cthread_init()           Initializes the cthreads library.
          cthread_parse_args()     Parses command line arguments to
                                   change cthread configuration param-
                                   eters.
          cthread_configure()      (Outdated) Sets cthread
                                   configuration parameters.
          current_processor        This variable specifies the logical
                                   processor number of the process on
                                   which the current thread is execut-
                                   ing.
          num_of_procs             This variable points to the total
                                   number of logical processors.

        Memory Model:
          There can be three kinds of memory in a Cthreads application
          depending on how the memory is allocated -- memory local to
          a thread, memory local to a logical processor, and memory
          shared between all threads and processors.

          Memory local to a thread:
               Automatic variables declared within a C function are
               stack allocated and are thus local to a single thread.
               cthread_set_data() provides an additional mechanism for
               associating a single piece of data with a particular
               thread.

          Memory local to a processor:
               On some Cthreads implementations, such as on the KSR,
               all memory is either thread-local or shared.  However,
               because the SPARC implementation uses Unix processes to
               simulate multiple processors, all C "global" variables
               and memory allocated with the normal Unix malloc() call
               are processor-local.  This is described in more detail
               below.  When writing a program that should run on
               multiple platforms, it is safer to assume a SPARC-style
               model than a KSR-style model.

          Shared memory:
               Only memory allocated using memory_alloc() is
               guaranteed to be shared between all logical processors
               running a Cthreads program.

          As mentioned earlier, on the SPARC target logical processors
          are simulated using Unix processes.  The semantics of Unix
          process creation imply that the logical processors do not
          share their data segment.  As a result, each logical processor
          has its own copy of all C "global" variables.  Since the
          copy occurs during cthread_init(), each processor will
          initially have the same value in all the global variables, but
          thereafter the copies may diverge.  Threads executing in the
          same logical processor will share any global variable but
          threads executing in different logical processors may see
          different values.  This means that programs that use C glo-
          bal variables must be written carefully.  It is always safe
          to read global values that were set before cthread_init().
          It is also possible to use cthread_publish() to write the
          same value to all copies of a particular C global variable.
          In particular, it is quite common to allocate shared memory
          with memory_alloc() and to store a pointer to that memory in
          a C global variable.  Then cthread_publish() can be used to
          make all the copies of that C global variable point to the
          allocated shared memory.  See cthread_publish(3) for more
          details.
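
          The difference between processor-local globals and shared
          memory can be reproduced with plain Unix calls.  The sketch
          below is not Cthreads code: it uses fork() to stand in for a
          second logical processor and an anonymous MAP_SHARED mapping
          to stand in for the shared arena.  The function name
          demo_private_vs_shared() is hypothetical.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* An ordinary C global: after fork() each process has its own copy. */
static int private_counter = 1;

/*
 * Forks one child that writes 99 to both the global and a shared word.
 * On return, *private_seen and *shared_seen hold what the parent sees.
 * Returns 0 on success, -1 on failure.
 */
int demo_private_vs_shared(int *private_seen, int *shared_seen)
{
    assert(private_seen != NULL && shared_seen != NULL);

    /* Stand-in for a shared arena (assumption: not how Cthreads does it). */
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 1;
    private_counter = 1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {              /* child: a second "logical processor" */
        private_counter = 99;    /* changes only the child's copy */
        *shared = 99;            /* visible to the parent as well */
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    *private_seen = private_counter;   /* the parent's copy is untouched */
    *shared_seen = *shared;            /* the child's write is visible */
    munmap(shared, sizeof *shared);
    return 0;
}
```

          After the call, the parent still sees 1 in the global but
          sees 99 in the shared word, which is the divergence
          described above.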

          On the SPARC target, memory to be shared between logical
          processors is allocated using shmget(2).  Because Unix
          limits the size of these shared segments, there are limits
          on the amount of shared memory that Cthreads can make
          available.  The exact limit depends upon parameter values
          set when the Unix kernel was built.  (Note that if only one
          logical processor is specified, the current SPARC
          implementation recognizes that it need not use shmget() and
          allows a larger "shared" memory region for memory_alloc().)

          Again on the SPARC target, it is possible, but not
          recommended, to use malloc() instead of memory_alloc()
          under some circumstances.  Memory obtained with malloc()
          prior to cthread_init() behaves much like C global
          variables do.
          Each processor has its own copy with values as they existed
          at the time of cthread_init().  After cthread_init(), memory
          obtained via malloc() will be processor local.  Pointers to
          malloc'd memory should not be passed between processors, but
          can be used safely on a single processor.


          Operations:
          memory_alloc()           dynamically allocates a memory
                                   block of a specified size from a
                                   shared arena.
          memory_free()            frees allocated memory blocks.
          cthread_publish()        publishes a pointer to all the log-
                                   ical processors.

        Thread Creation:
          When a C program starts, it contains a single thread of con-
          trol, the one executing main(). The thread of control is an
          active entity, moving from statement to statement, calling
          and returning from procedures.  Forking a new thread of con-
          trol (see cthread_fork(3)) is similar to calling a pro-
          cedure, except that the caller does not wait for the pro-
          cedure to return. Instead, the caller continues to execute
          in parallel with the execution of the procedure in the newly
          forked thread. At some later time, the caller may rendezvous
          with the thread and retrieve its result, if any, by means of
          a cthread_join() operation, or the caller may detach the
          newly created thread to assert that no thread will ever be
          interested in joining it (see cthread_detach(3)).  Each
          thread is represented in the library by a thread control
          block (TCB) which contains thread-specific information such
          as a program counter and the address of the thread's stack.
          The function cthread_fork() is implemented by the following
          three steps:

          1. Get a TCB from the free list of the logical processor,
             which contains the pre-allocated thread control blocks.
          2. Initialize the TCB (especially the program counter and
             the stack pointer).
          3. Enqueue the TCB in the processor run queue.
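
          The three steps can be pictured with a toy model.  This is
          only a sketch: the structure layouts and every toy_* name
          are hypothetical, and the real TCB and run queue are
          internal to the library and not documented here.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NUM_TCBS 4

/* Hypothetical thread control block (not the library's layout). */
struct toy_tcb {
    void (*pc)(void *);        /* stands in for the saved program counter */
    void *arg;                 /* argument for the top-level procedure */
    char *stack_base;          /* stands in for the saved stack pointer */
    struct toy_tcb *next;      /* link for the free list or the run queue */
};

/* Hypothetical per-processor data: a free list and a FIFO run queue. */
struct toy_processor {
    struct toy_tcb *free_list;
    struct toy_tcb *run_head;
    struct toy_tcb *run_tail;
    struct toy_tcb pool[TOY_NUM_TCBS];
    char stacks[TOY_NUM_TCBS][256];
};

/* Pre-allocates the TCBs and threads them onto the free list. */
void toy_processor_init(struct toy_processor *p)
{
    p->free_list = NULL;
    p->run_head = p->run_tail = NULL;
    for (int i = TOY_NUM_TCBS - 1; i >= 0; i--) {
        p->pool[i].stack_base = p->stacks[i];
        p->pool[i].next = p->free_list;
        p->free_list = &p->pool[i];
    }
}

/* Returns the new TCB, or NULL when no pre-allocated TCB is left. */
struct toy_tcb *toy_fork(struct toy_processor *p, void (*fn)(void *), void *arg)
{
    assert(p != NULL && fn != NULL);

    /* Step 1: get a TCB from the processor's free list. */
    struct toy_tcb *t = p->free_list;
    if (t == NULL)
        return NULL;
    p->free_list = t->next;

    /* Step 2: initialize the TCB. */
    t->pc = fn;
    t->arg = arg;
    t->next = NULL;

    /* Step 3: enqueue the TCB at the tail of the run queue. */
    if (p->run_tail != NULL)
        p->run_tail->next = t;
    else
        p->run_head = t;
    p->run_tail = t;
    return t;
}

/* Example top-level procedure; it does nothing. */
void toy_entry(void *arg)
{
    (void)arg;
}
```

          In this model a fork never runs the new thread; it only
          makes the thread eligible to be dispatched later.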

          A thread terminates either when it returns from the top-
          level procedure it was executing or when an explicit call to
          cthread_exit() is made.  If the exiting thread has not been
          detached, it remains in limbo until another thread either
          joins it or detaches it. If the thread is detached, no join
          is necessary. Every thread must be joined or detached
          exactly once.
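
          The fork and join pattern looks much the same in other
          thread packages.  The sketch below uses POSIX threads as an
          analogy, not the Cthreads calls themselves, whose exact
          signatures are given in cthread_fork(3) and
          cthread_join(3).  The names double_it() and
          fork_then_join() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Top-level procedure of the new thread: doubles its argument. */
static void *double_it(void *arg)
{
    intptr_t n = (intptr_t)arg;
    return (void *)(n * 2);    /* returning terminates the thread */
}

/* Forks a thread, then joins it and returns its result. */
long fork_then_join(long n)
{
    pthread_t child;
    void *result = NULL;

    int rc = pthread_create(&child, NULL, double_it, (void *)(intptr_t)n);
    assert(rc == 0);

    /* The caller continues here in parallel with the child. */

    rc = pthread_join(child, &result);   /* rendezvous, done exactly once */
    assert(rc == 0);
    return (long)(intptr_t)result;
}
```

          Calling fork_then_join(21) returns 42.  A thread that is
          never joined would instead be detached, again exactly once.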

          Operations:
          cthread_fork()           creates a new thread to execute a
                                   user-specified function
          cthread_exit()           exits the current thread
          cthread_join()           joins another thread
          cthread_detach()         detaches a thread so that it cannot
                                   be joined by any other thread.
          cthread_self()           returns the thread-id of the cal-
                                   ling thread.

        Scheduling:
          The Cthreads library implements a non-preemptive FIFO
          scheduler per logical processor. Each logical processor con-
          tains a run queue. An executable thread is placed in the run
          queue. An executing thread releases the processor only when
          it exits, performs an operation that causes it to block, or
          explicitly calls cthread_yield().  The function
          cthread_yield() implements voluntary surrender of the
          processor by placing the current thread at the end of the run
          queue and dispatching the next thread in the queue.
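
          The effect of this policy on execution order can be seen in
          a small simulation.  This is a sketch of the queue
          discipline only, with no real context switching; the sim_*
          names are hypothetical.  Each simulated thread runs a given
          number of times, yielding after every run except its last.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_MAX 8

/* Fixed-size FIFO of thread numbers, standing in for a run queue. */
struct sim_queue {
    int ids[SIM_MAX];
    size_t head;
    size_t count;
};

static void sim_enqueue(struct sim_queue *q, int id)
{
    assert(q->count < SIM_MAX);
    q->ids[(q->head + q->count) % SIM_MAX] = id;
    q->count++;
}

static int sim_dequeue(struct sim_queue *q)
{
    assert(q->count > 0);
    int id = q->ids[q->head];
    q->head = (q->head + 1) % SIM_MAX;
    q->count--;
    return id;
}

/*
 * steps[i] is how many times thread i is dispatched before it exits.
 * Records the dispatch order in trace[] and returns its length.
 */
size_t sim_run(const int *steps, size_t nthreads, int *trace, size_t trace_cap)
{
    struct sim_queue q = { {0}, 0, 0 };
    int left[SIM_MAX];
    size_t n = 0;

    assert(nthreads <= SIM_MAX);
    for (size_t i = 0; i < nthreads; i++) {
        left[i] = steps[i];
        if (left[i] > 0)
            sim_enqueue(&q, (int)i);
    }
    while (q.count > 0 && n < trace_cap) {
        int id = sim_dequeue(&q);    /* dispatch the head of the queue */
        trace[n++] = id;             /* the thread runs until it yields */
        if (--left[id] > 0)
            sim_enqueue(&q, id);     /* yield: go to the end of the queue */
        /* otherwise the thread exits and is not requeued */
    }
    return n;
}
```

          With steps {2, 1, 2}, the dispatch order is 0, 1, 2, 0, 2.
          Because nothing preempts a running thread, a thread that
          never yields, blocks, or exits keeps its processor.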

          Operations:
          cthread_yield()          yields the processor to the next
                                   thread in the queue.

        Synchronization:
          The Cthreads library supports a barrier function to syn-
          chronize all the application threads (see barrier(3)) and
          two kinds of synchronization variables: mutex variables and
          condition variables. A programmer must explicitly allocate
          and initialize any synchronization variables used in the
          program (from the shared memory) using allocation calls
          provided by the library (please refer to the manual pages for
          mutex_alloc(), mutex_init(), mutex_free(),
          condition_alloc(), condition_init(), condition_free()).
          Note that if pointers to the allocated mutexes and condi-
          tions are stored in global (static) memory, those pointers
          must be published with cthread_publish() before they are
          used.

          Mutex Variables:
               Mutex variables implement the basic tool that enables
               threads to cooperate on access to shared variables. A
               mutex is normally used to ensure that a set of actions
               on a group of variables can be made atomic relative to
               any other thread's actions on these variables. Such
               mutual exclusion is implemented using the procedures
               mutex_lock() and mutex_unlock().  In their typical use,
               the programmer establishes a logical relationship
               between a mutex "m" and a set of program variables and
               ensures that all reads and writes of those shared
               variables occur only when bracketed by mutex_lock(m) and
               mutex_unlock(m) calls.   The code between these calls
               which operates upon the shared variables is known as a
               critical section. The following piece of code imple-
               ments a critical section:



                       mutex_t m;
                       ...
                       mutex_lock(m);
                            /* critical section */
                       mutex_unlock(m);

               The function mutex_lock() implements a variation of
               Anderson's back-off spin lock. If the mutex "m" is
               free, mutex_lock(m) locks the mutex and the thread con-
               tinues. If more than one thread attempts to lock the
               mutex simultaneously, the Cthreads library guarantees
               that only one will succeed; the rest will wait. The
               waiting threads are removed from the top of the run
               queue and are placed at the end of the appropriate run
               queues (the run queues associated with the logical pro-
               cessors on which the waiting threads were executing).
               When those threads reach the head of their respective
               run queues they will repeat the lock attempt.  The
               mutex_unlock(m) procedure unlocks the mutex m, giving
               other threads a chance to lock it.
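
               The general idea of a spin lock that backs off can be
               sketched with C11 atomics.  This is a generic
               test-and-set lock, not the library's implementation:
               sched_yield() stands in for moving the waiter to the
               end of a run queue, POSIX threads stand in for
               Cthreads, and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical lock type built on a C11 atomic flag. */
struct toy_lock {
    atomic_flag held;
};

void toy_lock_init(struct toy_lock *l)
{
    atomic_flag_clear(&l->held);
}

void toy_lock_acquire(struct toy_lock *l)
{
    unsigned backoff = 1;

    /* test-and-set returns the previous value: true means already held */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* Busy: step aside before retrying, a little longer each time. */
        for (unsigned i = 0; i < backoff; i++)
            sched_yield();
        if (backoff < 64)
            backoff *= 2;
    }
}

void toy_lock_release(struct toy_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static struct toy_lock counter_lock;
static long counter;

/* Each thread increments the shared counter inside a critical section. */
static void *add_many(void *arg)
{
    long n = *(long *)arg;

    for (long i = 0; i < n; i++) {
        toy_lock_acquire(&counter_lock);
        counter++;                         /* critical section */
        toy_lock_release(&counter_lock);
    }
    return NULL;
}

/* Runs nthreads incrementing threads and returns the final count. */
long count_with_threads(int nthreads, long per_thread)
{
    pthread_t tids[16];

    assert(nthreads > 0 && nthreads <= 16);
    toy_lock_init(&counter_lock);
    counter = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, add_many, &per_thread);
        assert(rc == 0);
    }
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_join(tids[i], NULL);
        assert(rc == 0);
    }
    return counter;
}
```

               With the lock in place, count_with_threads(4, 5000)
               returns 20000; without it, increments could be lost.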

               Operations:
               mutex_lock()             locks a mutex.
               mutex_unlock()           unlocks a mutex.
               mutex_alloc()            allocates a mutex variable.
               mutex_free()             frees a mutex variable.
               mutex_init()             initializes a mutex lock.
               mutex_clear()            finalizes a mutex lock.

          Condition Variables:
               Condition variables make it possible for a thread to
               suspend its execution while awaiting an action by some
               other thread.  In and of itself, a condition variable
               has no "value", particularly not in the sense that some
               abstract program-related "condition" is true or false.
               They are called "condition" variables because in their
               most common use the programmer establishes a logical
               relationship between a condition variable, a mutex and
               some set of program variables which determine a "condi-
               tion" that threads might wait for.  For example, in
               implementing readers-writers style synchronization, a
               reader thread must wait until there are no writers in
               the system before proceeding.  In this case, a variable
               "num_writers" might constitute the shared program vari-
               able that can be tested to determine whether or not the
               condition of no writers being present exists. The
               beginning of the reader portion of the code implement-
               ing this synchronization might look something like
               this:

                    mutex_t        rw_count_lock;
                    condition_t    writers_is_zero;
                    int       num_readers;
                    int       num_writers;

                    ...
                    mutex_lock(rw_count_lock);
                    /* wait until num_writers is zero */
                    while ( num_writers > 0 ) {
                        condition_wait(writers_is_zero, rw_count_lock);
                    }
                    num_readers++;
                    mutex_unlock(rw_count_lock);
                    /* now it is safe to read */


               The end of the writer portion might look like:
                    /* code involving writes */
                    mutex_lock(rw_count_lock);
                    num_writers--;
                    if ( num_writers == 0 )
                         condition_broadcast(writers_is_zero);
                    mutex_unlock(rw_count_lock);

               This is not the complete code for the readers-writers
               problem, but just the portion where beginning readers
               might sleep and completing writers wake them up. This
               is a very common style of using condition variables.
               Note that access to the state variables (here
               num_readers and num_writers) is shared by multiple
               threads and must therefore be protected by a mutex
               (here rw_count_lock).  That same mutex is specified in
               the condition_wait() operation.  Condition_wait() actu-
               ally unlocks the mutex while this thread is waiting and
               relocks it before letting the thread continue.  (If it
               did not, no other thread could ever gain the mutex to
               modify the variables!)  Notice too that the
               condition_wait() is performed inside a while loop.
               This is also common because it is always possible that
               the abstract "condition" we're waiting for will change
               between the time that the condition variable gets sig-
               naled and the time the waiting thread actually gets an
               opportunity to run again.  In this event the thread
               must be prepared to wait again.
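
               The same pattern can be tried out end to end with POSIX
               threads, whose mutex and condition calls behave
               analogously.  This is a sketch by analogy, not Cthreads
               code; shared_value, reader() and
               run_readers_after_writer() are hypothetical names added
               for the demonstration, and the calling thread plays the
               single writer.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t rw_count_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t writers_is_zero = PTHREAD_COND_INITIALIZER;
static int num_readers;
static int num_writers;
static int shared_value;     /* the data that readers want to read */

static void *reader(void *arg)
{
    int *seen = arg;

    pthread_mutex_lock(&rw_count_lock);
    /* wait until num_writers is zero; recheck after every wakeup */
    while (num_writers > 0)
        pthread_cond_wait(&writers_is_zero, &rw_count_lock);
    num_readers++;
    pthread_mutex_unlock(&rw_count_lock);

    *seen = shared_value;    /* now it is safe to read */

    pthread_mutex_lock(&rw_count_lock);
    num_readers--;
    pthread_mutex_unlock(&rw_count_lock);
    return NULL;
}

/*
 * Starts nreaders readers while a writer is active, completes the
 * write, and returns how many readers observed the written value.
 */
int run_readers_after_writer(int nreaders, int value)
{
    pthread_t tids[8];
    int seen[8];
    int matches = 0;

    assert(nreaders > 0 && nreaders <= 8);
    num_readers = 0;
    num_writers = 1;         /* the calling thread is the active writer */
    shared_value = 0;

    for (int i = 0; i < nreaders; i++) {
        seen[i] = -1;
        int rc = pthread_create(&tids[i], NULL, reader, &seen[i]);
        assert(rc == 0);
    }

    shared_value = value;    /* code involving writes */

    pthread_mutex_lock(&rw_count_lock);
    num_writers--;
    if (num_writers == 0)
        pthread_cond_broadcast(&writers_is_zero);
    pthread_mutex_unlock(&rw_count_lock);

    for (int i = 0; i < nreaders; i++) {
        int rc = pthread_join(tids[i], NULL);
        assert(rc == 0);
        if (seen[i] == value)
            matches++;
    }
    return matches;
}
```

               Every reader sees the written value, whether it arrived
               before the broadcast and slept or arrived afterwards
               and found num_writers already zero.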

               One very important thing to remember is that the rela-
               tionship between condition variables and abstract "con-
               ditions" in a program (like there being no writers
               present) is strictly a logical one established by the
               programmer.  The condition variables themselves just
               provide a convenient mechanism for a thread to wait to
               be signaled.  They assume no values and depend entirely
               upon the programmer to write code that determines when
               signaling is necessary and when it is really time to
               continue, presumably based upon some abstract "condi-
               tion".

               The operations on a condition variable are
               condition_wait() which causes the calling thread to
               suspend, and condition_signal() and
               condition_broadcast() which can cause waiting threads
               to resume.  Condition_signal() is used to unblock one
               waiting thread; condition_broadcast() is used to
               unblock all the waiting threads. Which one is appropri-
               ate depends on the circumstances.  Using
               condition_signal() is preferable when only one blocked
               thread can benefit from the change.
               Condition_broadcast() is necessary if multiple threads
               should resume.  (In the readers-writers example,
               condition_broadcast() is used because all waiting
               readers could start simultaneously.)

          The Cthreads library implements a condition variable using
          some state variables and a thread queue. A condition_wait()
          call atomically suspends the calling thread, removes the
          calling thread from the run queue, and places it in the
          queue associated with the condition variable. A
          condition_signal() removes a thread, if there is one, from
          the condition queue and places it in the run queue. Note
          that a condition_signal() puts the thread in the run queue
          but doesn't guarantee that the thread will run immediately
          (it may run immediately if the processor run queue is empty
          and if it can obtain a lock on the mutex). A
          condition_broadcast() removes all the waiting threads from
          the condition queue and places them in the appropriate
          processor run queues.  However, because the semantics of
          condition_wait() require the lock on the mutex to be
          reestablished by each particular thread before it can run, at
          most one of the threads awakened by a condition_broadcast()
          will run immediately.  The others will continue as they have
          the opportunity to regain the lock.

          Operations:
          condition_wait()         waits on a condition variable for
                                   an event.
          condition_signal()       wakes up a thread waiting on a
                                   condition variable.
          condition_broadcast()    wakes up all the threads waiting on
                                   a condition variable.
          condition_alloc()        allocates a condition variable.
          condition_free()         frees a condition variable.
          condition_init()         initializes a condition variable.

        Thread debugging:
          There are a couple of mechanisms available for debugging
          Cthreads programs.  In particular, if the bug is related to
          the use of Cthreads itself (e.g., thread deadlock or missed
          synchronization), then it is possible to turn on the
          printing of informational messages associated with almost
          every Cthreads call.  These printouts trace most changes in
          thread status.  For example, these are sample printouts:
            Thread "func1" (0xf76f5ac4) attempting lock on mutex "lock" (0xf76f50ec) at test3.c(77)
            Thread "func1" (0xf76f5ac4) granted lock on mutex "lock" (0xf76f50ec) at test3.c(77)
            Thread "func1" (0xf76f5ac4) waiting on condition "ready" (0xf76f55e0) at test3.c(79)

          The printouts contain line and file information that relates
          back to the application source.  While the library will sup-
          ply simple default names for all created conditions, mutexes
          and threads, the programmer may need to provide more
          descriptive names with cthread_set_name(), mutex_set_name(),
          or condition_set_name().  These printouts can be quite a
          valuable aid for debugging synchronization problems.  See
          cthread_parse_args(3) for details on enabling them.

          Also, on SPARC targets, it is possible in some circumstances
          to use dbx, gdb, ups or other traditional Unix debuggers.
          You can do this only if you use just one (1) logical proces-
          sor.  The Cthreads library manipulates stacks in ways that
          most debuggers aren't prepared to handle, so you should be
          prepared for some confusion on the part of the debugger.
          Usually this isn't much of a problem if you can limit your
          debugging efforts to a single thread.  It's particularly
          easy if all you need to do is find out where, for example, a
          segmentation fault is occurring.  Just compile with "-g" and
          invoke your debugger on the Cthreads program as you would
          for any other program.

          If all else fails, there's always printf().  Unfortunately,
          on some machines, concurrent printf's can trash each other.
          To address this, Cthreads provides "atomic" versions of
          printf and fprintf.  See aprintf(3) for more information.
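
          One common way to keep output lines whole is to hold a lock
          around each formatted write.  The sketch below shows that
          idea with POSIX threads; it is not the library's aprintf(),
          and locked_fprintf() is a hypothetical name.  A mutex of
          this kind only serializes threads within one process.

```c
#include <assert.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>

static pthread_mutex_t print_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical helper: formats and writes one message while holding a
 * lock, so that messages from different threads do not interleave.
 * Returns what vfprintf() returns.
 */
int locked_fprintf(FILE *stream, const char *format, ...)
{
    va_list ap;
    int n;

    assert(stream != NULL && format != NULL);

    pthread_mutex_lock(&print_lock);
    va_start(ap, format);
    n = vfprintf(stream, format, ap);
    va_end(ap);
    fflush(stream);              /* push the whole message out together */
    pthread_mutex_unlock(&print_lock);
    return n;
}
```

          It is called like fprintf(), for example
          locked_fprintf(stdout, "thread %d ready\n", 3).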


     SEE ALSO
          aprintf(3), barrier(3), condition_alloc(3),
          condition_free(3), condition_broadcast(3),
          condition_signal(3), condition_wait(3),
          condition_set_name(3), cthread_configure(3),
          cthread_detach(3), cthread_exit(3), cthread_fork(3),
          cthread_thread_alloc(3), cthread_thread_schedule(3),
          cthread_init(3), cthread_join(3), cthread_parse_args(3),
          cthread_perror(3), cthread_publish(3), cthread_self(3),
          cthread_set_data(3), cthread_set_name(3),
          cthread_set_sched_info(3), cthread_yield(3),
          current_processor(3), memory_alloc(3), memory_free(3),
          mutex_alloc(3), mutex_free(3), mutex_lock(3),
          mutex_unlock(3), mutex_set_name(3), num_of_procs(3)

     AUTHOR
          Cthreads was written and maintained by many people.
          This man page was written by Bodhi Mukherjee and Greg
          Eisenhauer.