NAME
cthread_intro - introduction to programming with cthreads
library
LIBRARY
C Threads Library (libcthreads.a)
SYNOPSIS
#include <cthread.h>
DESCRIPTION
Many experimental and commercial operating systems provide
support for concurrent programming. The most popular
mechanism for this is some provision for allowing multiple
lightweight threads within a single address space, used from
within a single program. Cthreads is a library that provides
such facilities for C programs. Cthreads is designed to be
used on multiprocessor architectures, but it can also be used
on a single processor. On uniprocessors, multiple processes
simulate the concurrent execution that might be achieved on
a multiprocessor.
A thread is defined as a single sequential flow of control.
In a high-level language, users program a thread using pro-
cedures, where procedure calls follow the traditional stack
discipline. Within a single thread, there is at any instant
a single point of execution. Having multiple threads in a
program means that at any instant the program has multiple
points of execution, one in each of its threads. The
programmer can mostly view the threads as executing
simultaneously, as if the computer had as many processors as
there are threads. The programmer is required to decide when
and where to create multiple threads.
The term "lightweight" in this context means that thread
scheduling within Cthreads takes place independently of
thread or process scheduling in the underlying operating
system. Lightweight threads are relatively cheap in terms
of overhead because they can perform operations without
requiring the involvement of the operating system.
Having the threads execute within a "single address space"
means that the threads are able to read and write (at least
some of) the same memory locations. In a high-level
language, this usually corresponds to the fact that some
variables are shared among all the threads (see the section
on memory model) of the program. Each thread executes on a
separate call stack with its own separate local variables.
The programmer is responsible for using the synchronization
mechanisms of the thread facility to ensure that the shared
memory is accessed in a manner that keeps it consistent.
PROGRAMMING USING CTHREADS:
The Cthreads library is a run-time library that provides a C
language interface to manipulate threads of control. The
following paragraphs provide a brief introduction to using
the library to develop concurrent programs.
Initialization:
The Cthreads library must be initialized before any cthreads
calls (except for cthread_configure() and
cthread_parse_args()) can be made. A call to cthread_init()
with appropriate parameters (see cthread_init(3)) initial-
izes the library for a specified number of logical proces-
sors. On multiprocessor targets like the KSR, these logical
processors correspond to physical processors. On uniproces-
sors like the SPARC, processors are simulated with Unix
processes. Among other things, cthread_init() initializes
the logical processors, binds them to available physical
processors, creates per-processor (logical processor) data
structures (such as run queues, free lists etc.), pre-
allocates a number of threads, creates a shared memory arena
to be used by the memory manager for dynamic memory alloca-
tion (see memory model below), and starts the first thread
to execute the initial function (as specified as an argument
of cthread_init()). Parameters such as the number of
threads created, the size of the shared memory region, etc.
can be configured before the call to cthread_init() with
cthread_configure() and cthread_parse_args().
The Cthreads library assigns a logical processor number in the
range 0..n-1 to each logical processor. The initial thread runs
on logical processor 0. The variable "current_processor"
specifies the logical processor number of the executing
thread. The variable "num_of_procs" points to the total
number of logical processors.
Operations:
cthread_init() Initializes the cthreads library.
cthread_parse_args() Parses command line arguments to
change cthread configuration param-
eters.
cthread_configure() (Outdated) Sets cthread configuration
parameters.
current_processor This variable specifies the logical
processor number of the process on
which the current thread is execut-
ing.
num_of_procs This variable points to the total
number of logical processors.
Memory Model:
There can be three kinds of memory in a Cthreads application
depending on how the memory is allocated -- memory local to
a thread, memory local to a logical processor, and memory
shared between all threads and processors.
Memory local to a thread:
Non-static variables declared within a C subprogram are
stack allocated and are thus local to a single thread.
cthread_set_data() provides an additional mechanism for
associating a single piece of data with a particular
thread.
Memory local to a processor:
On some Cthreads implementations, such as on the KSR,
all memory is either thread-local or shared. However,
because the SPARC implementation uses Unix processes to
simulate multiple processors, all C "global" variables
and memory allocated with the normal Unix malloc() call
are processor-local. This will be described in more
detail below, but we wish to point out that when writ-
ing a program that should run on multiple platforms, it
is safer to assume a SPARC-style model than a KSR-style
model. We recommend writing for a SPARC-style machine
for portable programs.
Shared memory:
Only memory allocated using memory_alloc() is
guaranteed to be shared between all logical processors
running a Cthreads program.
As mentioned earlier, on the SPARC target logical processors
are simulated using Unix processes. The semantics of Unix
process creation imply that the logical processors do not
share their data segment. As a result, each logical processor
has its own copy of all C "global" variables. Since the
copy occurs during cthread_init() each processor will ini-
tially have the same value in all the global variables, but
thereafter the copies may diverge. Threads executing in the
same logical processor will share any global variable but
threads executing in different logical processors may see
different values. This means that programs that use C glo-
bal variables must be written carefully. It is always safe
to read global values that were set before cthread_init().
It is also possible to use cthread_publish() to write the
same value to all copies of a particular C global variable.
In particular, it is quite common to allocate shared memory
with memory_alloc() and to store a pointer to that memory in
a C global variable. Then cthread_publish() can be used to
make all the copies of that C global variable point to the
allocated shared memory. See cthread_publish(3) for more
details.
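The divergence of per-process copies described above can be observed with plain Unix calls. The following is only an illustrative sketch and does not use Cthreads: fork() stands in for the creation of a second logical processor, and an anonymous shared mmap() region stands in for memory obtained from memory_alloc(). The name fork_copy_demo() is made up for this example.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on some systems */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lives in the data segment, so fork() gives the child its own copy. */
static int global_counter = 1;

/*
 * Forks a child that overwrites both a C global and a shared region,
 * then reports what the parent sees afterwards.
 * Returns 0 on success, -1 on failure.
 */
int fork_copy_demo(int *global_seen, int *shared_seen)
{
    int *shared;
    int status;
    pid_t pid;

    assert(global_seen != NULL && shared_seen != NULL);

    /* An anonymous MAP_SHARED mapping stays shared across fork(). */
    shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 1;

    pid = fork();
    if (pid < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {
        global_counter = 42;     /* changes the child's copy only */
        *shared = 42;            /* visible to the parent as well */
        _exit(0);
    }

    if (waitpid(pid, &status, 0) < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    *global_seen = global_counter;
    *shared_seen = *shared;
    munmap(shared, sizeof *shared);
    return 0;
}
```

After a successful call, the parent still sees the global at its pre-fork value of 1, while the shared region holds the 42 written by the child.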
On the SPARC target, memory to be shared between logical
processors is allocated using shmget(3). Because Unix lim-
its the size of these shared segments, there are limits on
the amount of shared memory that Cthreads can make avail-
able. The exact limit on shared memory depends upon parame-
ter values when the Unix kernel was built. (Note that if
only one logical processor is specified, the current SPARC
implementation recognizes that it need not use shmget() and
allows a larger "shared" memory region for memory_alloc().)
Again on the SPARC target, it is possible, but not
recommended, to use malloc() instead of memory_alloc() under some
circumstances. Memory obtained with malloc() prior to
cthread_init() behaves much like C global variables do.
Each processor has its own copy with values as they existed
at the time of cthread_init(). After cthread_init(), memory
obtained via malloc() will be processor local. Pointers to
malloc'd memory should not be passed between processors, but
can be used safely on a single processor.
Operations:
memory_alloc() dynamically allocates a memory block
of a specified size from a shared arena.
memory_free() frees allocated memory blocks.
cthread_publish() publishes a pointer to all the log-
ical processors.
Thread Creation:
When a C program starts, it contains a single thread of con-
trol, the one executing main(). The thread of control is an
active entity, moving from statement to statement, calling
and returning from procedures. Forking a new thread of con-
trol (see cthread_fork(3)) is similar to calling a pro-
cedure, except that the caller does not wait for the pro-
cedure to return. Instead, the caller continues to execute
in parallel with the execution of the procedure in the newly
forked thread. At some later time, the caller may rendezvous
with the thread and retrieve its result, if any, by means of
a cthread_join() operation, or the caller may detach the
newly created thread to assert that no thread will ever be
interested in joining it (see cthread_detach(3)). Each
thread is represented in the library by a thread control
block (TCB) which contains thread-specific information such
as a program counter and the address of the thread's stack.
The function cthread_fork() is implemented by the following
three steps:
1. Get a TCB from the free list of the logical processor,
which contains the pre-allocated thread control blocks.
2. Initialize the TCB (in particular, the program counter
and the stack pointer).
3. Enqueue the TCB in the processor run queue.
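The three steps of cthread_fork() can be pictured with a small data-structure sketch. Everything here is hypothetical and chosen for illustration: the struct layouts, the field names, the toy stack size, and the functions processor_init(), fork_sketch() and sample_entry() are not taken from the library source. The sketch only manipulates the lists and never switches stacks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical thread control block; fields are illustrative only. */
struct tcb {
    void (*pc)(void *);      /* stands in for the saved program counter */
    void *arg;               /* argument for the thread's function */
    char *sp;                /* stands in for the saved stack pointer */
    char stack[256];         /* toy stack; real stacks are far larger */
    struct tcb *next;        /* link for the free list or run queue */
};

/* Hypothetical per-processor state. */
struct processor {
    struct tcb *free_list;   /* pre-allocated, unused TCBs */
    struct tcb *run_head;    /* FIFO run queue: dequeue here */
    struct tcb *run_tail;    /* FIFO run queue: enqueue here */
};

/* Example thread body so the sketch has something to enqueue. */
void sample_entry(void *arg)
{
    (void)arg;
}

/* Pre-allocation: push n TCBs from pool onto the free list. */
void processor_init(struct processor *p, struct tcb *pool, size_t n)
{
    size_t i;

    p->free_list = NULL;
    p->run_head = p->run_tail = NULL;
    for (i = 0; i < n; i++) {
        pool[i].next = p->free_list;
        p->free_list = &pool[i];
    }
}

/* Returns the new TCB, or NULL if no pre-allocated TCB is left. */
struct tcb *fork_sketch(struct processor *p, void (*func)(void *), void *arg)
{
    struct tcb *t;

    assert(p != NULL && func != NULL);

    /* Step 1: take a TCB from the processor's free list. */
    t = p->free_list;
    if (t == NULL)
        return NULL;
    p->free_list = t->next;

    /* Step 2: initialize the program counter and stack pointer. */
    t->pc = func;
    t->arg = arg;
    t->sp = t->stack + sizeof t->stack;   /* stack grows downward */
    t->next = NULL;

    /* Step 3: enqueue the TCB at the tail of the run queue. */
    if (p->run_tail != NULL)
        p->run_tail->next = t;
    else
        p->run_head = t;
    p->run_tail = t;
    return t;
}
```

Keeping the free list per processor, as the text describes, means a fork on one logical processor does not need to touch another processor's lists.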
A thread terminates either when it returns from the top-
level procedure it was executing or when an explicit call to
cthread_exit() is made. If the exiting thread has not been
detached, it remains in limbo until another thread either
joins it or detaches it. If the thread is detached, no join
is necessary. Every thread must be joined or detached
exactly once.
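The fork-then-join pattern can be tried out with POSIX threads, which offer analogous operations. This sketch deliberately uses pthread_create() and pthread_join() rather than cthread_fork() and cthread_join(), because the Cthreads signatures are documented in their own manual pages and are not reproduced here. The names square() and fork_join_square() are made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Top-level procedure of the forked thread: returns x * x. */
static void *square(void *arg)
{
    intptr_t x = (intptr_t)arg;

    return (void *)(x * x);
}

/*
 * Forks a thread to compute x * x, then joins it to collect the result.
 * Returns 0 on success, -1 if the thread could not be created or joined.
 */
int fork_join_square(int x, long *result)
{
    pthread_t child;
    void *ret;

    assert(result != NULL);

    if (pthread_create(&child, NULL, square, (void *)(intptr_t)x) != 0)
        return -1;

    /* The caller keeps running here, in parallel with the child. */

    /* Rendezvous: wait for the child and retrieve its return value. */
    if (pthread_join(child, &ret) != 0)
        return -1;
    *result = (long)(intptr_t)ret;
    return 0;
}
```

The child is joined exactly once and never detached, which matches the rule that every thread is either joined or detached, but not both.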
Operations:
cthread_fork() creates a new thread to execute a
user-specified function
cthread_exit() exits the current thread
cthread_join() joins another thread
cthread_detach() detaches the current thread so that
it can not be joined by any other
thread.
cthread_self() returns the thread-id of the cal-
ling thread.
Scheduling:
The Cthreads library implements a non-preemptive FIFO
scheduler per logical processor. Each logical processor con-
tains a run queue. An executable thread is placed in the run
queue. An executing thread releases the processor only when
it exits, performs an operation that causes it to block, or
when it explicitly calls cthread_yield(). The function
cthread_yield() implements voluntary surrender of the
processor by placing the current thread at the end of the run
queue and dispatching the next thread in the queue.
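The effect of non-preemptive FIFO scheduling with voluntary yields can be simulated without any threads at all. In the sketch below, each hypothetical "task" needs a fixed number of steps and yields after every step; the struct, the MAX_TASKS limit and run_fifo() are assumptions made for illustration and are not part of the Cthreads API.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS 8

/* A simulated thread: it needs steps_left more dispatches to finish. */
struct task {
    int id;
    int steps_left;
};

/*
 * Runs the tasks under a FIFO run queue. Each dispatch executes one
 * step; the task then either yields (goes to the tail of the queue)
 * or exits (is dropped). The id of each dispatched task is recorded
 * in trace. Returns the total number of dispatches.
 */
size_t run_fifo(struct task *tasks, size_t n, int *trace, size_t trace_cap)
{
    size_t queue[MAX_TASKS];     /* circular buffer of task indices */
    size_t head = 0, count = 0, dispatched = 0;
    size_t i;

    assert(n <= MAX_TASKS);

    /* Initially every runnable task is queued in creation order. */
    for (i = 0; i < n; i++) {
        if (tasks[i].steps_left > 0) {
            queue[(head + count) % MAX_TASKS] = i;
            count++;
        }
    }

    while (count > 0) {
        /* Dispatch the task at the head of the run queue. */
        size_t cur = queue[head];

        head = (head + 1) % MAX_TASKS;
        count--;

        tasks[cur].steps_left--;             /* run one step */
        if (dispatched < trace_cap)
            trace[dispatched] = tasks[cur].id;
        dispatched++;

        /* Yield: a task with work left goes to the end of the queue. */
        if (tasks[cur].steps_left > 0) {
            queue[(head + count) % MAX_TASKS] = cur;
            count++;
        }
    }
    return dispatched;
}
```

With three tasks numbered 0, 1 and 2 that need 2, 1 and 3 steps respectively, the dispatch order is 0, 1, 2, 0, 2, 2: a yielding task always waits behind every task that was already queued.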
Operations:
cthread_yield() yields the processor to the next
thread in the queue.
Synchronization:
The Cthreads library supports a barrier function to syn-
chronize all the application threads (see barrier(3)) and
two kinds of synchronization variables: mutex variables and
condition variables. A programmer must explicitly allocate
and initialize any synchronization variables used in the
program (from the shared memory) using allocation calls pro-
vided by the library (Please refer to manual pages for
mutex_alloc(), mutex_init(), mutex_free(),
condition_alloc(), condition_init(), condition_free()).
Note that if pointers to the allocated mutexes and condi-
tions are stored in global (static) memory, those pointers
must be published with cthread_publish() before they are
used.
Mutex Variables:
Mutex variables implement the basic tool that enables
threads to cooperate on access to shared variables. A
mutex is normally used to ensure that a set of actions
on a group of variables can be made atomic relative to
any other thread's actions on these variables. Such
mutual exclusion is implemented using the procedures
mutex_lock() and mutex_unlock(). In their typical use,
the programmer establishes a logical relationship
between a mutex "m" and a set of program variables and
ensures that all reads and writes of those shared
variables occur only when bracketed by mutex_lock(m) and
mutex_unlock(m) calls. The code between these calls
which operates upon the shared variables is known as a
critical section. The following piece of code imple-
ments a critical section:
mutex_t m;
...
mutex_lock(m);
/* critical section */
mutex_unlock(m);
The function mutex_lock() implements a variation of
Anderson's back-off spin lock. If the mutex "m" is
free, mutex_lock(m) locks the mutex and the thread con-
tinues. If more than one thread attempts to lock the
mutex simultaneously, the Cthreads library guarantees
that only one will succeed; the rest will wait. The
waiting threads are removed from the top of the run
queue and are placed at the end of the appropriate run
queues (the run queues associated with the logical pro-
cessors on which the waiting threads were executing).
When those threads reach the head of their respective
run queues they will repeat the lock attempt. The
mutex_unlock(m) procedure unlocks the mutex m, giving
other threads a chance to lock it.
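The retry-after-waiting behavior of a back-off lock can be sketched with C11 atomics and POSIX threads. This is not the library's implementation: Cthreads moves a waiting thread to the end of its run queue, whereas this sketch approximates that by calling sched_yield() a growing number of times between attempts. The names spin_mutex, spin_lock(), spin_unlock() and spin_demo(), and the back-off cap of 64, are assumptions made for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

struct spin_mutex {
    atomic_flag held;            /* set while some thread owns the lock */
};

void spin_lock(struct spin_mutex *m)
{
    unsigned delay = 1;
    unsigned i;

    assert(m != NULL);

    /* test-and-set returns the previous value: true means "was held". */
    while (atomic_flag_test_and_set_explicit(&m->held,
                                             memory_order_acquire)) {
        /* Back off: give up the processor before trying again. */
        for (i = 0; i < delay; i++)
            sched_yield();
        if (delay < 64)
            delay *= 2;          /* exponential back-off, capped */
    }
}

void spin_unlock(struct spin_mutex *m)
{
    assert(m != NULL);
    atomic_flag_clear_explicit(&m->held, memory_order_release);
}

static struct spin_mutex demo_lock = { ATOMIC_FLAG_INIT };
static long demo_counter;

/* Each worker increments the shared counter inside a critical section. */
static void *demo_worker(void *arg)
{
    int i;

    (void)arg;
    for (i = 0; i < 10000; i++) {
        spin_lock(&demo_lock);
        demo_counter++;          /* critical section */
        spin_unlock(&demo_lock);
    }
    return NULL;
}

/* Runs two workers; returns the final counter, or -1 on failure. */
long spin_demo(void)
{
    pthread_t a, b;

    demo_counter = 0;
    if (pthread_create(&a, NULL, demo_worker, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, demo_worker, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return demo_counter;
}
```

Because every increment happens between spin_lock() and spin_unlock(), no update is lost and spin_demo() returns 20000 when both workers start successfully.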
Operations:
mutex_lock() locks a mutex.
mutex_unlock() unlocks a mutex.
mutex_alloc() allocates a mutex variable.
mutex_free() frees a mutex variable.
mutex_init() initializes a mutex lock.
mutex_clear() finalizes a mutex lock.
Condition Variables:
Condition variables make it possible for a thread to
suspend its execution while awaiting an action by some
other thread. In and of itself, a condition variable
has no "value", particularly not in the sense that some
abstract program-related "condition" is true or false.
They are called "condition" variables because in their
most common use the programmer establishes a logical
relationship between a condition variable, a mutex and
some set of program variables which determine a "condi-
tion" that threads might wait for. For example, in
implementing readers-writers style synchronization, a
reader thread must wait until there are no writers in
the system before proceeding. In this case, a variable
"num_writers" might constitute the shared program vari-
able that can be tested to determine whether or not the
condition of no writers being present exists. The
beginning of the reader portion of the code implement-
ing this synchronization might look something like
this:
mutex_t rw_count_lock;
condition_t writers_is_zero;
int num_readers;
int num_writers;
...
mutex_lock(rw_count_lock);
/* wait until num_writers is zero */
while ( num_writers > 0 ) {
condition_wait(writers_is_zero, rw_count_lock);
}
num_readers++;
mutex_unlock(rw_count_lock);
/* now it is safe to read */
The end of the writer portion might look like:
/* code involving writes */
mutex_lock(rw_count_lock);
num_writers--;
if ( num_writers == 0 )
condition_broadcast(writers_is_zero);
mutex_unlock(rw_count_lock);
This is not the complete code for the readers-writers
problem, but just the portion where beginning readers
might sleep and completing writers wake them up. This
is a very common style of using condition variables.
Note that access to the state variables (here
num_readers and num_writers) is shared by multiple
threads and must therefore be protected by a mutex
(here rw_count_lock). That same mutex is specified in
the condition_wait() operation. Condition_wait() actu-
ally unlocks the mutex while this thread is waiting and
relocks it before letting the thread continue. (If it
did not, no other thread could ever gain the mutex to
modify the variables!) Notice too that the
condition_wait() is performed inside a while loop.
This is also common because it is always possible that
the abstract "condition" we're waiting for will change
between the time that the condition variable gets sig-
naled and the time the waiting thread actually gets an
opportunity to run again. In this event the thread
must be prepared to wait again.
One very important thing to remember is that the rela-
tionship between condition variables and abstract "con-
ditions" in a program (like there being no writers
present) is strictly a logical one established by the
programmer. The condition variables themselves just
provide a convenient mechanism for a thread to wait to
be signaled. They assume no values and depend entirely
upon the programmer to write code that determines when
signaling is necessary and when it is really time to
continue, presumably based upon some abstract "condi-
tion".
The operations on a condition variable are
condition_wait() which causes the calling thread to
suspend, and condition_signal() and
condition_broadcast() which can cause waiting threads
to resume. Condition_signal() is used to unblock one
waiting thread; condition_broadcast() is used to
unblock all the waiting threads. Which one is appropri-
ate depends on the circumstances. Using
condition_signal() is preferable when only one blocked
thread can benefit from the change.
Condition_broadcast() is necessary if multiple threads
should resume. (In the readers-writers example,
condition_broadcast() is used because all waiting
readers could start simultaneously.)
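For readers who want to run the reader-entry and writer-exit fragments, here is a translation to POSIX threads. It is a sketch under assumptions: pthread_mutex_t and pthread_cond_t replace mutex_t and condition_t, and the names rw_state, rw_init(), reader_begin(), writer_end() and rw_demo() are invented for the example. Like the fragments above, it is not a complete readers-writers solution.

```c
#include <assert.h>
#include <pthread.h>

struct rw_state {
    pthread_mutex_t rw_count_lock;
    pthread_cond_t writers_is_zero;
    int num_readers;
    int num_writers;
};

void rw_init(struct rw_state *s, int active_writers)
{
    pthread_mutex_init(&s->rw_count_lock, NULL);
    pthread_cond_init(&s->writers_is_zero, NULL);
    s->num_readers = 0;
    s->num_writers = active_writers;
}

/* Beginning of the reader portion: wait until no writer is present. */
void reader_begin(struct rw_state *s)
{
    pthread_mutex_lock(&s->rw_count_lock);
    while (s->num_writers > 0) {
        /* Unlocks the mutex while waiting, relocks before returning. */
        pthread_cond_wait(&s->writers_is_zero, &s->rw_count_lock);
    }
    s->num_readers++;
    pthread_mutex_unlock(&s->rw_count_lock);
}

/* End of the writer portion: wake every waiting reader. */
void writer_end(struct rw_state *s)
{
    pthread_mutex_lock(&s->rw_count_lock);
    s->num_writers--;
    if (s->num_writers == 0)
        pthread_cond_broadcast(&s->writers_is_zero);
    pthread_mutex_unlock(&s->rw_count_lock);
}

static void *reader_thread(void *arg)
{
    reader_begin(arg);
    return NULL;
}

/*
 * Starts with one active writer, forks up to nreaders readers, ends
 * the write, and joins the readers. Returns the number of readers
 * that were admitted.
 */
int rw_demo(int nreaders)
{
    struct rw_state s;
    pthread_t tid[8];
    int started = 0, i, admitted;

    assert(nreaders >= 0 && nreaders <= 8);
    rw_init(&s, 1);

    for (i = 0; i < nreaders; i++) {
        if (pthread_create(&tid[started], NULL, reader_thread, &s) != 0)
            break;
        started++;
    }
    writer_end(&s);              /* readers may now proceed */
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    admitted = s.num_readers;
    pthread_cond_destroy(&s.writers_is_zero);
    pthread_mutex_destroy(&s.rw_count_lock);
    return admitted;
}
```

A reader that starts after writer_end() has already run simply finds num_writers at zero and skips the wait, which is why the test is made on the shared variable and not on the condition variable itself.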
The Cthreads library implements a condition variable using
some state variables and a thread queue. A condition_wait()
call atomically suspends the calling thread, removes the
calling thread from the run queue, and places it in the
queue associated with the condition variable. A
condition_signal() removes a thread, if there are any, from
the condition queue and places it in the run queue. Note
that a condition_signal() puts the thread in the run queue
but doesn't guarantee that the thread will run immediately
(it may run immediately if the processor run queue is empty
and if it can obtain a lock on the mutex). A
condition_broadcast() removes all the waiting threads from
the condition queue and places them in the appropriate
processor run queues. However, because the semantics of
condition_wait() require the lock on the mutex to be
reestablished by each particular thread before it can run,
at most one of the threads awakened by a
condition_broadcast() will run immediately. The others will
continue as they have the opportunity to regain the lock.
Operations:
condition_wait() wait on a condition variable for an
event.
condition_signal() wakes up a thread waiting on a
condition variable.
condition_broadcast() wakes up all the threads waiting on
a condition variable.
condition_alloc() allocates a condition variable.
condition_free() frees a condition variable.
condition_init() initializes a condition variable.
Thread debugging:
There are a couple of mechanisms available for debugging
Cthreads programs. In particular, if the bug is related to
the use of Cthreads itself (e.g. thread deadlock or missed
synchronization) then it is possible to turn on the
printing of informational messages associated with almost
every Cthread call. These printouts trace most changes in
thread status. For example, these are sample printouts:
Thread "func1" (0xf76f5ac4) attempting lock on mutex "lock" (0xf76f50ec) at test3.c(77)
Thread "func1" (0xf76f5ac4) granted lock on mutex "lock" (0xf76f50ec) at test3.c(77)
Thread "func1" (0xf76f5ac4) waiting on condition "ready" (0xf76f55e0) at test3.c(79)
The printouts contain line and file information that relates
back to the application source. While the library will sup-
ply simple default names for all created conditions, mutexes
and threads, the programmer may need to provide more
descriptive names with cthread_set_name(), mutex_set_name(),
or condition_set_name(). These printouts can be quite a
valuable aid for debugging synchronization problems. See
cthread_parse_args(3) for details on enabling them.
Also, on SPARC targets, it is possible in some circumstances
to use dbx, gdb, ups or other traditional Unix debuggers.
You can do this only if you use just one (1) logical proces-
sor. The Cthreads library manipulates stacks in ways that
most debuggers aren't prepared to handle, so you should be
prepared for some confusion on the part of the debugger.
Usually this isn't much of a problem if you can limit your
debugging efforts to a single thread. It's particularly
easy if all you need to do is find out where, for example, a
segmentation fault is occurring. Just compile with "-g" and
invoke your debugger on the Cthreads program as you would
for any other program.
If all else fails, there's always printf(). Unfortunately,
on some machines, concurrent printf's can trash each other.
To address this, Cthreads provides "atomic" versions of
printf and fprintf. See aprintf(3) for more information.
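The idea behind an "atomic" printf can be sketched as a wrapper that holds a lock for the duration of one formatted write. This is an assumption about the general technique, not a description of aprintf(): the name locked_printf() is made up, and a POSIX mutex is used in place of a Cthreads mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>

/* One lock shared by every caller of locked_printf(). */
static pthread_mutex_t print_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Formats and writes to stdout while holding print_lock, so output
 * from concurrent callers is not interleaved within a single call.
 * Returns what vprintf() returns: the number of characters written,
 * or a negative value on error.
 */
int locked_printf(const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(fmt != NULL);

    pthread_mutex_lock(&print_lock);
    va_start(ap, fmt);
    n = vprintf(fmt, ap);
    va_end(ap);
    fflush(stdout);              /* push the whole message out at once */
    pthread_mutex_unlock(&print_lock);
    return n;
}
```

The lock only protects callers that go through the wrapper; a plain printf() elsewhere in the program can still interleave with it.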
SEE ALSO
aprintf(3), barrier(3), condition_alloc(3),
condition_free(3), condition_broadcast(3),
condition_signal(3), condition_wait(3),
condition_set_name(3), cthread_configure(3),
cthread_detach(3), cthread_exit(3), cthread_fork(3),
cthread_thread_alloc(3), cthread_thread_schedule(3),
cthread_init(3), cthread_join(3), cthread_parse_args(3),
cthread_perror(3), cthread_publish(3), cthread_self(3),
cthread_set_data(3), cthread_set_name(3),
cthread_set_sched_info(3), cthread_yield(3),
current_processor(3), memory_alloc(3), memory_free(3),
mutex_alloc(3), mutex_free(3), mutex_lock(3),
mutex_unlock(3), mutex_set_name(3), num_of_procs(3)
AUTHOR
Cthreads was written and maintained by many people.
This man page was written by Bodhi Mukherjee and Greg
Eisenhauer.