From: davidhi@cc.gatech.edu (D. Hilley)
Subject: Notes on Project #1
Newsgroups: git.cc.class.cs6210
Date: Tue, 23 Aug 2005 23:32:08 -0400
Organization: Georgia Institute of Technology

I wrote up some implementation notes and hints to help explain some of the trickier parts of your first project. Obviously, you don't have to follow my advice at all as long as you meet the project requirements.

Notes on Project 1 (GT Threads)
===============================

Your assignment is to develop a user-level, preemptive threads library in C. Your thread library will support a subset of POSIX threads (pthreads), following a similar API with the same semantics.

Conceptually, you are multiplexing multiple threads of execution over a single Unix process. This also means you WILL NOT be using fork, clone, or any other OS facility for process creation. Since you only have one schedulable process, your thread library will not support "true" concurrency: multiple threads will never execute simultaneously, even on SMP systems. In addition, a blocking I/O call will block the entire process and therefore all threads. This is okay for our purposes, though.

You must implement the functions listed on the assignment page, and you also need to define the opaque types gtthread_t and gtthread_mutex_t. You will need to keep track of each thread's execution context and other bookkeeping information. Remember that each thread must have its own unique stack (or else the threads will stomp on each other's data). At thread creation, you will create a new thread descriptor and stack.

Thread Switching
----------------

If you are not familiar with setjmp and longjmp (and their signal-mask-preserving counterparts, sigsetjmp and siglongjmp), please take a look at the man pages. Setjmp and longjmp (or the ucontext calls) are a bit strange to wrap your head around at first if you haven't seen them before.
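A minimal illustration of the odd part, assuming only standard C: a single setjmp call returns more than once. The first time it returns 0, right after saving the context; a later longjmp makes the same call return again, this time with the nonzero value passed to longjmp. The names setjmp_demo and jump_back are hypothetical. In the library you would more likely use sigsetjmp(env, 1) and siglongjmp, which also save and restore the signal mask; that matters when you switch threads from inside a timer signal handler.

```c
#include <setjmp.h>

static jmp_buf env;   /* context saved by setjmp_demo (hypothetical name) */

/* Performs the non-local goto: control resumes at the setjmp call in
   setjmp_demo, which now appears to return `value` instead of 0. */
static void jump_back(int value)
{
    longjmp(env, value);
}

/* Returns how many times control came back out of the single setjmp call
   (2 on success), or -1 if setjmp returned an unexpected value. */
int setjmp_demo(void)
{
    /* volatile: a non-volatile local modified after setjmp has an
       indeterminate value once longjmp brings us back here */
    volatile int passes = 0;

    switch (setjmp(env)) {
    case 0:
        /* First return: the context was just saved. */
        passes++;
        jump_back(7);   /* never returns */
        break;
    case 7:
        /* Second return: we arrived here through longjmp(env, 7). */
        passes++;
        break;
    default:
        return -1;
    }
    return passes;
}
```

Note that setjmp_demo is still active when jump_back calls longjmp, which is exactly what keeps this well-defined.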
You will use these calls to switch execution contexts between threads every time the scheduler is invoked (which will happen at regular intervals via the timer interrupts).

Since setjmp and longjmp essentially allow non-local gotos, you can experience odd behavior if you longjmp into a function that has already returned. This is noted in the man page: "The stack context will be invalidated if the function which called setjmp() returns." For example:

  1. Function A calls Function B.
  2. Function B calls setjmp.
  3. Function B returns (now Function B's stack frame is gone).
  4. Function A calls Function C, which creates a new stack frame in the free space previously occupied by Function B's frame.
  5. Function C returns.
  6. Function A calls longjmp using the context that was saved earlier.

We are back in Function B as if we had just called setjmp, but all of its local (automatic) variables are corrupted.

The only other 'trick' needed to switch threads with longjmp is changing stacks. When setjmp is called, it captures the current stack pointer in the opaque jmp_buf. As long as the thread creation mechanism sets up the new thread's stack in the execution context it initially saves, you never have to worry about stacks again: in the main scheduling loop, you simply longjmp to the context saved in the next thread's descriptor, and its stack will already be in place.

Thread Creation
---------------

If you are using the ucontext set of calls, thread creation is fairly straightforward. In the setjmp/longjmp case, it's also straightforward except for setting up the new thread's stack.

In the thread creation function, you will save the context at one point. From there, execution branches into two possible paths (the original thread's and the new thread's); setjmp tells you which one you are on by returning 0 the first time and a nonzero value when you arrive via longjmp. Eventually, execution will resume at the call that saved the context and continue as the original thread that called create.
In the other case, you simply switch stacks and call the new thread's starting function (with the ucontext calls, you can specify the starting point directly). When will execution return to the original thread? When it is scheduled again.

Okay, great, so how do you switch the stack at thread creation? There are several techniques available.

The least portable technique involves manually changing the values inside the opaque jmp_buf or sigjmp_buf types. This is both OS- and hardware-dependent (it even differs between x86 Linux and PowerPC Linux), but you can find the layout in your OS's header files.

The next technique uses sigaltstack, a POSIX function that allows a signal handler to execute on a user-allocated stack. Using a technique called 'signal trampolines,' we install a dummy signal handler for SIGUSR1 (or some other user-defined signal) and set it to execute on a separate stack using sigaltstack (with the SA_ONSTACK flag passed to sigaction). Next, we intentionally raise the signal to make the handler execute. We can then setjmp inside the signal handler, return, and later longjmp back to execute the thread's starting function. This approach may require some manual twiddling with signal masks to ensure that the user-defined signal is unblocked after thread creation is done.

Finally, we can use the ucontext calls to switch stacks, even if we are using longjmp for thread switching: save the current context with getcontext, point its uc_stack at the new stack, use makecontext to set the function it should start in, and switch to it with setcontext. Remember that once you are on the new stack, you must not use any local variables from the old frame, and the function started on the new stack must not return unless uc_link names a valid context to resume.

In any of the above cases, after you switch stacks, you must not allow the currently executing function to return. Instead, you can simply call a dummy function to create a clean stack frame, and call the thread's entry point function from there.
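The ucontext route can be sketched as follows. This is a minimal sketch, assuming Linux/glibc, which still provides getcontext, makecontext, setcontext and swapcontext even though POSIX.1-2008 removed them from the standard. It uses the ucontext calls for both stack setup and switching rather than the setjmp/longjmp hybrid. The names sketch_thread, sketch_create, sketch_run, sketch_yield, sketch_destroy, count_twice and SKETCH_STACK_SIZE are hypothetical and not part of the gtthreads interface. The static trampoline plays the role of the dummy function: it is the first frame on the new stack, calls the entry point, and switches away instead of returning.

```c
#include <stdlib.h>
#include <ucontext.h>

#define SKETCH_STACK_SIZE (64 * 1024)   /* assumed per-thread stack size */

/* Hypothetical, stripped-down thread descriptor (not the gtthread_t layout). */
typedef struct sketch_thread {
    ucontext_t ctx;            /* saved execution context */
    void *stack;               /* this thread's private stack */
    void *(*start)(void *);    /* entry point */
    void *arg;
    void *retval;
    int finished;
} sketch_thread;

static ucontext_t main_ctx;        /* where control goes on yield or exit */
static sketch_thread *current;     /* thread currently running on its stack */

/* Dummy first frame on the new stack: calls the entry point, then switches
   away explicitly so this bottom frame never returns. */
static void trampoline(void)
{
    sketch_thread *self = current;
    self->retval = self->start(self->arg);
    self->finished = 1;
    setcontext(&main_ctx);
}

/* Allocates a stack and prepares a context that will start in trampoline. */
int sketch_create(sketch_thread *t, void *(*start)(void *), void *arg)
{
    t->stack = malloc(SKETCH_STACK_SIZE);
    if (t->stack == NULL)
        return -1;
    if (getcontext(&t->ctx) == -1) {
        free(t->stack);
        return -1;
    }
    t->ctx.uc_stack.ss_sp = t->stack;
    t->ctx.uc_stack.ss_size = SKETCH_STACK_SIZE;
    t->ctx.uc_stack.ss_flags = 0;
    t->ctx.uc_link = &main_ctx;    /* safety net; trampoline never returns */
    t->start = start;
    t->arg = arg;
    t->retval = NULL;
    t->finished = 0;
    makecontext(&t->ctx, trampoline, 0);
    return 0;
}

/* Runs t until it yields or finishes; returns -1 if the switch fails. */
int sketch_run(sketch_thread *t)
{
    current = t;
    return swapcontext(&main_ctx, &t->ctx);
}

/* Called on a thread's own stack: saves its context and switches back. */
void sketch_yield(void)
{
    swapcontext(&current->ctx, &main_ctx);
}

void sketch_destroy(sketch_thread *t)
{
    free(t->stack);
    t->stack = NULL;
}

/* Example entry point (hypothetical): counts, yields once, counts again. */
void *count_twice(void *arg)
{
    int *n = arg;
    (*n)++;
    sketch_yield();   /* execution resumes here on the next sketch_run */
    (*n)++;
    return arg;
}
```

Calling sketch_create(&t, count_twice, &n) and then sketch_run(&t) twice leaves n at 2, with t.finished set after the second run. A real scheduler would pick the next thread from a run queue instead of always returning to the creator, and would make these calls with the timer signal blocked.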
Exclusion Issues
----------------

Since there is no true concurrency in your code, you don't have to worry about most of the standard gotchas of concurrent programming. The only thing you have to worry about is asynchronous signals (the timer). Obviously, you must protect your thread library code so that you don't receive a timer signal and invoke the scheduler or switch threads while your bookkeeping information is inconsistent. The most straightforward way to do this is to block delivery of the timer signal (e.g., with sigprocmask) during critical sections.

Extras
------

Some thread libraries try to protect their thread stacks from overflow or underflow using different techniques. One simple way is to allocate two additional pages for a thread stack and use mprotect to make the top and bottom pages of the stack inaccessible (unreadable and unwritable). Since mprotect requires page-aligned addresses, the stack must be page-aligned (e.g., allocated with mmap). If a thread touches either of these pages, the process receives a segmentation fault (SIGSEGV), which can be handled appropriately. These protected pages are called 'Red Zones' and can be enabled in Solaris's user-level threads library.

Another technique is called 'fence-post protection,' which involves writing a small bit of special data (like 0xF04A3271) just before and after the usable extent of the stack. This value can be checked at every context switch (and at every thread library call for finer granularity). The downsides of this approach are that you can only detect overflows or underflows after the damage is done, out-of-bounds reads are not detected, and there is a tiny possibility that the program overwrites the fence post with exactly the same value.

You don't have to implement either of these techniques, however.

--
D. Hilley (davidhi@cc.gatech.edu)