NAME_______________________________________________

 

EXAM 1 Solutions · CS 3210 Design of Operating Systems
College
of Computing, Georgia Tech · Hutto · Spring 2003

 

Exam 1 – topics

 

·        OS / Kernel Intro (1)

·        Booting / Kernel Init (1)

·        System Calls (1)

·        Memory Addressing (1)

·        Processes (1)

·        Interrupts / Exceptions (2)

·        Kernel Synchronization (1)

·        Timers / Timing (1)

·        Memory Management (1)

 

OS / Kernel Intro

 

1. Explain how the Linux modularized kernel design represents a compromise between monolithic and microkernel design alternatives.

 

Many things you could say here. This assumes you understand what a kernel module is and that your answer demonstrates that. Originally we had big monolithic kernels. Everything went into one big executable that was loaded at boot time and remained resident for the life of the OS execution. Much later came the idea of micro-kernels as a remedy for kernel bloat (as the Linux hackers call it). Take everything out of the kernel except the bare minimum. Run traditional OS kernel services as special privileged system processes. So you would have a filesystem process and a timer service process and a scheduler service, all of which would run as processes outside the kernel. The kernel would contain only the bare minimum mechanism. There are various ideas about what the bare minimum is but usually it includes very simple process creation, context switch, very simple memory management and a low-level messaging (IPC) mechanism. This is good software engineering and a nice policy/mechanism separation. Several real systems are micro-kernel based: Mach, QNX, early NT. Unfortunately, micro-kernels are slower than monolithic kernels (why?). So commercial systems almost always move key components back in the kernel. There is some benefit to this, a microkernel design but a monolithic implementation, but it is not ideal.

 

Kernel modules evolved from device drivers which are separate pieces of kernel code that can be (easily) conditionally included (depending on whether you have the device or not). Apply dynamic linking technology to this idea we get dynamically loadable kernel modules that can be conditionally included or dynamically added and removed. The utility of this seemed obvious and everyone started using kernel modules as a code structuring mechanism. So a system built with kernel modules gives you something like a system that is designed as a microkernel but implemented as a monolithic kernel. You lose the protection of things being in distinct address spaces but also the costs of crossing the kernel/user barrier frequently. Still, there is a lot of Linux that is still “core” so it isn’t exactly a microkernel design but if one were to aggressively modularize it could become so.

 

Booting / Kernel Init

 

2. The start_kernel() entry point, along with many other kernel initialization functions, are “tagged” by a special macro (__init). What does this macro do?

 

Very simple. A detail question. __init is a macro that makes use of special capabilities of ELF (the linking format used by Linux) to load code into different memory areas or segments. Placing this tag on code or data marks it for inclusion in these special “initialization” memory regions and indicates that the code or data is not needed after the kernel is fully booted and initialized. At the very end of initialization the kernel calls free_initmem() from the init thread entry point.

 

System Calls

 

3. Two distinct kernel dispatch tables (tables of function pointers) are involved in executing a system call. Explain the purpose and role of each dispatch table.

 

Pretty simple. You initiate a system call on most architecture (like Intel) by doing something like:

 

            int 80h

 

That is a “software interrupt” that initiates the (hardware) interrupt vectoring mechanism, but from software. The value 80h is the index into the interrupt descriptor table and refers to an interrupt handler. This particular piece of code is the generic “system call interrupt handler”. This table also contains all the device interrupt handlers and exception handlers for the system.

 

In addition to the value 80h, system calls also require that an integer value be placed in a register. This integer value is validated and then used by the generic system call interrupt handler to do an index into another table of function pointers, the “system call table”. The numeric value is the particular system call being requested. The function pointers stored in the second table are the actual system call service routines (like sys_open()).

 

This question doesn’t have anything to do with the common convention in the kernel of splitting system call implementations into two parts: sys_whater(), do_whatever(). That doesn’t involve tables of function pointers and is simply a way that kernel developers distinguish between preliminary checking and the “guts” of a system call.

 

Memory Addressing

 

4.  Explain the relationship between the following elements of the Intel segmentation architecture:

 

  • segment registers
  • segment descriptors
  • segment descriptor tables
  • segment selectors
  • segment descriptor shadow registers

 

I used to find this very confusing but it is pretty simple after you read the chapter 8 times J

 

In the Intel architecture all (logical) addresses are represented by a segment and an offset into that segment. The associations are often implicit. For example, a value in the stack pointer register (esp) is implicitly an offset in the currently defined stack segment. A jump location is implicitly an offset in the currently defined code segment and so forth.

 

A segment register (like cs, ds, ss, es, fs, gs) contains a segment selector which is conceptually a bit indicating a descriptor table (local or global) and an index into that table.

 

Segment descriptor tables come in various flavors and contain 8 byte entries called segment descriptors that tell you everything you need to know about a segment. Descriptors fundamentally include the linear or physical starting address of the segment (depending on whether paging is enabled) and the length of the segment  (base + bounds). In addition they contain a variety of flags and privilege information. Linux utilizes local descriptor tables only for DOS emulation.

 

The eight thousand element global descriptor table (GDT) contains 12 statically allocated segments used extensively by the kernel. The remaining 8000 or so entries are reserved for pairs of process descriptors. Each process is allocated (dynamically) two descriptors: a TASK SEGMENT DESCRIPTOR (where it store low-level context information) and a LOCAL SEGMENT DESCRIPTOR (if it needs DOS emulation).

 

Accessing descriptor tables (stored in RAM) for each instruction execution would be prohibitively expensive so the Intel architecture includes special hidden, on-chip shadow registers. When a new selector is written into a segment register, the system automatically loads the corresponding descriptor into a shadow register. Access can now proceed at CPU register speeds.

 

Processes

 

5. What is the difference between a user and kernel thread? What are the three possible threading models relating user and kernel threads? Which one does Linux use?

           

User threads are lightweight entities, not known to the kernel for scheduling purposes. They are created and managed by user-level libraries that do their own thread scheduling and context switching.  Systems with no kernel thread support can run user-level threads. The problem with user-level threads is that they cannot provide access to physical parallelism. If you have an 8-way SMP system and are running 8 user-level threads in a single process, only one will run at a time. (You still get some benefit from i/o parallelism.) You can have tens or hundreds of thousands of user-level threads.

 

Kernel threads provide user-level thread functionality within the kernel. They are useful for structuring the kernel code itself and can export physical parallelism to user-level processes. They tend to be more heavy-weight (more data structures, higher context switch time, etc.) You can have hundreds or thousands or kernel-level threads.

 

What is the relationship between user and kernel threads? There are three models:

 

  • N-to-1: all threads in a process are assigned a single kernel thread (basically no kernel threads)
  • to-1: each user thread is assigned a separate kernel thread (expensive but easy to do)
  • M-to-N (M>=N): several user threads are multiplexed on a single kernel thread; each process is assigned several kernel threads; the number may change dynamically as demand increases or decreases

 

Linux currently uses the 1-to-1 model but IBM has provided a “Next Generation” threads package that provides the M-to-N model on Linux. Sun pioneered the M-to-N model but has surprisingly reverted to the 1-to-1 model in Solaris 9. They claim the benefit of the M-to-N model is outweighed by the complexity of the implementation.

 

Interrupts

 

6. Describe the role of the PIC in interrupt processing in Intel-based Linux systems? What additional capabilities does the APIC offer?

 

The PIC is a chip that sits in between devices and the CPU conceptually. It manages IRQs, allows sharing, provides ACKS to devices and receives them from the CPU, maps IRQs to vectors (programmatically), allows interrupts to be temporarily blocked or masked, and holds pending interrupts. The original PC architecture had a single chip with 8 inputs. This was soon increased to a cascade of 2 PICs allowing 15 inputs.

 

An APIC is a modern version of the PIC, originally designed for SMP systems but now included on almost all new Intel machines. The APIC is split between a front-end APIC and local APICS, one per cpu. The front-end is an interrupt router and directs interrupts to appropriate processors over a special APIC bus. Various load-balancing schemes are available but Linux uses simple round-robin. The APIC is much more sophisticated than the PIC and supports interrupt broadcasts, inter processor interrupts (IPI), and includes a clock and timer. The APIC allows an increased number of inputs. The APIC timer is used in Linux to do per-cpu timer-based processing like quantum expiration.

 

Exceptions

 

7. Under what circumstances can the execution of interrupts and exceptions be nested?

 

Interrupts can, generally, be interrupted by other interrupts. The interrupt being serviced is masked during the execution of the handler. This bounds the number of interrupt control paths that can be stacked. Interrupts may always briefly disable interrupts and a very few run entirely with interrupts disabled (like the timer interrupt). Note that this discussion refers to interrupt top-halves. Bottom-halves are always interruptible (but not re-entrant).

 

Interrupts can theoretically be intervened by an exception handler (it is easy to create this situation, just write your own interrupt handler that makes an invalid memory reference) but developers assume that kernel code (including interrupt handlers) is always correct. So this is SHOULD-NOT-HAPPEN situation.

 

Interrupts may occur during the execution of exception handlers.

 

Exceptions should not be caused by exception handlers with one, well, exception. A page fault may occur when an exception handler is trying to touch user space. This gives rise to a maximum two-level nesting of exception handlers.

 

Kernel Synchronization

 

8. Inserting into a linked list basically involves two (atomic) assignment statements like this:

 

            new->next = list_element->next;

list_element->next = new;

 

Your book argues that making this work properly on modern architectures requires an intervening write memory barrier as below. Explain.

 

            new->next = list_element->next;

     wmb(); // what does this do? why is it needed?

list_element->next = new;

 

See pages 184-185 (Chapter 5 Kernel Synchronization) for details.

 

Simply a memory barrier ensures that all memory accesses before the barrier are completed before any memory accesses after the barrier initiate. A read barrier makes this claim for read accesses, a write barrier for write accesses, a generic barrier for both read and write accesses.

 

Why do we care about this? Modern compilers and architectures reorder accesses to enhance performance. This works fine on a uniprocessor but you can run afoul of the optimizations when you are doing fine-grained shared memory access (i.e. synchronization) on an SMP system.

 

All standard locking primitives are effectively memory barriers but is sometimes possible to interleave parallel code without requiring locking (as above). While locking might not be required in such situations (logically), memory barriers may still be required.

 

Imagine two cpu’s concurrently executing the above instructions. If the instruction order is swapped, things get ugly.

 

Just as an aside, wmb() is a no-op on the Intel because it doesn’t every reorder write operations, only reads. The read memory  barrier is implemented on Intel as a no-op add with a lock prefix (instructions with a lock prefix are implicitly memory barriers on Intel).

 

Timers and Timing

 

9. How is the TSC used to improve the accuracy of gettimeofday() on systems that support a TSC? What is the difference between calling gettimeofday() and executing the rdtsc instruction?

 

TSC (Time Stamp Counter) is a very high granularity (cpu clock speed) counter. If you have a 3 GHz system, it increments 3 billion times a second! The kernel variable jiffies is updated by the timer interrupt 100 or 1000 times a second. The corresponding variable xtime is updated at the same rate or slightly less frequently if several top-halves occur before the timer bottom half is executed.

 

gettimeofday() simply looks at xtime and jiffies. To increase accuracy, it also determines the number of cycles that have occurred since the last update of jiffies and xtime and adds in that value.

 

gettimeofday() is a system call and requires significant cost to enter and exit the kernel. rdtsc (“read tsc”) is an instruction that can be called from user-mode. It samples the tsc at memory access speeds!

 

Memory Management

 

10. Briefly describe the Buddy System algorithm and how it is used to implement page allocation in the Linux kernel.

 

The Buddy System is a “powers-of-two” allocator. It maintains 10 separate “free lists” of contiguous pages of free memory, 1, 2, 4, 8, 16, etc. Originally all free memory is represented by one or more large chunks (free memory is discontiguous initially). When a request comes in, if no element of the requested size is available, it searches through increasingly larger sizes until a free chunk is found. This chunk is then split up as necessary and spread across the intervening (empty) free lists and the requested piece is allocated. When a piece is freed, if it’s “buddy” is free (contiguous and properly aligned) then the two are merged into one larger piece and moved to the next larger list. Merging and splitting may cascade. The overall structure reduces fragmentation.