NAME_______________________________________________
Exam
1 – topics
·
OS
/ Kernel Intro (1)
·
Booting
/ Kernel Init (1)
·
System
Calls (1)
·
Memory
Addressing (1)
·
Processes
(1)
·
Interrupts
/ Exceptions (2)
·
Kernel
Synchronization (1)
·
Timers
/ Timing (1)
·
Memory
Management (1)
OS / Kernel Intro
1. Explain how the Linux modularized
kernel design represents a compromise between monolithic and microkernel
design alternatives.
Many things you could say here. This assumes
you understand what a kernel module is and that your answer demonstrates that.
Originally we had big monolithic kernels. Everything went into one big
executable that was loaded at boot time and remained resident for the life of
the OS execution. Much later came the idea of
micro-kernels as a remedy for kernel bloat (as the Linux hackers call it). Take
everything out of the kernel except the bare minimum. Run traditional OS kernel
services as special privileged system processes. So you would have a filesystem
process and a timer service process and a scheduler service, all of which would
run as processes outside the kernel. The kernel would contain only the bare
minimum mechanism. There are various ideas about what the bare minimum is but
usually it includes very simple process creation, context switch, very simple
memory management and a low-level messaging (IPC) mechanism. This is good
software engineering and a nice policy/mechanism separation. Several real
systems are micro-kernel based: Mach, QNX, early NT.
Unfortunately, micro-kernels are slower than monolithic kernels (why?). So commercial systems almost always move key components back in the
kernel. There is some benefit to this, a microkernel design but a
monolithic implementation, but it is not ideal.
Kernel modules evolved from device drivers
which are separate pieces of kernel code that can be (easily) conditionally
included (depending on whether you have the device or not). Apply dynamic
linking technology to this idea we get dynamically loadable kernel modules that
can be conditionally included or dynamically added and removed. The utility of
this seemed obvious and everyone started using kernel modules as a code
structuring mechanism. So a system built with kernel modules gives you
something like a system that is designed as a microkernel but implemented as a
monolithic kernel. You lose the protection of things being in distinct address
spaces but also the costs of crossing the kernel/user barrier frequently.
Still, there is a lot of Linux that is still “core” so it isn’t exactly a
microkernel design but if one were to aggressively modularize it could become
so.
Booting / Kernel Init
2. The start_kernel() entry point, along with many other kernel initialization functions, are “tagged” by a special macro (__init). What does this macro do?
Very simple. A detail question.
__init is a macro that makes use of special capabilities of ELF (the linking
format used by Linux) to load code into different memory areas or segments.
Placing this tag on code or data marks it for inclusion in these special
“initialization” memory regions and indicates that the code or data is not
needed after the kernel is fully booted and initialized. At the very end of
initialization the kernel calls free_initmem() from the init thread entry point.
System Calls
3. Two
distinct kernel dispatch tables
(tables of function pointers) are involved in executing a system call. Explain
the purpose and role of each dispatch table.
Pretty simple. You initiate a system call on most architecture (like Intel) by doing something like:
int 80h
That is a “software interrupt” that initiates the (hardware) interrupt vectoring mechanism, but from software. The value 80h is the index into the interrupt descriptor table and refers to an interrupt handler. This particular piece of code is the generic “system call interrupt handler”. This table also contains all the device interrupt handlers and exception handlers for the system.
In addition to the value 80h, system calls also require that an integer value be placed in a register. This integer value is validated and then used by the generic system call interrupt handler to do an index into another table of function pointers, the “system call table”. The numeric value is the particular system call being requested. The function pointers stored in the second table are the actual system call service routines (like sys_open()).
This question doesn’t have anything to do with the common convention
in the kernel of splitting system call implementations into two parts: sys_whater(),
do_whatever(). That doesn’t involve tables of
function pointers and is simply a way that kernel developers distinguish
between preliminary checking and the “guts” of a system call.
Memory
Addressing
4. Explain the relationship between the following elements of the Intel segmentation architecture:
I used to find this very confusing but it is pretty simple after you read the chapter 8 times J
In the Intel architecture all (logical) addresses are represented by a segment and an offset into that segment. The associations are often implicit. For example, a value in the stack pointer register (esp) is implicitly an offset in the currently defined stack segment. A jump location is implicitly an offset in the currently defined code segment and so forth.
A segment register (like cs, ds, ss, es, fs, gs) contains a segment selector which is conceptually a bit indicating a descriptor table (local or global) and an index into that table.
Segment descriptor tables come in various flavors and contain 8 byte entries called segment descriptors that tell you everything you need to know about a segment. Descriptors fundamentally include the linear or physical starting address of the segment (depending on whether paging is enabled) and the length of the segment (base + bounds). In addition they contain a variety of flags and privilege information. Linux utilizes local descriptor tables only for DOS emulation.
The eight thousand element global descriptor table (GDT) contains 12 statically allocated segments used extensively by the kernel. The remaining 8000 or so entries are reserved for pairs of process descriptors. Each process is allocated (dynamically) two descriptors: a TASK SEGMENT DESCRIPTOR (where it store low-level context information) and a LOCAL SEGMENT DESCRIPTOR (if it needs DOS emulation).
Accessing descriptor tables (stored in RAM) for each
instruction execution would be prohibitively expensive so the Intel
architecture includes special hidden, on-chip shadow registers. When a
new selector is written into a segment register, the system automatically loads
the corresponding descriptor into a shadow register. Access can now proceed at
CPU register speeds.
Processes
5. What is the difference between a user and kernel thread? What are the three possible threading
models relating user and kernel threads? Which one does Linux use?
User threads are
lightweight entities, not known to the kernel for scheduling purposes. They are
created and managed by user-level libraries that do their own thread scheduling
and context switching. Systems with no
kernel thread support can run user-level threads. The problem with user-level
threads is that they cannot provide access to physical parallelism. If you have
an 8-way SMP system and are running 8 user-level threads in a single process,
only one will run at a time. (You still get some benefit from i/o parallelism.) You can have tens or hundreds of thousands
of user-level threads.
Kernel threads
provide user-level thread functionality within the kernel. They are useful for
structuring the kernel code itself and can export physical parallelism to
user-level processes. They tend to be more heavy-weight (more data structures,
higher context switch time, etc.) You can have hundreds or thousands or
kernel-level threads.
What is the
relationship between user and kernel threads? There are three models:
Linux currently uses
the 1-to-1 model but IBM has provided a “Next Generation” threads package that provides
the M-to-N model on Linux. Sun pioneered the M-to-N model but has surprisingly
reverted to the 1-to-1 model in Solaris 9. They claim the benefit of the M-to-N
model is outweighed by the complexity of the implementation.
Interrupts
6. Describe the role of the PIC in interrupt processing in Intel-based Linux systems? What additional capabilities does the APIC offer?
The PIC is a chip
that sits in between devices and the CPU conceptually. It manages IRQs, allows sharing, provides ACKS to devices and receives
them from the CPU, maps IRQs to vectors
(programmatically), allows interrupts to be temporarily blocked or masked, and
holds pending interrupts. The original PC architecture had a single chip with 8
inputs. This was soon increased to a cascade of 2 PICs
allowing 15 inputs.
An APIC is a modern
version of the PIC, originally designed for SMP systems but now included on
almost all new Intel machines. The APIC is split between a front-end APIC and
local APICS, one per cpu. The front-end is an interrupt
router and directs interrupts to appropriate processors over a special APIC
bus. Various load-balancing schemes are available but Linux uses simple
round-robin. The APIC is much more sophisticated than the PIC and supports
interrupt broadcasts, inter processor interrupts (IPI), and includes a clock
and timer. The APIC allows an increased number of inputs. The APIC timer is
used in Linux to do per-cpu timer-based processing like quantum expiration.
Exceptions
7. Under what circumstances can the execution of interrupts and exceptions be nested?
Interrupts can,
generally, be interrupted by other interrupts. The interrupt being
serviced is masked during the execution of the handler. This bounds the number
of interrupt control paths that can be stacked. Interrupts may always briefly
disable interrupts and a very few run entirely with interrupts disabled (like
the timer interrupt). Note that this discussion refers to interrupt top-halves.
Bottom-halves are always interruptible (but not re-entrant).
Interrupts can theoretically
be intervened by an exception handler (it is easy to create this situation,
just write your own interrupt handler that makes an invalid memory reference)
but developers assume that kernel code (including interrupt handlers) is always
correct. So this is SHOULD-NOT-HAPPEN situation.
Interrupts may occur
during the execution of exception handlers.
Exceptions should
not be caused by exception handlers with one, well, exception. A page fault may
occur when an exception handler is trying to touch user space. This gives rise
to a maximum two-level nesting of exception handlers.
Kernel
Synchronization
8. Inserting into a linked list basically involves two (atomic) assignment statements
like this:
new->next = list_element->next;
list_element->next = new;
Your book argues that making this work
properly on modern architectures requires an intervening write memory barrier as below. Explain.
new->next = list_element->next;
wmb(); // what does this do? why is it needed?
list_element->next = new;
See pages 184-185
(Chapter 5 Kernel Synchronization) for details.
Simply a memory
barrier ensures that all memory accesses before the barrier are completed
before any memory accesses after the barrier initiate. A read barrier makes
this claim for read accesses, a write barrier for write accesses, a generic barrier for both read and write accesses.
Why do we care about
this? Modern compilers and architectures reorder accesses to enhance
performance. This works fine on a uniprocessor but you can run afoul of the
optimizations when you are doing fine-grained shared memory access (i.e.
synchronization) on an SMP system.
All standard locking
primitives are effectively memory barriers but is
sometimes possible to interleave parallel code without requiring locking (as
above). While locking might not be required in such situations (logically),
memory barriers may still be required.
Imagine two cpu’s concurrently executing the
above instructions. If the instruction order is swapped, things get ugly.
Just as an aside, wmb() is
a no-op on the Intel because it doesn’t every reorder write operations, only
reads. The read memory
barrier is implemented on Intel as a no-op add with a lock prefix
(instructions with a lock prefix are implicitly memory barriers on Intel).
Timers
and Timing
9. How is the TSC used to improve the accuracy of gettimeofday() on
systems that support a TSC? What is the difference
between calling gettimeofday()
and executing the rdtsc instruction?
TSC (Time Stamp
Counter) is a very high granularity (cpu clock speed)
counter. If you have a 3 GHz system, it increments 3 billion times a second!
The kernel variable jiffies is updated by the timer
interrupt 100 or 1000 times a second. The corresponding variable xtime is
updated at the same rate or slightly less frequently if several top-halves
occur before the timer bottom half is executed.
gettimeofday() simply looks at xtime and jiffies. To
increase accuracy, it also determines the number of cycles that have occurred
since the last update of jiffies and xtime and adds in that value.
gettimeofday() is a system call and requires significant
cost to enter and exit the kernel. rdtsc (“read tsc”) is an instruction that can be called from user-mode.
It samples the tsc at memory access speeds!
Memory
Management
10. Briefly describe
the Buddy System algorithm and how
it is used to implement page allocation
in the Linux kernel.
The Buddy System is
a “powers-of-two” allocator. It maintains 10 separate “free lists” of
contiguous pages of free memory, 1, 2, 4, 8, 16, etc. Originally all free
memory is represented by one or more large chunks (free memory is discontiguous initially). When a request comes in, if no
element of the requested size is available, it searches through increasingly
larger sizes until a free chunk is found. This chunk is then split up as
necessary and spread across the intervening (empty) free lists and the
requested piece is allocated. When a piece is freed, if it’s “buddy” is free
(contiguous and properly aligned) then the two are merged into one larger piece
and moved to the next larger list. Merging and splitting may cascade. The
overall structure reduces fragmentation.