CS 2200 Exam Study Guide Listed below are questions that will help you prepare for the final exam. THERE IS NO GUARANTEE THAT THIS LIST IS ANY KIND OF INDICATOR OF EXACTLY WHAT WILL BE ON THE FINAL. In other words, there might be different problems, more problems requiring calculations, etc. Also, if you see a question that covers something that you have never seen don't worry about it (Assuming that you have been going to class.) ---------------------------------------------------------------------- ---------------------------------------------------------------------- Q: What are the actions taken by the hardware upon an interrupt in a pipelined processor? Q: Suggest a scheme to control thrashing using the instantaneous page-fault rate observed by the operating system. Q: (Answer True/False with justification) In an architecture that supports virtual memory, it always takes at least 2 trips to the physical memory for every memory access generated by the processor. Q: (Answer True/False with justification) For a given workload and a given instruction-set architecture, reducing the CPI (clocks per instruction) of all the instructions will always improve the performance of the processor. Q: (Answer True/False with justification) Having a large register- file is detrimental to the performance of a processor since it results in a large overhead for procedure call/return in high- level languages. Q: An architecture has three types of instructions that have the following CPI: Type CPI A 2 B 5 C 1 An architect determines that he can reduce the CPI for B to 3, with no change to the CPIs of the other two instruction types, but with an increase in the clock speed of the processor. What is the maximum permissible increase in clock speed that will make this architectural change still worthwhile? Assume that all the workloads that execute on this processor use 30% of A, 10% of B, and 60% of C types of instructions. Q: A memory hierarchy has the following resources: L1 cache 2 ns access time 98% hit rate L2 cache 10 ns access time ?? Memory 60 ns access time Assume that for each of L1 and L2 caches, a lookup is necessary to determine whether a reference is a hit or miss. What should be the hit rate of L2 cache to ensure that the effective memory access time (P&H book also calls this the Average memory access time) is no more than 3 ns. Q: Consider a pre-emptive priority processor scheduler. There are three processes P1, P2, and P3 in the job mix that have the following characteristics: Process Arrival Priority Activity Time ------- ------- -------- -------------------------- P1 0 sec 1 8 sec CPU burst followed by 4 sec I/O burst followed by 6 sec CPU burst and quit P2 2 sec 3 64 sec CPU burst and quit P3 4 sec 2 2 sec CPU burst followed by 2 sec I/O burst followed by 2 sec CPU burst followed by 2 sec I/O burst followed by 2 sec CPU burst followed by 2 sec I/O burst followed by 2 sec CPU burst and quit What is the turnaround time for each of P1, P2, and P3? What is the average waiting time for this job mix? Q: Consider an architecture wherein for each entry in the TLB there is: 1 reference bit (that is set by hardware when the associated TLB entry is referenced by the CPU for address translation) 1 dirty bit (that is set by hardware when the associated TLB entry is referenced by the CPU for a store access). These bits are in addition to the other fields of the TLB that we have discussed in the class. The architecture provides three special instructions: one for sampling the reference bit for a particular TLB entry (Sample_TLB(entry_num)); one for clearing the reference bit for a particular TLB entry (Clear_refbit_TLB(entry_num)); and one for clearing the reference bits in all the TLB entries (Clear_all_refbits_TLB()). Come up with a scheme to implement page replacement using this additional help from the TLB. For full credit, you should show data structures and pseudo code that the algorithm maintains to implement page replacement. Q: (Answer True/False with justification for the item in bold letters) In the context of facilitating procedure call/return, a stack is a hardware abstraction. Q: Distinguish between frame pointer and stack pointer. Q: Give one sentence summary of the functionality for each of the following pieces of system programs: compiler, assembler, linker, and loader. Q: (Answer True/False with justification) In one clock cycle it is possible to compute B based on the current value of A. +------+ Constant | \ 1 --------------------+ \ | \ + \ \ ) \ + \ | > ALU +----------- B / | +---+ / + | R | / ) | E | + / | G | | / A ------+ I +------------------+ / | S | | / | T | +------+ | E | | R | +---+ Q: What are the important deficiencies of FCFS CPU scheduling? Q: Explain clearly with pictures the actions that are taken by the hardware and software respectively to handle an interrupt. Q: You are given the pipelined datapath for a processor as shown below. IR Brr Bm Brw +--------+ +--------+ +--------+ +--------+ |Pipeline| |Pipeline| |Pipeline| |Pipeline| IF->| 1 |->ID->| 2 |->EX->| 3 |->MEM->| 4 |->WB |Register| |Register| |Register| |Register| +--------+ +--------+ +--------+ +--------+ ^ ^ | | Reg ALU File here Here A loadword instruction has the following 32-bit format: OPCode A Reg B Reg Offset 8 bits 4 bits 4 bits 16 bits The semantic of this instruction is B <- Memory[A + offset] Register B gets the data at the memory address given by the effective address generated by adding the contents of Register A to the 16-bit offset in the instruction. All data and addresses are 32-bit quantities. Show what is needed in the pipeline registers (IR, Brr, Bm, and Brw) between the stages of the pipeline for the passage of this instruction. Clearly show the layout of each buffer. Show the width of each field in the buffer. You do not have to concern yourself about the format or the requirements of other instructions in the architecture for this problem. IR +-------------------------------------------------------------+ | | +-------------------------------------------------------------+ Brr +-------------------------------------------------------------+ | | +-------------------------------------------------------------+ Bm +-------------------------------------------------------------+ | | +-------------------------------------------------------------+ Brw +-------------------------------------------------------------+ | | +-------------------------------------------------------------+ Q: Mark the 5 classic components of a microprocessor per Patterson & Hennessy ___ Bus ___ Input ___ Interrupts ___ Benchmarks ___ Clock ___ Processor ___ Datapath ___ Power ___ Control ___ Icons ___ Memory ___ Cache ___ Output ___ OS Q: In the Mips and LC series processors where are operands normally found? Q: Module A has important data in both S and T registers and is about to call Module B. Which registers should it store on the stack? ___: S ___: S and T ___: T ___: Neither Q: Is it allowable to modify data stored in the interior of a stack? In other words are all accesses to data in a stack made via pushes and pops? Discuss. Q: Is a frame pointer necessary if procedure calls will be used? ___: Yes ___: No ___: Yes, but only if non-leaf procedures will be called. ___: Yes, but only if leaf-procedures will be called. Q: What would be the execution time for a program containing 2,000,000 instructions if the processor clock was running at 8 MHz and each instruction takes 4 clock cycles? Q: What are the advantages and disadvantages of a bus-based datapath? Q: What are the two basic approaches to control logic. ___: ROM ___: Microcode ___: Superscaler ___: Combinational logic ___: Finite state machine Q: Put these in order: A. Memory Ops B. Instruction Fetch C. Write back D. Execution E. Instruction Decode Q: Would you consider the stack which is used for facilitating procedure call/return a: _____ Hardware abstraction _____ Software abstraction _____ Pass by name calling convention _____ Virtual machine _____ Queue Q: Consider the compiler, assembler, linker and loader. Which resolves external references? Which one isn't necessary (assuming that a program is actually executed)? Which has an output that is relatively readable by men or women? Q: For this datapath (assuming that all lines are 16 bits wide) Fill in the table below. CLK | +---+ | R | | E | +------+ | G | C | \ A ------+ I +------------+ \ | S | | \ | T | + \ +---+ | E | \ ) | R | | R | \ + | E | +---+ \ | E | G | > ALU +-------+ I +---- F / | | S | +---+ / + | T | + R + / ) | E | | E | + / | R | | G | D | / +---+ B ------+ I +------------+ / | | S | | / | | T | +---+--+ CLK | E | | | R | | +---+ AND | CLK 1 2 3 4 5 6 | | | | | | V V V V V V ______ ______ ______ ______ ______ ______ | | | | | | | | | | | | | | | | | | | | | | | | ____| |____| |____| |____| |____| |____| |___ Time A B C D E F -------- ------- ------- ------- ------- ------- ------- 1 0x42 0xFE 0 0 0 0 2 0 0 _______ _______ _______ _______ 3 0xCAFE 0x1 _______ _______ _______ _______ 4 0 0 _______ _______ _______ _______ 5 0 0 _______ _______ _______ _______ 6 0 0 _______ _______ _______ _______ Q: Of the following, what is typically done before the others upon interrupt: _____ a.) Branch to interrupt handler _____ b.) Enable interrupts _____ c.) Mask frame pointer _____ d.) JALR int 13 _____ e.) Vector DL44 ATL-FTW _____ f.) Receive interrupt vector Q: You are designing a cache for a 32-bit processor. Memory is organized into words but byte addressing is used. You have been told to make the cache 2-way set associative with 64 words (256 bytes) per block. You are allowed to use a total of 64K words (256K bytes) for the data (excluding tags, status bits, etc.) Sketch the layout of the cache here (note: no control logic should be shown): [We are interested in the number and size of things. Show how a memory reference will be subdivided: Byte offset, block offset, index(set), tag. _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_| 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 Q: Extend the LC-99 assembly language to include a subtract operation. Give some pseudocode for the actions that must taken in each state. Q: Is it necessary for the LC series processor instruction set to have a subtract operation, or can subtraction be implemented using existing LC-99 instructions? Q: What is spillage? Q: What do you think about the speed of memory versus registers? Q: How many bytes of memory will this code snippet require when assembled? beq $s3, $s4, L1 add $s0, $s1, $s2 L1: sub $s0, $s0, $s3 Q: What is the advantage of fixed length instructions Q: What does "SETI" stand for? Q: What is a leaf procedure? Q: Suppose we have the following: Address 0xFF001000 Loop: NOP # Beginning of loop ... 0xFF002000 j Loop # Jump to beginning of loop The jump instruction op code is 6 bits: 000010 What will be in the remaining 26 bits? Q: What is the execution time for 5,000,000 instructions where each instruction takes 5 clock cycles and the clock runs at 1 GHz. Q: Suppose you are writing an LC-99 program and you want to jump a distance that is farther that allowed by a BEQ instruction. Is there a way to jump to an address? a. The only way would be to use a series of BEQ instructions b. You could use another instruction c. No reasonable way exists d. It could be simulated using the LW instruction Q: Could the LC series processors be made to run faster if it had a third bus? If your answer was no, what else would you need? Another ALU An additional mux A TPRF Q: Given what we said about why registers are used in processors does anything seem odd about the design of the LC series processors? Q: What are the four goals in processor design (that may conflict with each other)? Q: Suppose we want to convert this statement: g = h + A[i]; into MIPS assembler and suppose that the Address of A is located in $s3 g is in $s1 h is in $s2 and i is in $s4 Which code segment is correct? a add $t1, $t1, $s3 lw $t0, 0($t1) add $s1, $s2, $t0 b add $t1, $s4, $s4 add $t1, $t1, $t1 add $t1, $t1, $s3 lw $t0, 0($t1) add $s1, $s2, $t0 c add $t1, $s4, $s4 add $t1, $t1, $t1 add $t1, $t1, $s3 lw $t0, $s4($t1) add $s1, $s2, $t0 d add $t1, $s4, $s4 add $t1, $t1, $t1 add $t1, $t1, $s3 lw $t0, 4*($t1) add $s1, $s2, $t0 Q: Suppose you design a computer called the Big Looper 2000 that will never be used to call procedures and that will automatically jump back to the beginning of memory when it reaches the end. Do you need a program counter? ___: Yes. ___: No. ___: Yes but only if a JR instruction will be implemented Q: Is a frame pointer necessary if procedure calls will be used? ___: Yes ___: No ___: Yes, but only if non- leaf procedures will be called. ___: Yes, but only if leaf- procedures will be called. Q: Name at least four important functional units in the LC-99 1._________________________ 2._________________________ 3._________________________ 4._________________________ Q: What should be saved upon encountering an exception? Q: On the left is the memory layout for a Big-Endian Computer. On the right Little Endian. Each square is a byte. Four bytes make up a word. Byte addresses increment by 1. Word addresses increment by 4. Label the diagrams clearly indicating word and byte addresses. Work from top to bottom. Start addressing at 100 Big Little ----- ----- ----- ----- ----- ----- ----- ----- | | | | | | | | | | | | | | | | | | | | ----- ----- ----- ----- ----- ----- ----- ----- | | | | | | | | | | | | | | | | | | | | ----- ----- ----- ----- ----- ----- ----- ----- | | | | | | | | | | | | | | | | | | | | ----- ----- ----- ----- ----- ----- ----- ----- | | | | | | | | | | | | | | | | | | | | ----- ----- ----- ----- ----- ----- ----- ----- | | | | | | | | | | | | | | | | | | | | ----- ----- ----- ----- ----- ----- ----- ----- Q: In a pipelined processor where each instruction could be broken up into 5 stages and where each stage takes 1 ns what is the best we could hope to do in terms of average time to execute 1,000,000,000 instructions? Q: Name three types of hazards we must be concerned about in designing a pipelined processor. Give an example of each. Q: In the LC series of processors, why is there a register after the ALU? Q: A smart architect re-implements a given instruction-set architecture, halving the CPI for 50% of the instructions, while increasing the clock cycle time of the processor by 10%. How much faster is the new implementation compared to the original? Assume that the usage of all instructions are equally likely in determining the execution time of any program for the purposes of this problem. Q: Consider the following program and assume that for this processor: -All arguments are passed on the stack. -Register V0 is for return values. -The S registers are expected to be saved, that is a calling routine can leave values in the S registers and expect it to be there after a call. -The T registers are expected to be temporary, that is a calling routine must not expect values in the T registers to be preserved after a call. int bar(int a, int b) { /* Code that uses registers T5, T6, S11-S13; */ return(1); } int foo(int a, int b, int c, int d, int e) { int x, y; /* Code that uses registers T5-T10, S11-S13; */ bar(x, y); /* call bar */ /* Code that reuses register T6 and arguments a, b, and c; return(0); } main(int argc, char **argv) { int p, q, r, s, t, u; /* Code that uses registers T5-T10, S11-S15; */ foo(p, q, r, s, t); /* Call foo */ /* Code that reuses registers T9, T10; */ } Here is the stack when bar is executing, clearly indicate in the spaces provided which procedure (main, foo, bar) saved specific entries on the stack. main foo bar ____ ____ ____ p ____ ____ ____ q ____ ____ ____ r ____ ____ ____ s ____ ____ ____ t ____ ____ ____ u ____ ____ ____ T9 ____ ____ ____ T10 ____ ____ ____ p ____ ____ ____ q ____ ____ ____ r ____ ____ ____ s ____ ____ ____ t ____ ____ ____ x ____ ____ ____ y ____ ____ ____ S11 ____ ____ ____ S12 ____ ____ ____ S13 ____ ____ ____ S14 ____ ____ ____ S15 ____ ____ ____ T6 ____ ____ ____ x ____ ____ ____ y ____ ____ ____ S11 ____ ____ ____ S12 ____ ____ ____ S13 <----------- Top of Stack Q: An interrupt occurs and the hardware saves the current PC in a register and then jumps to the interrupt handling code. What needs to be done? Q: We are going to save some info on the stack which way is better and why? A. Method 1: ADD SP, SP, 12 SW R2, 8(SP) SW R3, 4(SP) SW R4, 0(SP) B. Method 2 SW R2, 8(SP) SW R3, 4(SP) SW R4, 0(SP) ADD SP, SP, 12 C. Neither will work properly Your choice: ____ Why? Q: Name a scheduling algorithm where starvation is impossible Q: Which scheduling algorithm was noted as having a high variance in turnaround time Q: Which scheduling algorithm is provable optimal (minimum average waiting time). Q: Which scheduling algorithm must be preemptive? Q: A certain change is being considered in the non-pipelined (multicycle) MIPS CPU regarding the implementation of the ALU instructions. This change will enable one to perform an arithmetic operation and write the result into the register file all in one clock cycle. However, doing so will increase the clock cycle time of the CPU. Specifically, the original CPU operates on a 500 MHz clock, but the new design will only execute on a 400 MHz clock. Will this change improve, or degrade performance? How many times faster (or slower) will the new design be compared to the original design. Assume instructions are executed with the following frequency: (lw: 25%, sw: 15%, ALU: 45%, beq: 10%, j: 5%). Compute the CPI of both the original and the new CPU. Show all work in coming up with your answer. Cycles per Instruction LW 5 SW 4 ALU 4 beq 3 j 3 Q: Consider a CPU with a stack-based instruction set. Operands and results for arithmetic instructions are stored on the stack; the architecture contains no general purpose registers. The data path shown on the next page uses two separate memories, a 65,536 (216) byte memory to hold instructions and (non-stack) data, and a 256 byte memory to hold the stack. The stack is implemented with a conventional memory and a stack pointer register. The stack starts at address 0, and grows upward (to higher addresses) as data are pushed onto the stack. The stack pointer points to the element on top of the stack (or is -1 if the stack is empty). You may ignore issues such as stack overflow and underflow. Memory addresses referring to locations in program/data memory are 16 bits. All data are 8 bits. Assume the program/data memory is byte addressable, i.e., each address refers to an 8 bit byte. Each instruction includes an 8 bit opcode. Many instructions also include a 16 bit address field. The instruction set is shown below. Below, "memory" refers to the program/data memory (as opposed to the stack memory). OPCODE INSTRUCTION OPERATION =========== =================== ============================= 00000000 PUSH push the contents of memory at address onto the stack 00000001 POP pop the element on top of the stack into memory at location 00000010 ADD Pop the top two elements from the stack, add them, and push the result onto the stack 00000100 BEQ Pop top two elements from stack; if they're equal, branch to memory location Note that the ADD instruction is only 8 bits, but the others are 24 bits. Instructions are packed into successive byte locations of memory (i.e., do NOT assume all instruction uses 24 bits). Assume memory is 8 bits wide, i.e., each read or write operation to main memory accesses 8 bits of instruction or data. This means the instruction fetch for multi-byte instructions requires multiple memory accesses. Datapath Complete the design of the data path at: http://www.cc.gatech.edu/classes/AY2002/cs2200_summer/stackdatapath.jpg Assume reading or writing the program/data memory or the stack memory requires a single clock cycle to complete (actually, slightly less to allow time to read/write registers). Similarly, assume each ALU requires slightly less than one clock cycle to complete an arithmetic operation, and the zero detection circuit requires negligible time. Control Unit Show a state diagram for the control unit indicating the control signals that must be asserted in each state of the state diagram. Do not worry about optimizing performance or minimizing the number of states. Q: One can imagine a single instruction computer (SIC) whose only instruction is: Subtract and branch if negative: sbn a, b, c # Mem[a] <- Mem[a] - Mem[b] if(Mem[a] < 0) go to c The instruction will subtract the number in memory location b from the number in location a and place the result back in location a. If the result is greater than or equal to zero then the computer will take its next instruction from the memory location just after the current instruction location. If the result is less than zero, the next instruction is taken from memory location c. [Note: Most typical instructions found in computers such as the Mips processor can be synthesized as a sequence of instructions on the SIC.] This computer has no general purpose registers. Design a datapath and control logic to implement the SIC! Q: Differentiate between internal and external fragmentation. Q: What are the actions taken by the hardware upon an interrupt in a pipelined processor? Q: Suggest a scheme to control thrashing using the instantaneous page fault rate observed by the operating system. Q: (Answer True/False with justification) In an architecture that supports virtual memory, it always takes at least 2 trips to the physical memory for every memory access generated by the processor. Q: A memory hierarchy has the following resources: L1 cache 2 ns access time 98% hit rate L2 cache 10 ns access time ?? Memory 60 ns access time Assume that for each of L1 and L2 caches, a lookup is necessary to determine whether a reference is a hit or miss. What should be the hit rate of L2 cache to ensure that the effective memory access time (P&H book also calls this the Average memory access time) is no more than 3 ns. Q: Consider a pre-emptive priority processor scheduler. There are three processes P1, P2, and P3 in the job mix that have the following characteristics: Process Arrival Priority Activity Time ----------- ----------- --------------- --------------------------- P1 0 sec 1 8 sec CPU burst followed by 4 sec I/O burst followed by 6 sec CPU burst and quit P2 2 sec 3 64 sec CPU burst and quit P3 4 sec 2 2 sec CPU burst followed by 2 sec I/O burst followed by 2 sec CPU burst followed by 2 sec I/O burst followed by 2 sec CPU burst followed by 2 sec I/O burst followed by 2 sec CPU burst and quit What is the turnaround time for each of P1, P2, and P3? What is the average waiting time for this job mix? Q: Consider an architecture wherein for each entry in the TLB there is 1 reference bit (that is set by hardware when the associated TLB entry is referenced by the CPU for address translation), and 1 dirty bit (that is set by hardware when the associated TLB entry is referenced by the CPU for a store access). These bits are in addition to the other fields of the TLB that we have discussed in the class. The architecture provides three special instructions: one for sampling the reference bit for a particular TLB entry (Sample_TLB(entry_num)) one for clearing the reference bit for a particular TLB entry (Clear_refbit_TLB(entry_num)) one for clearing the reference bits in all the TLB entries (Clear_all_refbits_TLB()). Come up with a scheme to implement page replacement using this additional help from the TLB. For full credit, you should show data structures and pseudo code that the algorithm maintains to implement page replacement. Use back of the page if necessary. Q: What are the important deficiencies of FCFS CPU scheduling? Q: In a processor with a five stage pipeline as discussed in the class and shown in the picture below (with buffers between the stages), explain the problem posed by a branch instruction. Present a solution. +--------+ +--------+ +--------+ +--------+ |Pipeline| |Pipeline| |Pipeline| |Pipeline| IF->| |->ID->| |->EX->| |->MEM->| |->WB |Register| |Register| |Register| |Register| +--------+ +--------+ +--------+ +--------+ ^ ^ | | Reg ALU File here Here Q: A process has a memory reference pattern as follows: 0 0 1 2 3 4 0 0 1 2 3 4 The above pattern (which are the virtual page numbers of the process) repeats throughout the execution of the process. ======= Assume that there are 3 physical frames. Show the paging activity for a True LRU page replacement policy. For full credit show an LRU stack and the page that gets replaced (if any) for the first 12 accesses. Clearly indicate which accesses are hits and which are page faults. Q: A process has a memory reference pattern as follows: 0 0 1 2 3 4 0 0 1 2 3 4 The above pattern (which are the virtual page numbers of the process) repeats throughout the execution of the process. ======= What would be the paging activity for an optimal page replacement policy for the first 12 accesses? For such a policy indicate which accesses are hits and which are page faults. Q: For the scheduling algorithms we discussed fill in the following table: Algorithm Starvation High variance Provably optimal MUST be Possible? in turnaround (min. average preemptive time? waiting time) ----------- ----------- --------------- ------------------- ---------- First Come First Served Shortest Job First Priority Round Robin Multilevel Queue Multilevel Feedback Queue Q: A pipelined processor has a forwarding unit installed in the EX stage (the one with the ALU). Assuming that the following instruction is passing through the pipeline write an instruction that would use the forwarding unit. add R1, R2, R3 # Add contents of R2 and R3 store result in R1 __________________ # Instruction that will use forwarding unit to # avoid # stalling the pipeline. Q: You are designing a cache for a 32-bit processor. Memory is organized into words but byte addressing is used. You have been told to make the cache 2-way set associative with 64 words (256 bytes) per block. You are allowed to use a total of 64K words (256K bytes) for the data (excluding tags, status bits, etc.) Sketch the layout of the cache (note: no control logic should be shown): [We are interested in the number and size of things.] Show how a memory reference will be subdivided: Byte offset, block offset, index(set), tag. 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 +---------------------------------------------------------------+ | | | | +---------------------------------------------------------------+ Q: A processor asks for the contents of virtual memory address 0x30020. The paging scheme in use breaks this into a VPN of 0x30 and an offset of 0x020. PTB (a CPU register that holds the address of the page table) has a value of 0x100 indicating that this processes page table starts at location 0x100. The machine uses word addressing and the page table entries are each one word long. +---+-----+ |PTB|0x100| +---+-----+ +---+------+ |VPN|Offset| | 30|020 | +---+------+ The contents of selected memory locations are: Physical Contents Address 0x00000 0x00000 0x00100 0x00010 0x00110 0x00000 0x00120 0x00045 0x00130 0x00022 0x10000 0x03333 0x10020 0x04444 0x22000 0x01111 0x22020 0x02222 0x45000 0x05555 0x45020 0x06666 What is the physical address calculated? What is the contents of this address returned to the processor? How many memory references would be required (worst case)? Q: From the following statements regarding cache memory, select the ones that are True. It can usually be manipulated just like registers from the instruction-set architecture of a processor It cannot usually be directly manipulated from the instruction- set architecture of a processor It is usually implemented using the same technology as the main memory It is usually much larger than a register file but much smaller than main memory Q: What is meant by cache coherency? ___ Cache is operating faster than main memory ___ Cache is operating on a higher plain of consciousness ___ Cache and memory agree or they don't and we know and can do something about it ___ Cache is on a single chip ___ Cache is on same chip as processor ___ Cache is physically between processor and main memory ___ cache is physically located on prime meridian Q: In a pipelined processor where each instruction could be broken up into 5 stages and where each stage takes 1 ns what is the best we could hope to do in terms of average time to execute 1,000,000,000 instructions? Q: What are the three main types of hazards we encounter in a pipelined processor. Q: Define Locality and discuss the various types. Q: Define Cache Hit Ratio Q: These next two questions assume that you are in your page fault handler. On a page fault when is the only time you should save a page from disk? Q: When should you load a page from disk? Q: Is a megabyte exactly 1 million bytes? ___ Yes ___ No ___ I'm not sure? ___ I'm changing my major Q: Assume a processor with CPI = 1.0 (if all references hit in primary cache). Clock is 1000 MHz. Main memory access time is 100 ns. Miss rate at primary cache is 6%. What is clock cycle time? What is miss penalty in clock cycles? What is effective CPI? Now we add in a second level cache with a 10 ns access time. This cache has a miss rate to main memory of 3% What is the miss penalty for the second level cache? What is the total effective CPI? Q: Name a scheduling algorithm where starvation is impossible Q: Which scheduling algorithm was noted as having a high variance in turnaround time Q: Which scheduling algorithm is provable optimal (minimum average waiting time). Q: Which scheduling algorithm must be preemptive? Q. A disk has 20 surfaces (i.e. 10 double sided platters). Each surface has 1000 tracks. Each track has 128 sectors. Each sector can hold 64 bytes. The disk space allocation policy allocates an integral number of contiguous cylinders to each file, How many bytes are contained in one cylinder? How many cylinders are needed to accommodate a 5-Megabyte file on the disk? How much space is wasted in allocating this 5-Megabyte file? Q. Is a megabyte exactly 1 million bytes? ___ Yes ___ No ___ I'm not sure? ___ I'm changing my major Q. What is the difference between program controlled I/O and DMA? Q. A disk has the following configuration: The disk has 310 MB Track size: 4096 bytes Sector Size: 64 bytes Mr. Gill Bates a programmer at Macrosoftware has 96 objects each being 50 bytes in size. He decides to save each object as an individual file. How many bytes (in total) are actually written to disk? Q. Describe in detail the sequence of operations involved in a DMA data transfer. Q. What are the mechanical things that must happen before a disk drive can read data. ___ DMA transfer request must be initiated by processor ___ Heads must be positioned on correct track ___ Thread scheduler must block on I/O call ___ Disk must rotate so that desired sector is under head Q. A disk drive has 3 double-sided platters. The drive has 300 cylinders. How many tracks are there? ___ 600 ___ 900 ___ 1800 ___ 3600 Q. While a disk unit is performing a DMA transfer what device keeps the processor from having to steal bus cycles? ___ Cache consistency protocol ___ TCP/IP stack ___ Cache ___ Club anti-theft device Q. A bus normally contains what three kinds of wires or lines? ___ Twisted pair ___ Data ___ Multi-mode ___ Full metal jacket ___ Control ___ Phaser ___ Address Q. What is typically the fastest? ___ Program controlled I/O ___ DMA ___ Custom I/O Processor Q. Select all choices that apply regarding threads created in the same address space. ___ they share code ___ they share global data ___ they share the stack ___ they share the heap Q. From the following sentences regarding threads, select the ones that are True. ___ An operating system that provides no support for threads will block the entire process if one of the (user level) threads in that process makes a blocking system call. ___ In pthreads, a thread that does a pthread_cond_wait will always block. ___ In pthreads, a thread that does a pthread_mutex_lock will always block. ___ In Solaris, all user level threads in the same process compete equally for CPU resources. ___ All the threads within a single process in Solaris share the same page table. Q. List two advantages of buses ___ Cheap ___ Throughput bottleneck ___ Flexible ___ Yellow ___ CDSMA ___ Congestion resistant Q. How does a device on the Ethernet know that there is a collision? ___ Sound collision warnings ___ Lookout ___ Collision avoidance radar ___ Listening Q. What is the normal objective of a disk scheduling algorithm? ___ Minimize rotational latency ___ Control head acceleration ___ Maximize skew ratio ___ Minimize head movement Q. What are the two schemes used to connect nodes on a multiprocessor? ___ PB & J ___ Bus ___ Vampire taps ___ Network ___ Net worth ___ Net BIOS ___ NetBUI ___ Munder NetNet Q. Does snooping work with a network connected computer to keep caches consistent? ___ Yes ___ No Q. The use of checksum in a packet is intended to address the following sources of unreliability in networks ___ corrupted or mangled data packets ___ out of order packet arrival ___ loss of packets ___ spurious packets injected into the network Q. What is the big advantage of kernel threads? ___ Allow for non-blocking upcalls to TCP/IP ___ Allow O/S to schedule different threads on different processors ___ Abstraction of user level threads into LWP's Q. Why use a "while" loop around a pthread_cond_wait() call? ___ Allow us to bypass wait when not necessary ___ Clear consistent programming style ___ Insures that required resource is still available Q. With pthreads, do you have any control over thread scheduling? ___ Yes ___ No ___ Only with Solaris V ___ Only with Red Hat Linux 3.6 Q. Select all statements that are True with respect to the remote procedure call mechanism ___ The parameter passing mechanism that is usually allowed is value/result. ___ It guarantees that the call will be executed at least once. ___ It is most efficient to implement without transport layer level ACKS. ___ It blocks the caller until the result comes back from the call. ___ It requires a nameserver Q. There is an old network saying: Bandwidth problems can be cured with money. Latency problems are harder because the speed of light is fixed-you can't bribe ____________. ___ Your TA ___ Your instructor ___ God ___ Jim Greenlee Q. How does the receiver of a packet verify that the packet arrived correctly, that is without error? (Check all that apply) ___ Parity checking ___ Checksum ___ Handwaving ___ Bit count Q. What allows multiple devices on a network to talk to one another simultaneously? (Check all that apply) ___ Token ring ___ Shared Media ___ Switched Media ___ Virtual LAN's ___ NIC ___ Repeater Q. In a token ring network there are typically how many tokens? (Check all that apply) ___ 1 for the sender ___ 1 for the receiver ___ 1 for the bus arbiter ___ 1 for each bridge Q. What technique is typically used today to locate a desired service? (Check all that apply) ___ Load time binding ___ IMP Router Tables ___ Domain Name Servers ___ Broadcast Service Request (BSR) Q. What are network protocols used for? (Check all that apply) ___ Deal with sources of unreliability ___ Shielding from RF interference ___ ISP Loading ___ Polling Q. How do we deal with out of order delivery? (Check all that apply) ___ Timer ___ Attach sequnce number to packet ___ Maintain counter ___ Sliding window ___ Session ID Q. How do we deal with Duplicate packets? (Check all that apply) ___ Timer ___ Attach sequnce number to packet ___ Maintain counter ___ Sliding window ___ Session ID Q. How do we deal with Lost packets? (Check all that apply) ___ Timer ___ Attach sequnce number to packet ___ Maintain counter ___ Sliding window ___ Session ID Q. How do we deal with flow control i.e. Having to wait a long time for acks. (Check all that apply) ___ Timer ___ Attach sequnce number to packet ___ Maintain counter ___ Sliding window ___ Session ID Q. RPC is modeled after (Check all that apply) ___ Circuit switching ___ Stub marshalling ___ Procedural abstraction ___ CSMA/CD Q. Telnet and FTP are examples of services found on a: ___ Network Operating System ___ Distributed OS Q. Rank in order of typical distance between nodes (closest to farthest apart) ___ LAN ___ MPP ___ WAN Q. What does a sliding window do? (Check all that apply) ___ Allow transmission of multiple packets before receiving an ACK ___ Provide a mechanism for responding to congestion ___ Provide more flexibility when defenestrating ___ Increase performance Q. Where are massively parallel processors typically connected? ___ Memory Bus ___ I/O Bus Q. Where are LAN's and WAN's typically connected? ___ Memory Bus ___ I/O Bus Q. What can we do to reduce congestion in a network? (Check all that apply) ___ Log off ISP's ___ Throw away packets ___ Tune bandwidth ___ Pin processors Q. Mark all that understand IP addresses ___ Repeaters ___ Bridges ___ Routers ___ Dan Colestock Q. What happens when you take that little resistor off the end of the ethernet coax cable? (Check all that apply) ___ Refraction ___ Reflection ___ Rarification ___ Reduction ___ Resistance Q. What semantics are used by RPC? (Check all that apply) ___ Only once ___ At least once ___ At most once ___ One ounce Q. Here are the 5 layers of the TCP/IP protocol Layer 1: Physical Layer 2: Network Interface Layer 3: Internet Layer 4: Transport Layer 5: Application For the item below specify which layer is responsible ___ Specifies how one application uses an internet ___ Specify forwarding mechanisms used by routers ___ Organize data into frames ___ Specifies how to ensure reliable transfer ___ How to transmit over network ___ Basic network hardware Q. Write a function in C that will swap two integers. The call will look like this: int a = 10; int b = 20; /* Call to swap occurs here */ printf("%d %d\n", a, b); ----------------------------------------------------------- Output: 20 10 ----------------------------------------------------------- Write the function here: Q. True or False: Synchronization is concerned with ensuring that operations being performed concurrently by different entities are performed in a proper order A. True B. False Q. Which of the following are used in modern communication networks (circle all that apply): A. Geo-synchronous satellites B. Low earth orbit satellite arrays C. Fiber optic cable D. Cold fusion Q. Which of the following is a function implemented by a transport level protocol such as TCP (circle all that apply): A. Ensuring reliable delivery of data B. Representing a logic '1' as a high voltage, and a '0' as a low voltage C. Framing D. Routing data through the network Q. True or False. Sliding windows provide a means to increase throughput over networks with long end-to-end latency A. True B. False Q. True of False. A basic problem addressed by Internet addressing schemes is encoding network and host Ids into a fixed number of bits. A. True B. False Q. Which of the following are valid reasons for creating layered network protocols (circle all that apply): A. They provide a useful alternative to TCP/IP B. They greatly facilitate the interconnection of heterogeneous computer networks to create systems such as the Internet C. They provide a means to create abstractions to help manage the complexity of communication networks D. They are necessary to maximize the performance of a communication network Q. Naming is a basic problem in client server systems. Solution approaches to attacking this problem include (circle all that apply): A. Name server to map names to port numbers. B. Sockets C. Conventions are used to define well-known names D. IP addresses Q. Consider the program below that uses the test and set (TandS) primitive. TandS(L) operates on the Boolean variable L; it returns the value stored in L just prior to when the primitive is invoked, and sets L to true, all as one atomic (indivisible) operation. The program below assumes there are N threads, numbered 0 … N-1 that can access a critical section. Boolean waiting[N]; /* N element array */ Boolean L, Key; int i, j; /* code executed by process i to access a critical section */ waiting[i] = TRUE; Key = TRUE; while (waiting[i] && key) key = TandS (L); waiting[i] = FALSE; < access critical section > j = (i+1)%N; while (j != i) && !waiting[j]) j = (j+1) % N; if (i == j) lock = FALSE; /* STATEMENT 1 */ else waiting[j] = FALSE; (a) Explain the situation where STATEMENT 1 shown above will execute and set lock to FALSE. (b) Explain the purpose of the array "waiting" in the above code, i.e., what problem does it solve? Q. If you were writing the code which decided which page to swap out during a page fault and you were using an operating system that allowed swapping out OS code...What would be the worst choice for a page to swap out? Q. A disk drive has 2 platters (4 surfaces). There are 128 tracks with 64 sectors per track. Each sector contains 256 bytes. How many bytes per cylinder? Q. Assume we have a mutex variable (critSecLock) and a condition variable (resAvSig) declared and initialized. Consider these code snippets: /* Snippet 1 */ pthread_mutex_lock(critSecLock); ResourceAvailable = false; pthread_mutex_unlock(critSecLock); /* Critical section */ ... /* Snippet 2 */ pthread_mutex_lock(critSecLock); if(!ResourceAvailable) { pthread_cond_wait(resAvSig, critSecLock); } ResourceAvailable = false; pthread_mutex_unlock(critSecLock); /* Now in critical section */ ... /* Snippet 3 */ pthread_cond_lock(critSecLock); while(!ResourceAvailable) { pthread_cond_wait(resAvSig, critSecLock); } ResourceAvailable = false; pthread_mutex_unlock(critSecLock); /* Now in critical section */ ... You should know that Snippet 3 is preferred but why? What's wrong with Snippet 1? What's wrong with Snippet 2? Why don't we worry in Snippet 3 about the following scenario: 1. The while checks and finds that the resource is not available 2. We get context switched out 3. Another thread changes the value of ResourceAvailable to true. 4. We get context switched back in 5. We wait and never get a signal because we are actually supposed to have the resource and no one ever sends us a signal. Q. We discussed multiprocessor cache coherency protocols where a block that was marked shared would get marked exclusive upon a write and at the same time other caches would be told to invalidate their copy. Why didn't we just tell the other caches the new value and keep the block as shared? (Mark all that are correct.) ___ a.) Would require an extra bus cycle ___ b.) Would be inefficient if our processor continued to read and write to that block while others weren't using it. ___ c.) We could have ___ d.) Would cause caches to become incoherent Q. Here is our basic mutex algorithm using atomic swap: loop: while(swap(&lock,RED) == RED) { // spin! } // Critical section here lock = GREEN; // Unlock the mutex goto loop; Why did we modify our basic mutex algorithm for use with multiple processors (each with their own cache) Modified version below: loop: do { while(lock == RED) { // spin } } while(swap(&lock,RED) == RED); // Critical section here lock = GREEN; // Unlock the mutex goto loop; ___ a.) To stop spinning while waiting for the lock ___ b.) Avoid excessive bus traffic ___ c.) Avoid problems caused by being context switched out at an inopportune time. ___ d.) Avoid deadlock Q. A knowledgeable CMPE designs a switch (such as the one we discussed in lecture) that was made using bridges. In order to save money he uses repeaters instead of bridges. Would you buy one of these switches? Why or why not? Q. Why do we need Network Protocols? (Mark all that are correct) ___ a.) The International Standards Organization (ISO) requires a layered approach to abstraction ___ b.) Networks sometimes don't work very well ___ c.) The TCP/IP suite requires them ___ d.) Allow different manufacturers to use different technologies to make the global Internet work. Q. Remote Procedure Call is modeled after what? Q. Remote Procedure Call guarantees (Mark all that are correct) ___ a.) At most once performance ___ b.) At least once performance ___ c.) One performance ___ d.) Perfomance or notification of failure to perform satisfactorily Q. Here is a portion of a router table: Dest Mask NextHop 30.0.0.0 255.0.0.0 40.0.0.7 40.0.0.0 255.0.0.0 deliver direct 128.1.0.0 255.255.0.0 deliver direct 192.4.10.0 255.255.255.0 128.1.0.9 Assume that we are trying to route a packet going to "Addr" Fill in the missing line of pseudocode: for each entry in table if(__________________________________________________) forward to NextHop Q. Normally when a router discards a packet it sends a message back to the sender letting it know. When doesn't this happen?