Here are questions from previous tests. They should roughly be for Test 1. If you see a question that is on material not to be covered on Test 1 then use it for Test 2. Q: What are the actions taken by the hardware upon an interrupt in a pipelined processor? Q: Suggest a scheme to control thrashing using the instantaneous page-fault rate observed by the operating system. Q: (Answer True/False with justification) In an architecture that supports virtual memory, it always takes at least 2 trips to the physical memory for every memory access generated by the processor. Q: (Answer True/False with justification) For a given workload and a given instruction-set architecture, reducing the CPI (clocks per instruction) of all the instructions will always improve the performance of the processor. Q: (Answer True/False with justification) Having a large register- file is detrimental to the performance of a processor since it results in a large overhead for procedure call/return in high- level languages. Q: An architecture has three types of instructions that have the following CPI: Type CPI A 2 B 5 C 1 An architect determines that he can reduce the CPI for B to 3, with no change to the CPIs of the other two instruction types, but with an increase in the clock speed of the processor. What is the maximum permissible increase in clock speed that will make this architectural change still worthwhile? Assume that all the workloads that execute on this processor use 30% of A, 10% of B, and 60% of C types of instructions. Q: A memory hierarchy has the following resources: L1 cache 2 ns access time 98% hit rate L2 cache 10 ns access time ?? Memory 60 ns access time Assume that for each of L1 and L2 caches, a lookup is necessary to determine whether a reference is a hit or miss. What should be the hit rate of L2 cache to ensure that the effective memory access time (P&H book also calls this the Average memory access time) is no more than 3 ns. Q: Consider a pre-emptive priority processor scheduler. There are three processes P1, P2, and P3 in the job mix that have the following characteristics: Process Arrival Priority Activity Time ------- ------- -------- -------------------------- P1 0 sec 1 8 sec CPU burst followed by 4 sec I/O burst followed by 6 sec CPU burst and quit P2 2 sec 3 64 sec CPU burst and quit P3 4 sec 2 2 sec CPU burst followed by 2 sec I/O burst followed by 2 sec CPU burst followed by 2 sec I/O burst followed by 2 sec CPU burst followed by 2 sec I/O burst followed by 2 sec CPU burst and quit What is the turnaround time for each of P1, P2, and P3? What is the average waiting time for this job mix? Q: Consider an architecture wherein for each entry in the TLB there is: 1 reference bit (that is set by hardware when the associated TLB entry is referenced by the CPU for address translation) 1 dirty bit (that is set by hardware when the associated TLB entry is referenced by the CPU for a store access). These bits are in addition to the other fields of the TLB that we have discussed in the class. The architecture provides three special instructions: one for sampling the reference bit for a particular TLB entry (Sample_TLB(entry_num)); one for clearing the reference bit for a particular TLB entry (Clear_refbit_TLB(entry_num)); and one for clearing the reference bits in all the TLB entries (Clear_all_refbits_TLB()). Come up with a scheme to implement page replacement using this additional help from the TLB. For full credit, you should show data structures and pseudo code that the algorithm maintains to implement page replacement. Q: (Answer True/False with justification for the item in bold letters) In the context of facilitating procedure call/return, a stack is a hardware abstraction. Q: Distinguish between frame pointer and stack pointer. Q: Give one sentence summary of the functionality for each of the following pieces of system programs: compiler, assembler, linker, and loader. Q: (Answer True/False with justification) In one clock cycle it is possible to compute B based on the current value of A. +------+ Constant | \ 1 --------------------+ \ | \ + \ \ ) \ + \ | > ALU +----------- B / | +---+ / + | R | / ) | E | + / | G | | / A ------+ I +------------------+ / | S | | / | T | +------+ | E | | R | +---+ Q: What are the important deficiencies of FCFS CPU scheduling? Q: Explain clearly with pictures the actions that are taken by the hardware and software respectively to handle an interrupt. Q: You are given the pipelined datapath for a processor as shown below. IR Brr Bm Brw +--------+ +--------+ +--------+ +--------+ |Pipeline| |Pipeline| |Pipeline| |Pipeline| IF->| 1 |->ID->| 2 |->EX->| 3 |->MEM->| 4 |->WB |Register| |Register| |Register| |Register| +--------+ +--------+ +--------+ +--------+ ^ ^ | | Reg ALU File here Here A loadword instruction has the following 32-bit format: OPCode A Reg B Reg Offset 8 bits 4 bits 4 bits 16 bits The semantic of this instruction is B <- Memory[A + offset] Register B gets the data at the memory address given by the effective address generated by adding the contents of Register A to the 16-bit offset in the instruction. All data and addresses are 32-bit quantities. Show what is needed in the pipeline registers (IR, Brr, Bm, and Brw) between the stages of the pipeline for the passage of this instruction. Clearly show the layout of each buffer. Show the width of each field in the buffer. You do not have to concern yourself about the format or the requirements of other instructions in the architecture for this problem. IR +-------------------------------------------------------------+ | | +-------------------------------------------------------------+ Brr +-------------------------------------------------------------+ | | +-------------------------------------------------------------+ Bm +-------------------------------------------------------------+ | | +-------------------------------------------------------------+ Brw +-------------------------------------------------------------+ | | +-------------------------------------------------------------+ Q: Mark the 5 classic components of a microprocessor per Patterson & Hennessy ___ Bus ___ Input ___ Interrupts ___ Benchmarks ___ Clock ___ Processor ___ Datapath ___ Power ___ Control ___ Icons ___ Memory ___ Cache ___ Output ___ OS Q: In the Mips and LC series processors where are operands normally found? Q: Module A has important data in both S and T registers and is about to call Module B. Which registers should it store on the stack? ___: S ___: S and T ___: T ___: Neither Q: Is it allowable to modify data stored in the interior of a stack? In other words are all accesses to data in a stack made via pushes and pops? Discuss. Q: Is a frame pointer necessary if procedure calls will be used? ___: Yes ___: No ___: Yes, but only if non-leaf procedures will be called. ___: Yes, but only if leaf-procedures will be called. Q: What would be the execution time for a program containing 2,000,000 instructions if the processor clock was running at 8 MHz and each instruction takes 4 clock cycles? Q: What are the advantages and disadvantages of a bus-based datapath? Q: Put these in order: A. Memory Ops B. Instruction Fetch C. Write back D. Execution E. Instruction Decode Q: Would you consider the stack which is used for facilitating procedure call/return a: _____ Hardware abstraction _____ Software abstraction _____ Pass by name calling convention _____ Virtual machine _____ Queue Q: Consider the compiler, assembler, linker and loader. Which resolves external references? Which one isn't necessary (assuming that a program is actually executed)? Which has an output that is relatively readable by men or women? Q: For this datapath (assuming that all lines are 16 bits wide) Fill in the table below. CLK | +---+ | R | | E | +------+ | G | C | \ A ------+ I +------------+ \ | S | | \ | T | + \ +---+ | E | \ ) | R | | R | \ + | E | +---+ \ | E | G | > ALU +-------+ I +---- F / | | S | +---+ / + | T | + R + / ) | E | | E | + / | R | | G | D | / +---+ B ------+ I +------------+ / | | S | | / | | T | +---+--+ CLK | E | | | R | | +---+ AND | CLK 1 2 3 4 5 6 | | | | | | V V V V V V ______ ______ ______ ______ ______ ______ | | | | | | | | | | | | | | | | | | | | | | | | ____| |____| |____| |____| |____| |____| |___ Time A B C D E F -------- ------- ------- ------- ------- ------- ------- 1 0x42 0xFE 0 0 0 0 2 0 0 _______ _______ _______ _______ 3 0xCAFE 0x1 _______ _______ _______ _______ 4 0 0 _______ _______ _______ _______ 5 0 0 _______ _______ _______ _______ 6 0 0 _______ _______ _______ _______ Q: Of the following, what is typically done before the others upon interrupt: _____ a.) Branch to interrupt handler _____ b.) Enable interrupts _____ c.) Mask frame pointer _____ d.) JALR int 13 _____ e.) Vector DL44 ATL-FTW _____ f.) Receive interrupt vector Q: You are designing a cache for a 32-bit processor. Memory is organized into words but byte addressing is used. You have been told to make the cache 2-way set associative with 64 words (256 bytes) per block. You are allowed to use a total of 64K words (256K bytes) for the data (excluding tags, status bits, etc.) Sketch the layout of the cache here (note: no control logic should be shown): [We are interested in the number and size of things. Show how a memory reference will be subdivided: Byte offset, block offset, index(set), tag. _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_| 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 Q: Extend the LC-2200 assembly language to include a subtract operation. Give some pseudocode for the actions that must taken in each state. Q: Is it necessary for the LC series processor instruction set to have a subtract operation, or can subtraction be implemented using existing LC-99 instructions? Q: What is spillage? Q: What do you think about the speed of memory versus registers? Q: How many bytes of memory will this code snippet require when assembled? beq $s3, $s4, L1 add $s0, $s1, $s2 L1: sub $s0, $s0, $s3 Q: What is the advantage of fixed length instructions Q: What does "SETI" stand for? Q: What is a leaf procedure? Q: Suppose we have the following: Address 0xFF001000 Loop: NOP # Beginning of loop ... 0xFF002000 j Loop # Jump to beginning of loop The jump instruction op code is 6 bits: 000010 What will be in the remaining 26 bits? Q: What is the execution time for 5,000,000 instructions where each instruction takes 5 clock cycles and the clock runs at 1 GHz. Q: Suppose you are writing an LC-2200 program and you want to jump a distance that is farther that allowed by a BEQ instruction. Is there a way to jump to an address? a. The only way would be to use a series of BEQ instructions b. You could use another instruction c. No reasonable way exists d. It could be simulated using the LW instruction Q: Could the LC series processors be made to run faster if they had a second bus? If your answer was no, what else would you need? Another ALU An additional mux A DPRF Q: What are the four goals in processor design (that may conflict with each other)? Q: Suppose we want to convert this statement: g = h + A[i]; into MIPS assembler and suppose that the Address of A is located in $s3 g is in $s1 h is in $s2 and i is in $s4 Which code segment is correct? a add $t1, $t1, $s3 lw $t0, 0($t1) add $s1, $s2, $t0 b add $t1, $s4, $s4 add $t1, $t1, $t1 add $t1, $t1, $s3 lw $t0, 0($t1) add $s1, $s2, $t0 c add $t1, $s4, $s4 add $t1, $t1, $t1 add $t1, $t1, $s3 lw $t0, $s4($t1) add $s1, $s2, $t0 d add $t1, $s4, $s4 add $t1, $t1, $t1 add $t1, $t1, $s3 lw $t0, 4*($t1) add $s1, $s2, $t0 Q: Suppose you design a computer called the Big Looper 2000 that will never be used to call procedures and that will automatically jump back to the beginning of memory when it reaches the end. Do you need a program counter? ___: Yes. ___: No. ___: Yes but only if a JR instruction will be implemented Q: Is a frame pointer necessary if procedure calls will be used? ___: Yes ___: No ___: Yes, but only if non- leaf procedures will be called. ___: Yes, but only if leaf- procedures will be called. Q: Name at least four important functional units in the LC-99 1._________________________ 2._________________________ 3._________________________ 4._________________________ Q: What should be saved upon encountering an exception? Q: On the left is the memory layout for a Big-Endian Computer. On the right Little Endian. Each square is a byte. Four bytes make up a word. Byte addresses increment by 1. Word addresses increment by 4. Label the diagrams clearly indicating word and byte addresses. Work from top to bottom. Start addressing at 100 Big Little ----- ----- ----- ----- ----- ----- ----- ----- | | | | | | | | | | | | | | | | | | | | ----- ----- ----- ----- ----- ----- ----- ----- | | | | | | | | | | | | | | | | | | | | ----- ----- ----- ----- ----- ----- ----- ----- | | | | | | | | | | | | | | | | | | | | ----- ----- ----- ----- ----- ----- ----- ----- | | | | | | | | | | | | | | | | | | | | ----- ----- ----- ----- ----- ----- ----- ----- | | | | | | | | | | | | | | | | | | | | ----- ----- ----- ----- ----- ----- ----- ----- Q: In a pipelined processor where each instruction could be broken up into 5 stages and where each stage takes 1 ns what is the best we could hope to do in terms of average time to execute 1,000,000,000 instructions? Q: Name three types of hazards we must be concerned about in designing a pipelined processor. Give an example of each. Q: In the LC series of processors, why are there registers before the ALU? Q: A smart architect re-implements a given instruction-set architecture, halving the CPI for 50% of the instructions, while increasing the clock cycle time of the processor by 10%. How much faster is the new implementation compared to the original? Assume that the usage of all instructions are equally likely in determining the execution time of any program for the purposes of this problem. Q: Consider the following program and assume that for this processor: -All arguments are passed on the stack. -Register V0 is for return values. -The S registers are expected to be saved, that is a calling routine can leave values in the S registers and expect it to be there after a call. -The T registers are expected to be temporary, that is a calling routine must not expect values in the T registers to be preserved after a call. int bar(int a, int b) { /* Code that uses registers T5, T6, S11-S13; */ return(1); } int foo(int a, int b, int c, int d, int e) { int x, y; /* Code that uses registers T5-T10, S11-S13; */ bar(x, y); /* call bar */ /* Code that reuses register T6 and arguments a, b, and c; return(0); } main(int argc, char **argv) { int p, q, r, s, t, u; /* Code that uses registers T5-T10, S11-S15; */ foo(p, q, r, s, t); /* Call foo */ /* Code that reuses registers T9, T10; */ } Here is the stack when bar is executing, clearly indicate in the spaces provided which procedure (main, foo, bar) saved specific entries on the stack. main foo bar ____ ____ ____ p ____ ____ ____ q ____ ____ ____ r ____ ____ ____ s ____ ____ ____ t ____ ____ ____ u ____ ____ ____ T9 ____ ____ ____ T10 ____ ____ ____ p ____ ____ ____ q ____ ____ ____ r ____ ____ ____ s ____ ____ ____ t ____ ____ ____ x ____ ____ ____ y ____ ____ ____ S11 ____ ____ ____ S12 ____ ____ ____ S13 ____ ____ ____ S14 ____ ____ ____ S15 ____ ____ ____ T6 ____ ____ ____ x ____ ____ ____ y ____ ____ ____ S11 ____ ____ ____ S12 ____ ____ ____ S13 <----------- Top of Stack Q: An interrupt occurs and the hardware saves the current PC in a register and then jumps to the interrupt handling code. What needs to be done? Q: We are going to save some info on the stack which way is better and why? A. Method 1: ADD SP, SP, 12 SW R2, 8(SP) SW R3, 4(SP) SW R4, 0(SP) B. Method 2 SW R2, 8(SP) SW R3, 4(SP) SW R4, 0(SP) ADD SP, SP, 12 C. Neither will work properly Your choice: ____ Why? Q: Name a scheduling algorithm where starvation is impossible Q: Which scheduling algorithm was noted as having a high variance in turnaround time Q: Which scheduling algorithm is provable optimal (minimum average waiting time). Q: Which scheduling algorithm must be preemptive? Q: A certain change is being considered in the non-pipelined (multicycle) MIPS CPU regarding the implementation of the ALU instructions. This change will enable one to perform an arithmetic operation and write the result into the register file all in one clock cycle. However, doing so will increase the clock cycle time of the CPU. Specifically, the original CPU operates on a 500 MHz clock, but the new design will only execute on a 400 MHz clock. Will this change improve, or degrade performance? How many times faster (or slower) will the new design be compared to the original design. Assume instructions are executed with the following frequency: (lw: 25%, sw: 15%, ALU: 45%, beq: 10%, j: 5%). Compute the CPI of both the original and the new CPU. Show all work in coming up with your answer. Cycles per Instruction LW 5 SW 4 ALU 4 beq 3 j 3 Q: Consider a CPU with a stack-based instruction set. Operands and results for arithmetic instructions are stored on the stack; the architecture contains no general purpose registers. The data path shown on the next page uses two separate memories, a 65,536 (216) byte memory to hold instructions and (non-stack) data, and a 256 byte memory to hold the stack. The stack is implemented with a conventional memory and a stack pointer register. The stack starts at address 0, and grows upward (to higher addresses) as data are pushed onto the stack. The stack pointer points to the element on top of the stack (or is -1 if the stack is empty). You may ignore issues such as stack overflow and underflow. Memory addresses referring to locations in program/data memory are 16 bits. All data are 8 bits. Assume the program/data memory is byte addressable, i.e., each address refers to an 8 bit byte. Each instruction includes an 8 bit opcode. Many instructions also include a 16 bit address field. The instruction set is shown below. Below, "memory" refers to the program/data memory (as opposed to the stack memory). OPCODE INSTRUCTION OPERATION =========== =================== ============================= 00000000 PUSH push the contents of memory at address onto the stack 00000001 POP pop the element on top of the stack into memory at location 00000010 ADD Pop the top two elements from the stack, add them, and push the result onto the stack 00000100 BEQ Pop top two elements from stack; if they're equal, branch to memory location Note that the ADD instruction is only 8 bits, but the others are 24 bits. Instructions are packed into successive byte locations of memory (i.e., do NOT assume all instruction uses 24 bits). Assume memory is 8 bits wide, i.e., each read or write operation to main memory accesses 8 bits of instruction or data. This means the instruction fetch for multi-byte instructions requires multiple memory accesses. Datapath Complete the design of the data path at: http://www.cc.gatech.edu/classes/AY2002/cs2200_summer/stackdatapath.jpg Assume reading or writing the program/data memory or the stack memory requires a single clock cycle to complete (actually, slightly less to allow time to read/write registers). Similarly, assume each ALU requires slightly less than one clock cycle to complete an arithmetic operation, and the zero detection circuit requires negligible time. Control Unit Show a state diagram for the control unit indicating the control signals that must be asserted in each state of the state diagram. Do not worry about optimizing performance or minimizing the number of states. Q: One can imagine a single instruction computer (SIC) whose only instruction is: Subtract and branch if negative: sbn a, b, c # Mem[a] <- Mem[a] - Mem[b] if(Mem[a] < 0) go to c The instruction will subtract the number in memory location b from the number in location a and place the result back in location a. If the result is greater than or equal to zero then the computer will take its next instruction from the memory location just after the current instruction location. If the result is less than zero, the next instruction is taken from memory location c. [Note: Most typical instructions found in computers such as the Mips processor can be synthesized as a sequence of instructions on the SIC.] This computer has no general purpose registers. Design a datapath and control logic to implement the SIC!