Computer Architecture Area

Body of Knowledge


General Architecture

1. Processor architecture including
   instruction set design issues
        RISC versus CISC
   implementation techniques
        basic issues in pipelining
            resource management techniques such as scoreboarding
            data dependencies and speculative execution
            techniques for handling branches
            techniques for handling load latency
        superpipelined
        superscalar
   technology factors
   Performance metrics
        space and time
        benchmarks (SPEC CPU2000, MediaBench)
2. Memory hierarchy including
   Technology issues
        DRAM, SRAM speeds
        CPU cycle time versus memory latency
   Primary memory organization
        interleaving
        processor/memory buses
            split transaction buses
   Virtual memory
        MMU and TLB
   Cache memory
        basic concepts
        split I and D caches
        on-chip caches
        multi-level caches
        virtually addressed caches
        victim caches
   Performance metrics
        trace driven studies for evaluating memory hierarchies
        simulation tools such as WARTS
3. Concurrent processors (12 hours)
   Interconnection networks for concurrent processors
        general overview
        routing issues
   Synchronization and communication issues
   Machines for exploiting implicit parallelism
        vector processors
        SIMD
        VLIW
   Machines for exploiting explicit parallelism
        message passing machines
            communication mechanisms
            communication overheads
            techniques for overhead reduction such as active messages and pre-allocation of communication buffers
        shared memory machines
            memory consistency models
            cache coherence problem and solutions
            other latency-hiding mechanisms such as multithreading
            SMPs and large-scale multiprocessors
        integrating shared memory and message passing
   Performance issues
        metrics
        evaluation approaches
        benchmarks
4. Input/Output
   review of basic I/O issues such as DMA and interrupts
   storage hierarchies: secondary and tertiary
   advanced I/O topics such as RAID
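As a study aid for the memory-hierarchy performance metrics above, the following Python sketch computes average memory access time (AMAT) for a multi-level cache by applying AMAT = hit time + miss rate × miss penalty recursively, where each level's miss penalty is the AMAT of the level below it. The function name `amat`, the use of local (per-level) miss rates, and the latencies in the usage note are illustrative assumptions, not figures for any particular machine.

```python
from __future__ import annotations


def amat(levels: list[tuple[float, float]], memory_latency: float) -> float:
    """Average memory access time for a multi-level cache hierarchy.

    levels: (hit_time, local_miss_rate) for each cache level, L1 first.
    memory_latency: access time of main memory, in the same units.
    """
    # Work upward from main memory: the miss penalty of each level is
    # the average access time of everything below it.
    t = memory_latency
    for hit_time, local_miss_rate in reversed(levels):
        t = hit_time + local_miss_rate * t
    return t
```

With a 1-cycle L1 that misses 5% of the time, a 10-cycle L2 with a 20% local miss rate, and 100-cycle memory, `amat([(1, 0.05), (10, 0.2)], 100)` works out to 1 + 0.05 × (10 + 0.2 × 100) = 2.5 cycles, compared with 1 + 0.05 × 100 = 6.0 cycles for the L1 alone.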

Parallel Architectures

1. Fundamentals of parallel computation
        taxonomy of parallel architectures
        performance metrics and bounds
        data dependence and scheduling
2. Interconnection networks
        taxonomy and examples
        performance and routing
        combining networks (e.g., NYU Ultracomputer)
3. Shared memory multiprocessors
        memory consistency models
        memory coherence in bus-based multiprocessors
        memory coherence in network-based multiprocessors
4. Message-based multiprocessors
        hypercubes
        routing in hypercubes and flow control
        embedding topologies within the hypercube
5. Case studies
        state-of-the-art parallel machines
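Routing in hypercubes is commonly done deterministically in dimension order (e-cube routing): the router compares the current and destination node addresses and corrects the differing bits one dimension at a time, lowest dimension first. A minimal Python sketch, assuming nodes are numbered 0 to 2^n - 1 so that neighbors differ in exactly one address bit; `ecube_route` is a hypothetical name.

```python
from __future__ import annotations


def ecube_route(src: int, dst: int, dims: int) -> list[int]:
    """Nodes visited from src to dst in a dims-dimensional hypercube
    under dimension-ordered (e-cube) routing, lowest dimension first."""
    size = 1 << dims
    if not (0 <= src < size and 0 <= dst < size):
        raise ValueError("node id out of range for this hypercube")
    path = [src]
    node = src
    for d in range(dims):
        # If bit d still differs from the destination, cross the
        # link along dimension d (flip that address bit).
        if (node ^ dst) & (1 << d):
            node ^= 1 << d
            path.append(node)
    return path
```

In a 3-cube, `ecube_route(0, 5, 3)` visits nodes 0, 1, 5. The number of hops always equals the Hamming distance between the source and destination addresses.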

Relevant Courses


Reading List

The primary material is covered in these two books, the current textbooks for CS 6290 and CS 7110, respectively:

Papers

Beyond the material in the books, we cover the following additional topics using papers drawn from the professional literature. A number of the papers appear in the following collection:

Note that the papers listed below are not intended to cover these topics completely; much of the material is already well covered in the textbooks.

Processor Architecture Principles

  1. David A. Patterson, "Reduced Instruction Set Computers," Communications of the ACM, 28(1), January, 1985.
  2. James E. Smith and Andrew R. Pleszkun, "Implementing Precise Interrupts in Pipelined Processors," IEEE Transactions on Computers 37(5), May, 1988.
  3. Colwell et al., "A VLIW Architecture for a Trace Scheduling Compiler," Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 1987.
  4. Sohi et al., "Multiscalar Processors," Proceedings of the 22nd Annual International Symposium on Computer Architecture (ISCA), 1995.

Processor Case Studies

  1. David B. Papworth, "Tuning the Pentium Pro Microarchitecture," IEEE Micro, 16(2), pp. 8-15, 1996.
  2. R. E. Kessler, "The Alpha 21264 Microprocessor," IEEE Micro, 19(2), March-April, 1999.
  3. Rumi Zahir, Jonathan Ross, Dale Morris and Drew Hess, "OS and Compiler Considerations in the Design of the IA-64 Architecture," in Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2000.

Memory Hierarchies

  1. Norman P. Jouppi, "Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers," in Proceedings of the 17th Annual International Symposium on Computer Architecture (ISCA), 1990.
  2. David Lilja, "Cache Coherence in Large-Scale Shared-Memory Multiprocessors: Issues and Comparisons," ACM Computing Surveys, September 1993.
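Jouppi's victim cache and the trace-driven evaluation method listed among the memory-hierarchy topics can be illustrated together. The Python sketch below replays an address trace through a direct-mapped cache, optionally backed by a small fully associative victim cache with LRU replacement; on a victim-cache hit the two lines are swapped. It is a simplified model under stated assumptions: it covers only the victim cache (not the miss caches or stream buffers also described in the paper), ignores writes, and the function name and parameters are hypothetical.

```python
from __future__ import annotations

from collections import OrderedDict


def simulate(trace: list[int], num_lines: int, block_size: int,
             victim_entries: int = 0) -> tuple[int, int]:
    """Trace-driven simulation of a direct-mapped cache with an optional
    fully associative LRU victim cache.

    Returns (main_cache_hits, victim_cache_hits).
    """
    lines = [None] * num_lines   # block address held by each cache line
    victim = OrderedDict()       # victim-cache blocks, least recently used first
    main_hits = victim_hits = 0
    for addr in trace:
        block = addr // block_size
        index = block % num_lines
        if lines[index] == block:
            main_hits += 1
            continue
        # Miss in the direct-mapped cache: the requested block moves in,
        # displacing whatever the line held before.
        evicted = lines[index]
        lines[index] = block
        if block in victim:
            # Victim-cache hit: the block leaves the victim cache, and the
            # displaced line takes its place below (a swap).
            del victim[block]
            victim_hits += 1
        if victim_entries and evicted is not None:
            victim[evicted] = None
            if len(victim) > victim_entries:
                victim.popitem(last=False)  # drop the LRU victim entry
    return main_hits, victim_hits
```

Two addresses that map to the same line conflict on every reference: with 4 lines of 16 bytes, the trace `[0, 64] * 4` never hits without a victim cache, while a single victim entry turns all six non-compulsory misses into victim-cache hits.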

High-Performance I/O

  1. Smith, "Disk Cache - Miss Ratio Analysis and Design Considerations," ACM Transactions on Computer Systems (TOCS), August 1985.
  2. Peter M. Chen et al., "RAID: High-Performance, Reliable Secondary Storage," ACM Computing Surveys, vol. 26, no. 2, June 1994, pp. 145-188.
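The parity-based RAID organizations surveyed by Chen et al. rely on XOR: the parity block of a stripe is the XOR of its data blocks, so any single lost block can be rebuilt by XOR-ing the surviving blocks, and a write to one block can update parity from the old data and old parity alone (the read-modify-write behind RAID level 5's small-write penalty). A minimal Python sketch, treating blocks as equal-length byte strings; the helper names are hypothetical.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (the RAID parity function)."""
    if not blocks or len({len(b) for b in blocks}) != 1:
        raise ValueError("need one or more blocks of equal length")
    out = bytearray(blocks[0])
    for b in blocks[1:]:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)


def small_write_parity(old_parity: bytes, old_data: bytes,
                       new_data: bytes) -> bytes:
    """Parity after rewriting one data block of a stripe:
    new parity = old parity XOR old data XOR new data,
    so the other data blocks in the stripe need not be read."""
    return xor_blocks(old_parity, old_data, new_data)
```

Rebuilding a failed disk's block is then `xor_blocks(*surviving_data_blocks, parity)`, because XOR-ing a value in twice cancels it out.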

Interconnection Networks

  1. Patel, "Performance of Processor-Memory Interconnections for Multiprocessors," IEEE Transactions on Computers, October 1981, pp. 771-780.
  2. Feng, "A Survey of Interconnection Networks," IEEE Computer, December 1981, pp. 12-27.
  3. Pfister and Norton, "Hot Spot Contention and Combining in Multistage Interconnection Networks," IEEE Transactions on Computers, October 1985, pp. 943-948.
  4. Agarwal, "Limits on Interconnection Network Performance," IEEE Transactions on Parallel and Distributed Systems (TPDS), October 1991.
  5. Ni and McKinley, "A Survey of Wormhole Routing Techniques in Direct Networks," IEEE Computer, 1993.

Parallel Architecture Principles

  1. Gustafson, "Reevaluating Amdahl's Law," Communications of the ACM, May 1988.
  2. Karp and Flatt, "Measuring Parallel Processor Performance," Communications of the ACM, May 1990.
  3. Mellor-Crummey and Scott, "Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors," ACM Transactions on Computer Systems (TOCS), Feb 1991.
  4. Sivasubramaniam et al., "An Application-Driven Study of Parallel System Overheads and Network Bandwidth Requirements," IEEE Transactions on Parallel and Distributed Systems (TPDS) 10(3), pp. 193-210, March, 1999.
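The Gustafson and Karp-Flatt papers both start from Amdahl's law. The Python sketch below puts the three formulas side by side: Amdahl's fixed-size speedup 1 / (s + (1 - s)/p); Gustafson's scaled speedup p - s(p - 1), where s is the serial fraction measured on the parallel run; and the Karp-Flatt experimentally determined serial fraction e = (1/ψ - 1/p) / (1 - 1/p) for a measured speedup ψ on p processors. The function names are hypothetical.

```python
def amdahl_speedup(serial_fraction: float, p: int) -> float:
    """Fixed-size speedup on p processors (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)


def gustafson_speedup(serial_fraction: float, p: int) -> float:
    """Scaled speedup (Gustafson): the problem grows with p, and
    serial_fraction is the share of the parallel run spent in serial code."""
    return p - serial_fraction * (p - 1)


def karp_flatt(measured_speedup: float, p: int) -> float:
    """Experimentally determined serial fraction (Karp and Flatt)."""
    if p < 2:
        raise ValueError("need at least two processors")
    return (1.0 / measured_speedup - 1.0 / p) / (1.0 - 1.0 / p)
```

With s = 0.05 and p = 100, Amdahl's law caps speedup at about 16.8, the scaled speedup is 95.05, and feeding the Amdahl speedup back into `karp_flatt` recovers e = 0.05. When e computed from real measurements grows with p, that is usually read as a sign of parallel overhead rather than inherently serial work.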

Parallel Machine Case Studies

  1. Seitz, "The CalTech Cosmic Cube," Communications of the ACM, January 1985, pp. 22-33.
  2. Hillis and Steele, "Data Parallel Algorithms," Communications of the ACM, December 1986.
  3. Kung, "Why Systolic Architectures?" IEEE Computer, January 1982, pp. 37-46.
  4. A.H. Veen, "Dataflow Machine Architecture," ACM Computing Surveys, Vol. 18, No. 4, Dec 1986, pp. 365-396.
  5. Kourosh Gharachorloo et al., "Architecture and Design of AlphaServer GS320," Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2000.
  6. Luiz Barroso et al., "Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing," Proceedings of the 27th Annual International Symposium on Computer Architecture (ISCA), 2000.

Updated for Fall 2001