PhD CS – Computer Architecture Body of Knowledge

General Architecture

1. Processor architecture including

  • instruction set design issues
    • RISC versus CISC
  • implementation techniques
    • basic issues in pipelining
      • resource management techniques such as scoreboarding
      • data dependencies and speculative execution
      • techniques for handling branches
      • techniques for handling load latency
    • superpipelined
    • superscalar
  • technology factors
  • performance metrics (see the worked equations after this list)
    • space and time
    • benchmarks (SPEC CPU2000, MediaBench)
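
As a concrete reference for the performance metrics item above, the standard processor performance equation (the formulation used in the Hennessy and Patterson text; the notation here is ours) relates execution time to dynamic instruction count, cycles per instruction, and cycle time:

    T_{\mathrm{CPU}} = \mathrm{IC} \times \mathrm{CPI} \times t_{\mathrm{cycle}},
    \qquad
    \mathrm{Speedup} = \frac{T_{\mathrm{reference}}}{T_{\mathrm{improved}}}

Benchmark suites such as SPEC CPU2000 report the ratio of a reference machine's time to the measured time, which is one instantiation of the speedup ratio above.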

2. Memory hierarchy including

  • Technology issues
    • DRAM, SRAM speeds
    • CPU cycle time versus memory latency
  • Primary memory organization
    • interleaving
    • processor/memory buses
      • split transaction buses
  • Virtual memory
    • MMU and TLB
  • Cache memory
    • basic concepts (a direct-mapped lookup sketch follows this list)
    • split I and D caches
    • on-chip caches
    • multi-level caches
    • virtually addressed caches
    • victim caches
  • Performance metrics
    • trace driven studies for evaluating memory hierarchies
    • simulation tools such as WARTS
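
To make the basic cache concepts above concrete, here is a minimal sketch (our own illustration with assumed sizes and names, not code from the reading list) of how a direct-mapped cache splits an address into offset, index, and tag fields and tests for a hit:

    #include <stdint.h>
    #include <stdbool.h>

    /* Illustrative parameters: 32 KB direct-mapped cache with 64-byte lines. */
    #define LINE_BYTES   64
    #define NUM_LINES    512               /* 32 KB / 64 B */
    #define OFFSET_BITS  6                 /* log2(LINE_BYTES) */
    #define INDEX_BITS   9                 /* log2(NUM_LINES) */

    struct cache_line {
        bool     valid;
        uint32_t tag;
    };

    static struct cache_line cache[NUM_LINES];

    /* Returns true on a hit; on a miss the line would be filled from memory
     * and its valid bit and tag updated. */
    bool cache_lookup(uint32_t addr)
    {
        uint32_t index = (addr >> OFFSET_BITS) & (NUM_LINES - 1);
        uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);
        return cache[index].valid && cache[index].tag == tag;
    }

A trace-driven study of the kind listed above replays a recorded address stream through such a lookup and tallies hits and misses; average access time then follows as hit time plus miss rate times miss penalty.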

3. Concurrent processors (12 hours)

  • Interconnection networks for concurrent processors
    • general overview
    • routing issues
  • Synchronization and communication issues
  • Machines for exploiting implicit parallelism
    • vector processors
    • SIMD
    • VLIW
  • Machines for exploiting explicit parallelism
    • message passing machines
      • communication mechanisms
      • communication overheads
      • techniques for overhead reduction such as active messages and pre-allocation of communication buffers
    • shared memory machines
      • memory consistency models
      • cache coherence problem and solutions
      • other latency hiding mechanisms such as multithreading
      • SMPs and large-scale multiprocessors
    • integrating shared memory and message passing
  • Performance issues
    • metrics (see the definitions after this list)
    • evaluation approaches
    • benchmarks
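
For the metrics item above, the standard definitions of speedup and efficiency on p processors (conventional notation, not tied to any particular paper in the reading list) are

    S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}

where T(p) is the execution time on p processors; evaluation approaches and benchmark suites differ mainly in how T(p) is measured and over what workloads.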

4. Input/Output

  • review of basic I/O issues such as DMA and interrupts
  • storage hierarchies: secondary and tertiary
  • advanced I/O topics such as RAID (a parity sketch follows this list)
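
As a companion to the RAID item above, the core redundancy mechanism surveyed in the Chen et al. paper in the reading list is block-interleaved parity: the parity block of a stripe is the bytewise XOR of its data blocks, so any single lost block can be rebuilt from the survivors. The sketch below is our own illustration; the block size and function name are assumptions.

    #include <stddef.h>
    #include <stdint.h>

    #define BLOCK_BYTES 4096   /* illustrative stripe-unit size */

    /* Compute the parity block of a stripe as the XOR of its n data blocks. */
    void raid_parity(uint8_t parity[BLOCK_BYTES],
                     const uint8_t *data[], size_t n)
    {
        for (size_t b = 0; b < BLOCK_BYTES; b++) {
            uint8_t p = 0;
            for (size_t d = 0; d < n; d++)
                p ^= data[d][b];
            parity[b] = p;
        }
    }

    /* Rebuilding a failed block applies the same XOR over the surviving
     * data blocks and the parity block, since x ^ x == 0. */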

Parallel Architectures

1. Fundamentals of parallel computation

  • taxonomy of parallel architectures
  • performance metrics and bounds (see the equations after this list)
  • data dependence and scheduling
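
The performance bounds referred to above are usually framed by Amdahl's law and by Gustafson's scaled-speedup argument (the Gustafson paper appears in the reading list). With p processors, serial fraction f of a fixed-size workload, and serial fraction s of the time observed on the parallel machine:

    S_{\mathrm{Amdahl}}(p) = \frac{1}{f + \frac{1 - f}{p}},
    \qquad
    S_{\mathrm{Gustafson}}(p) = s + (1 - s)\,p

Amdahl's bound fixes the problem size and caps speedup at 1/f, while Gustafson's formulation scales the problem with the machine, which is why the two give such different pictures of behavior at large p.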

2. Interconnection networks

  • taxonomy and examples
  • performance and routing (a first-order latency model follows this list)
  • combining networks (e.g., NYU Ultracomputer)
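
For the performance and routing item above, a first-order latency comparison in the spirit of the Ni and McKinley survey in the reading list: for a message of L bits crossing D hops over channels of bandwidth B, with flits of L_f bits, and ignoring contention and per-hop routing delay,

    T_{\mathrm{SF}} \approx D \cdot \frac{L}{B},
    \qquad
    T_{\mathrm{WH}} \approx D \cdot \frac{L_f}{B} + \frac{L}{B}

so store-and-forward (SF) latency grows with the product of distance and message length, while wormhole (WH) switching is nearly distance-insensitive for long messages.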

3. Shared memory multiprocessors

  • memory consistency models
  • memory coherence in bus-based multiprocessors (a snooping-protocol sketch follows this list)
  • memory coherence in network-based multiprocessors
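
As a companion to the bus-based coherence item above, here is a minimal sketch of the per-line state transitions of a three-state (MSI) write-invalidate snooping protocol. It is our own illustration: the names are assumed, and data movement, write-backs, and bus arbitration are omitted.

    /* MSI write-invalidate snooping protocol: per-line state transitions only. */
    typedef enum { INVALID, SHARED, MODIFIED } msi_state_t;

    /* Transition taken by the cache whose processor issued the access. */
    msi_state_t on_cpu_access(msi_state_t s, int is_write)
    {
        if (is_write)
            return MODIFIED;                 /* obtain ownership, invalidating other copies */
        return (s == INVALID) ? SHARED : s;  /* a read miss fetches the line in shared state */
    }

    /* Transition taken by every other cache when it snoops a bus request. */
    msi_state_t on_bus_snoop(msi_state_t s, int remote_is_write)
    {
        if (s == INVALID)
            return INVALID;
        if (remote_is_write)
            return INVALID;                  /* a remote write invalidates our copy */
        /* A remote read demotes a MODIFIED line (after the data is supplied
         * or written back); a SHARED line is unaffected. */
        return SHARED;
    }

Directory-based protocols for the network-based machines listed above implement the same logical state machine, but replace bus snooping with explicit invalidation and acknowledgment messages tracked at a home node.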

4. Message-based multiprocessors

  • hypercubes
  • routing in hypercubes and flow control (see the routing sketch after this list)
  • embedding topologies within the hypercube
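
For the routing item above, dimension-order (e-cube) routing corrects one differing address bit per hop, so a message between two nodes traverses exactly the Hamming distance between their IDs (at most log2 N links). The sketch below is our own illustration; the function name is assumed.

    /* Dimension-order (e-cube) routing on a hypercube: neighboring node
     * IDs differ in exactly one bit.  Returns the next node on the path
     * from cur to dst, or cur itself if the message has arrived. */
    unsigned next_hop(unsigned cur, unsigned dst)
    {
        unsigned diff = cur ^ dst;
        if (diff == 0)
            return cur;                  /* arrived at the destination */
        unsigned lowest = diff & -diff;  /* isolate the lowest differing dimension */
        return cur ^ lowest;             /* correct that dimension */
    }

Because every message resolves dimensions in the same fixed order, the routes have no cyclic channel dependencies and are therefore deadlock-free, one reason e-cube routing was widely used in hypercube machines.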

5. Case studies

  • state-of-the-art parallel machines

Relevant Courses

  • CS 6290
  • CS 7110

Reading List

The primary material is covered in these two books, the current textbooks for CS 6290 and CS 7110, respectively:

  • Computer Architecture: A Quantitative Approach, Second Edition, John L. Hennessy and David A. Patterson, 1996, Morgan Kaufmann Publishers.
  • Parallel Computer Architecture: A Hardware/Software Approach, David Culler and J.P. Singh with Anoop Gupta, 1998, Morgan Kaufmann Publishers.

Papers

Beyond the material in the books, we cover the following additional topics using papers drawn from the professional literature. A number of the papers appear in the following collection:

  • Readings in Computer Architecture, Mark D. Hill, Norman P. Jouppi and Gurindar S. Sohi, Eds., 1999, Morgan Kaufmann Publishers.

Note that the papers listed below are not intended to provide complete coverage of these topics, since much of the material is covered well in the textbooks.

Processor Architecture Principles

  1. David A. Patterson, "Reduced Instruction Set Computers," Communications of the ACM, 28(1), January, 1985.
  2. James E. Smith and Andrew R. Pleszkun, "Implementing Precise Interrupts in Pipelined Processors," IEEE Transactions on Computers 37(5), May, 1988.
  3. Colwell et al., "A VLIW Architecture for a Trace Scheduling Compiler," Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 1987.
  4. Sohi et al., "Multiscalar Processors," Proceedings of the 22nd Annual International Symposium on Computer Architecture (ISCA), 1995.

Processor Case Studies

  1. David B. Papworth, "Tuning the Pentium Pro Microarchitecture," IEEE Micro, 16(2), pp. 8-15, 1996.
  2. R. E. Kessler, "The Alpha 21264 Microprocessor," IEEE Micro, 19(2), March-April, 1999.
  3. Rumi Zahir, Jonathan Ross, Dale Morris and Drew Hess, "OS and Compiler Considerations in the Design of the IA-64 Architecture," in Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2000.

Memory Hierarchies

  1. Norman P. Jouppi, "Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers," in Proceedings of the 17th Annual International Symposium on Computer Architecture (ISCA), 1990.
  2. David Lilja, "Cache Coherence in Large-Scale Shared-Memory Multiprocessors: Issues and Comparisons", ACM Computing Surveys, September 1993.

High-Performance I/O

  1. Smith, "Disk cache - Miss Ratio Analysis and Design Considerations", ACM Transactions on Computer Systems (TOCS), August 1985.
  2. Peter M. Chen, et al., "RAID: High-Performance, Reliable Secondary Storage", ACM Computing Surveys, vol. 26, no. 2, June, 1994. pp. 145-188.

Interconnection Networks

  1. Patel, "Performance of Processor-Memory Interconnections for Multiprocessors", IEEE Transactions on Computers, October 1981, pp.771-780.
  2. Feng, "A Survey of Interconnection Networks", IEEE Computer, December 1981, pp.12-27.
  3. Pfister and Norton, "Hot Spot Contention and Combining in Multistage Interconnection Networks", IEEE Transactions on Computers, October 1985, pp. 943-948.
  4. Agarwal, "Limits on Interconnection Network Performance," IEEE Transactions on Parallel and Distributed Systems (TPDS), October 1991.
  5. Ni and McKinley, "A Survey of Wormhole Routing Techniques in Direct Networks," IEEE Computer, 1993.

Parallel Architecture Principles

  1. Gustafson, "Reevaluating Amdahl's Law," Communications of the ACM, May 1988.
  2. Karp and Flatt, "Measuring Parallel Processor Performance," Communications of the ACM, May 1990.
  3. Mellor-Crummey and Scott, "Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors," ACM Transactions on Computer Systems (TOCS), February 1991.
  4. Sivasubramaniam et al., "An Application-Driven Study of Parallel System Overheads and Network Bandwidth Requirements," IEEE Transactions on Parallel and Distributed Systems (TPDS) 10(3), pp. 193-210, March, 1999.

Parallel Machine Case Studies

  1. Seitz, "The Cosmic Cube," Communications of the ACM, January 1985, pp. 22-33.
  2. Hillis and Steele, "Data Parallel Algorithms," Communications of the ACM, December 1986.
  3. Kung, "Why Systolic Architectures?" IEEE Computer, January 1982, pp. 37-46.
  4. A. H. Veen, "Dataflow Machine Architecture," ACM Computing Surveys, Vol. 18, No. 4, December 1986, pp. 365-396.
  5. Kourosh Gharachorloo et al., "Architecture and Design of AlphaServer GS320," Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2000.
  6. Luiz Barroso et al., "Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing", Proceedings of the 27th Annual International Symposium on Computer Architecture (ISCA), 2000.

Last updated for Fall 2001.