Computer Architecture Area
Body of Knowledge
General Architecture
1. Processor architecture including
instruction set design issues
RISC versus CISC
implementation techniques
basic issues in pipelining
resource management techniques such as scoreboarding
data dependencies and speculative execution
techniques for handling branches
techniques for handling load latency
superpipelined
superscalar
technology factors
Performance metrics
space and time
benchmarks (SPEC CPU2000, MediaBench)
2. Memory hierarchy including
Technology issues
DRAM and SRAM speeds
CPU cycle time versus memory latency
Primary memory organization
interleaving
processor/memory buses
split transaction buses
Virtual memory
MMU and TLB
Cache memory
basic concepts
split I and D caches
on-chip caches
multi-level caches
virtually addressed caches
victim caches
Performance metrics
trace driven studies for evaluating memory hierarchies
simulation tools such as WARTS
3. Concurrent processors (12 hours)
Interconnection networks for concurrent processors
general overview
routing issues
Synchronization and communication issues
Machines for exploiting implicit parallelism
vector processors
SIMD
VLIW
Machines for exploiting explicit parallelism
message passing machines
communication mechanisms
communication overheads
techniques for overhead reduction such as active messages and
pre-allocation of communication buffers
shared memory machines
memory consistency models
cache coherence problem and solutions
other latency hiding mechanisms such as multithreading
SMPs and large-scale multiprocessors
integrating shared memory and message passing
Performance issues
metrics
evaluation approaches
benchmarks
4. Input/Output
review of basic I/O issues such as DMA and interrupts
storage hierarchies: secondary and tertiary
advanced I/O topics such as RAID
Parallel Architectures
1. Fundamentals of parallel computation
taxonomy of parallel architectures
performance metrics and bounds
data dependence and scheduling
2. Interconnection networks
taxonomy and examples
performance and routing
combining networks (e.g., NYU Ultracomputer)
3. Shared memory multiprocessors
memory consistency models
memory coherence in bus-based multiprocessors
memory coherence in network-based multiprocessors
4. Message-based multiprocessors
hypercubes
routing in hypercubes and flow control
embedding topologies within the hypercube
5. Case studies
state-of-the-art parallel machines
Relevant Courses
Reading List
The primary material is covered in these two books, the current
textbooks for CS 6290 and CS 7110, respectively:
- Computer Architecture: A Quantitative Approach, Second Edition,
John L. Hennessy and David A. Patterson, 1996, Morgan Kaufmann
publishers.
- Parallel Computer Architecture: A Hardware/Software Approach,
David Culler and J.P. Singh with Anoop Gupta, 1998, Morgan Kaufmann
publishers.
Papers
Beyond the material in the books, we cover the following additional
topics using papers drawn from the professional literature. A number
of the papers appear in the following collection:
- Readings in Computer Architecture, Mark D. Hill, Norman
P. Jouppi and Gurindar S. Sohi, Eds., 1999, Morgan Kaufmann
publishers.
Note that the papers listed below are not intended to cover these
topics completely, because much of the material is already well
covered in the textbooks.
Processor Architecture Principles
- David A. Patterson, "Reduced Instruction Set Computers,"
Communications of the ACM, 28(1), January 1985.
- James E. Smith and Andrew R. Pleszkun, "Implementing Precise
Interrupts in Pipelined Processors," IEEE Transactions on Computers,
37(5), May 1988.
- Colwell et al., "A VLIW Architecture for a Trace Scheduling
Compiler," Second International Conference on Architectural Support
for Programming Languages and Operating Systems (ASPLOS), 1987.
- Sohi et al., "Multiscalar Processors," Proceedings of the 22nd
Annual International Symposium on Computer Architecture (ISCA), 1995.
Processor Case Studies
- David B. Papworth, "Tuning the Pentium Pro Microarchitecture,"
IEEE Micro, 16(2), pp. 8-15, 1996.
- R. E. Kessler, "The Alpha 21264 Microprocessor," IEEE Micro,
19(2), March-April 1999.
- Rumi Zahir, Jonathan Ross, Dale Morris and Drew Hess, "OS and
Compiler Considerations in the Design of the IA-64 Architecture," in
Ninth International Conference on Architectural Support for
Programming Languages and Operating Systems (ASPLOS), 2000.
Memory Hierarchies
- Norman P. Jouppi, "Improving Direct-Mapped Cache Performance by
the Addition of a Small Fully-Associative Cache and Prefetch Buffers,"
in Proceedings of the 17th Annual International Symposium on Computer
Architecture (ISCA), 1990.
- David Lilja, "Cache Coherence in Large-Scale Shared-Memory
Multiprocessors: Issues and Comparisons," ACM Computing Surveys,
September 1993.
High-Performance I/O
- Smith, "Disk Cache - Miss Ratio Analysis and Design Considerations,"
ACM Transactions on Computer Systems (TOCS), August 1985.
- Peter M. Chen et al., "RAID: High-Performance, Reliable Secondary
Storage," ACM Computing Surveys, 26(2), pp. 145-188, June 1994.
Interconnection Networks
- Patel, "Performance of Processor-Memory Interconnections for
Multiprocessors," IEEE Transactions on Computers, October 1981,
pp. 771-780.
- Feng, "A Survey of Interconnection Networks," IEEE Computer, December
1981, pp. 12-27.
- Pfister and Norton, "Hot Spot Contention and Combining in Multistage
Interconnection Networks," IEEE Transactions on Computers, October 1985,
pp. 943-948.
- Agarwal, "Limits on Interconnection Network Performance," IEEE
Transactions on Parallel and Distributed Systems (TPDS), October 1991.
- Ni and McKinley, "A Survey of Wormhole Routing Techniques in Direct
Networks," IEEE Computer, 1993.
Parallel Architecture Principles
- Gustafson, "Reevaluating Amdahl's Law," Communications of
the ACM, May 1988.
- Karp and Flatt, "Measuring Parallel Processor Performance,"
Communications of the ACM, May 1990.
- Mellor-Crummey and Scott, "Algorithms for Scalable
Synchronization on Shared-Memory Multiprocessors," ACM Transactions on
Computer Systems (TOCS), February 1991.
- Sivasubramaniam et al., "An Application-Driven Study of Parallel
System Overheads and Network Bandwidth Requirements," IEEE
Transactions on Parallel and Distributed Systems (TPDS), 10(3),
pp. 193-210, March 1999.
Parallel Machine Case Studies
- Seitz, "The CalTech Cosmic Cube," Communications of the ACM,
January 1985, pp. 22-33.
- Hillis and Steele, "Data Parallel Algorithms," Communications of
the ACM, December 1986.
- Kung, "Why Systolic Architectures?" IEEE Computer, January 1982,
pp. 37-46.
- A.H. Veen, "Dataflow Machine Architecture," ACM Computing Surveys,
18(4), pp. 365-396, December 1986.
- Kourosh Gharachorloo et al., "Architecture and Design of
AlphaServer GS320," Ninth International Conference on Architectural
Support for Programming Languages and Operating Systems (ASPLOS), 2000.
- Luiz Barroso et al., "Piranha: A Scalable Architecture Based on
Single-Chip Multiprocessing," Proceedings of the 27th Annual
International Symposium on Computer Architecture (ISCA), 2000.
Updated for Fall 2001