Research Projects
3D Microarchitectures
See also: GT 3D Integration Research
Over the last several years, a large amount of research has gone into making
vertically integrated chips feasible. By vertically stacking two or more
silicon wafers, connected with a high-density, high-speed interconnect, it
is now possible to combine multiple active device layers within a single
IC. While a great deal of fundamental research has shown that 3D
chips can in fact be built, relatively little research has addressed
what should be done with this new technology. In particular,
we are interested in answering the question: what should a high-performance
microprocessor look like in 3D?
Three-dimensional integrated circuits allow a time-warp for Moore's Law. By
stacking two wafers, the transistor density can be doubled using today's technology.
This shifts the Moore's Law curve left by 18 months, and a stack of
four wafers provides a 3-year shift. Unfortunately, this shift also applies to many of
the Moore's Law "corollaries". For example, every doubling of transistor
density also comes with a doubling of power density. While the cost of 3D ICs
is far less than the cost of constructing a next-generation fab, 3D integration does add
processing steps and expense. We also do not receive the traditional speed boost
that accompanies each new process generation: we receive twice the transistors, but
they are today's transistors, not the transistors available 18 months from now.
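The arithmetic behind the time-warp claim can be sketched in a few lines of Python. This is a minimal sketch assuming, as the text does, an 18-month doubling period and a density gain equal to the number of stacked layers. It also assumes, purely for illustration, that every layer dissipates the same power per unit area, so power per unit footprint scales with the layer count. The function names are hypothetical.

```python
import math

# Assumption taken from the text: transistor density doubles every 18 months.
DOUBLING_PERIOD_MONTHS = 18


def moores_law_shift_months(num_layers: int) -> float:
    """Months of Moore's Law scaling gained by stacking num_layers wafers.

    Stacking N layers multiplies density by N, i.e. log2(N) doublings.
    """
    if num_layers < 1:
        raise ValueError("need at least one layer")
    return DOUBLING_PERIOD_MONTHS * math.log2(num_layers)


def relative_power_density(num_layers: int) -> int:
    """Power per unit footprint relative to a single layer.

    Simplifying assumption: each layer dissipates the same power per unit
    area, so stacking N layers multiplies footprint power density by N.
    """
    if num_layers < 1:
        raise ValueError("need at least one layer")
    return num_layers


for layers in (1, 2, 4):
    print(f"{layers} layer(s): shift = {moores_law_shift_months(layers):.0f} months, "
          f"power density x{relative_power_density(layers)}")
```

Two wafers give an 18-month shift and four give 36 months (3 years), matching the figures above, while power per unit footprint grows by the same factor as density under the equal-power-per-layer assumption.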
Besides transistor density, the other major and potentially revolutionary benefit
of 3D integration is the possibility of routing in the third dimension. Many
people in both industry and academia have been worried about the rapid growth of
RC wire delays relative to transistor speeds. 3D provides a way to shorten long
global routes by vertically stacking functional unit blocks (FUBs), or even by
folding individual FUBs across layers. Our hypothesis is that blindly shoe-horning conventional
planar microarchitectures into a 3D process will fail to realize the full potential
of this new technology. Our research goal is to understand the new tradeoffs
in a 3D world, and use that knowledge to find the ultimate 3D microarchitecture.
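A first-order, idealized estimate shows why folding a block can help. Assuming a square block of area A split evenly over k layers, each layer's footprint has side sqrt(A/k), so the worst-case corner-to-corner Manhattan route shrinks by a factor of sqrt(k), minus whatever the vertical vias add back. The sketch below is a hypothetical model, not the group's methodology: it ignores placement, via pitch, and routing blockage, and its units are arbitrary.

```python
import math


def worst_case_route(area: float, layers: int, via_length: float = 0.0) -> float:
    """Corner-to-corner Manhattan wire length in a square block of `area`
    folded evenly over `layers` layers (idealized first-order model).

    A route may also have to climb through (layers - 1) inter-layer vias,
    each assumed to add `via_length`.
    """
    if area <= 0 or layers < 1:
        raise ValueError("area must be positive and layers >= 1")
    side = math.sqrt(area / layers)  # footprint side length per layer
    return 2 * side + (layers - 1) * via_length


planar = worst_case_route(area=4.0, layers=1)
folded = worst_case_route(area=4.0, layers=4)
print(f"planar: {planar}, folded over 4 layers: {folded}, ratio: {planar / folded}")
```

With four layers and negligible via length, the worst-case route halves (a ratio of sqrt(4) = 2); a nonzero `via_length` eats into that gain, which is one reason folding is not automatically a win.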
Funding for this project has been generously provided by the
Focus Center for Circuit & Systems Solutions and the NSF.
3D Microarchitecture/Circuit Papers:
(PDF pre-prints available on the Publications page.)
1. Gabriel H. Loh,
3D-Stacked Memory Architectures for Multi-Core Processors,
35th ACM International Symposium on Computer Architecture (ISCA), June 2008

2. Gabriel H. Loh,
A Modular 3D Processor for Flexible Product Design and Technology Migration,
ACM International Conference on Computing Frontiers (CF), May 2008

3. Kiran Puttaswamy, Gabriel H. Loh,
3D-Integrated SRAM Components for High-Performance Microprocessors,
IEEE Transactions on Computers (TC), 2008

4. Kiran Puttaswamy, Gabriel H. Loh,
Scalability of 3D-Integrated Arithmetic Units in High-Performance Microprocessors,
ACM Design Automation Conference (DAC), June 2007

5. Gabriel H. Loh, Yuan Xie, Bryan Black,
Processor Design in Three-Dimensional Die-Stacking Technologies,
IEEE Micro, May-June 2007

6. Kiran Puttaswamy, Gabriel H. Loh,
Thermal Herding: Microarchitecture Techniques for Controlling HotSpots in High-Performance 3D-Integrated Processors,
13th International Symposium on High-Performance Computer Architecture (HPCA), February 2007

7. Michael Healy, Mario Vittes, Mongkol Ekpanyapong, Chinnakrishnan Ballapuram, Sung Kyu Lim, Hsien-Hsin S. Lee, Gabriel H. Loh,
Multi-Objective Microarchitectural Floorplanning for 2D and 3D ICs,
IEEE Transactions on Computer Aided Design (TCAD), January 2007

8. Bryan Black, Murali M. Annavaram, Edward Brekelbaum, John DeVale, Lei Jiang, Gabriel H. Loh, Don McCauley, Pat Morrow, Donald W. Nelson, Daniel Pantuso, Paul Reed, Jeff Rupley, Sadas Shankar, John Paul Shen, Clair Webb,
Die Stacking (3D) Microarchitecture,
39th International Symposium on Microarchitecture (MICRO), December 2006

9. Yuan Xie, Gabriel H. Loh, Bryan Black, Kerry Bernstein,
Design Space Exploration for 3D Architectures,
ACM Journal of Emerging Technologies in Computing Systems (JETC), April 2006

10. Kiran Puttaswamy, Gabriel H. Loh,
Thermal Analysis of a 3D Die-Stacked High-Performance Microprocessor,
ACM/IEEE Great Lakes Symposium on VLSI (GLSVLSI), May 2006

11. Kiran Puttaswamy, Gabriel H. Loh,
Dynamic Instruction Schedulers in a 3-Dimensional Integration Technology,
ACM/IEEE Great Lakes Symposium on VLSI (GLSVLSI), May 2006

12. Kiran Puttaswamy, Gabriel H. Loh,
The Impact of 3-Dimensional Integration on the Design of Arithmetic Units,
IEEE International Symposium on Circuits and Systems (ISCAS), May 2006

13. Kiran Puttaswamy, Gabriel H. Loh,
Implementing Register Files for High-Performance Microprocessors in a Die-Stacked (3D) Technology,
IEEE International Symposium on VLSI (ISVLSI), March 2006

14. Kiran Puttaswamy, Gabriel H. Loh,
Implementing Caches in a 3D Technology for High Performance Processors,
International Conference on Computer Design (ICCD), October 2005

Multi-Core Resource Management
As the processor industry moves deeper down the multi-core path, the number
of execution cores and threads per core will continue to increase. Many
shared resources, however, such as the last-level cache (LLC), power budget,
thermal budget, and various bus/interconnect bandwidths, will not increase at
the same rate (or at all). Many of these resources are already critical
to performance, and naive, unmanaged sharing of these resources
among multiple entities (cores) can lead to poor per-thread performance,
low overall system throughput, degraded fairness, unattainable quality of
service, and poor power-performance efficiency.
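To make the cost of unmanaged sharing concrete, here is a small hypothetical simulation in Python. It assumes a fully associative, 8-line LRU last-level cache shared by a thread A that repeatedly loops over 6 lines and a thread B that streams through addresses it never reuses. Static partitioning (reserving 6 lines for A) is used purely as one simple illustrative policy; it is not the mechanism proposed in the papers listed for this project, and all sizes are made up.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache with least-recently-used replacement."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()  # keys ordered from LRU to MRU

    def access(self, addr) -> bool:
        """Return True on a hit; on a miss, fill addr (evicting the LRU line if full)."""
        if addr in self.lines:
            self.lines.move_to_end(addr)  # mark as most recently used
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict least recently used
        self.lines[addr] = True
        return False


def hit_rate_of_a(partitioned: bool, capacity: int = 8,
                  reuse_lines: int = 6, rounds: int = 100) -> float:
    """Interleave thread A (loops over reuse_lines lines) with streaming
    thread B (never reuses a line) and return A's cache hit rate."""
    if partitioned:
        # Hypothetical static partition: A gets reuse_lines lines, B the rest.
        cache_a = LRUCache(reuse_lines)
        cache_b = LRUCache(capacity - reuse_lines)
    else:
        cache_a = cache_b = LRUCache(capacity)  # one unmanaged shared cache

    accesses = rounds * reuse_lines
    hits = 0
    for i in range(accesses):
        if cache_a.access(("A", i % reuse_lines)):
            hits += 1
        cache_b.access(("B", i))  # each B address is touched exactly once
    return hits / accesses


print("unmanaged shared LRU:", hit_rate_of_a(partitioned=False))
print("statically partitioned:", hit_rate_of_a(partitioned=True))
```

In the shared cache, 11 distinct lines are touched between consecutive reuses of any A line, more than the 8-line capacity, so B's stream evicts every A line before it is reused and A never hits. With the partition, A misses only on its 6 cold accesses (a 0.99 hit rate over 100 rounds) while B, which gains nothing from caching, loses nothing.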
In this research project, we are exploring the management of these shared
resources in future multi-core and multi-threaded processors. We are attacking
this problem from multiple directions including hardware-only mechanisms,
processor-OS cooperative techniques, and application-level analysis and
characterization.
Funding for this project has been generously provided by the NSF.
Multi-Core Resource Management Papers:
(PDF pre-prints available on the Publications page.)
1. Yuejian Xie, Gabriel H. Loh,
Dynamic Classification of Program Memory Behaviors in CMPs,
2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects (CMP-MSI, held in conjunction with ISCA-35), June 2008

2. Jonathan D. Kron, Brooks Prumo, Gabriel H. Loh,
Double-DIP: Augmenting DIP with Adaptive Promotion Policies to Manage Shared L2 Caches,
2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects (CMP-MSI, held in conjunction with ISCA-35), June 2008

3. Rahul Garde, Samantika Subramaniam, Gabriel H. Loh,
Deconstructing the Inefficacy of Global Cache Replacement Policies,
7th Workshop on Duplicating, Deconstructing, and Debunking (WDDD, held in conjunction with ISCA-35), June 2008

4. Gabriel H. Loh,
The Cost of Uncore in Throughput-Oriented Many-Core Processors,
Workshop on Architectures and Languages for Throughput Applications (ALTA, held in conjunction with ISCA-35), June 2008

High-Performance High-Efficiency Microarchitectures
Modern microprocessors achieve high performance through a variety of
techniques, but most of these known methods share one trait: their performance
comes at the cost of additional overhead. Examples of overhead include
pipeline latches and control, discarded instructions due to branch
mispredictions, execution of dynamically dead instructions, and support for
execution bandwidths that are far higher than the average utilization. This
overhead manifests itself in the form of increased energy per instruction,
increased power per instruction, and increased overall complexity. The impact
of these overheads will multiply as processor designs include chip
multiprocessing and arrays of smaller cores.
The research objective of this project is to identify and understand the nature
of the overheads in a processor's performance-enhancing mechanisms. From these
insights, we will develop new techniques that achieve similar performance
benefits while substantially reducing the associated overhead. The results
will be processor microarchitecture technologies that are much more efficient
than conventional designs. This efficiency can be used to decrease the peak
power, lower the peak temperature and increase the battery life of processors
targeted for mobile markets. Alternatively, the efficiency gains can be traded
for higher performance, or enable an overly aggressive design to be brought
back within its power budget.
Equipment and funding for this project have been generously provided by Intel.
HPHE uArch Papers:
(PDF pre-prints available on the Publications page.)
1. Samantika Subramaniam, Anne C. Bracy, Hong Wang, Gabriel H. Loh,
Criticality-Based Optimizations for Efficient Load Processing,
15th International Symposium on High-Performance Computer Architecture (HPCA), February 2009

2. Mauricio Breternitz Jr., Gabriel H. Loh, Bryan Black, Jeffrey Rupley, Peter G. Sassone, Wesley Attrot, Youfeng Wu,
A Segmented Bloom Filter Algorithm for Efficient Predictors,
20th IEEE International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), October 2008

3. Samantika Subramaniam, Milos Prvulovic, Gabriel H. Loh,
PEEP: Exploiting Predictability of Memory Dependences in SMT Processors,
14th International Symposium on High-Performance Computer Architecture (HPCA), February 2008

4. Gabriel H. Loh, Daniel A. Jimenez,
Modulo Path History for the Reduction of Pipeline Overheads in Path-Based Neural Branch Predictors,
Springer International Journal of Parallel Programming (IJPP), April 2008

5. Peter G. Sassone, Jeff Rupley, Edward Brekelbaum, Gabriel H. Loh, Bryan Black,
Matrix Scheduler Reloaded,
34th International Symposium on Computer Architecture (ISCA), June 2007

6. Peter G. Sassone, D. Scott Wills, Gabriel H. Loh,
Static Strands: Safely Exposing Dependence Chains for Increasing Embedded Power Efficiency,
ACM Transactions on Embedded Computing Systems (TECS), September 2007

7. Samantika Subramaniam, Gabriel H. Loh,
Fire-and-Forget: Load/Store Scheduling with No Store Queue at All,
39th International Symposium on Microarchitecture (MICRO), December 2006

8. Ranjith Subramanian, Yannis Smaragdakis, Gabriel H. Loh,
Adaptive Caches: Effective Shaping of Cache Behavior to Workloads,
39th International Symposium on Microarchitecture (MICRO), December 2006

9. Chinnakrishnan Ballapuram, Kiran Puttaswamy, Gabriel H. Loh, Hsien-Hsin S. Lee,
Entropy-based Low Power Data TLB Design,
ACM/IEEE Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES), October 2006

10. Daniel A. Jimenez, Gabriel H. Loh,
Controlling the Power and Area of Neural Branch Predictors for Practical Implementation in High-Performance Processors,
18th IEEE International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), October 2006

11. Gabriel H. Loh,
Revisiting the Performance Impact of Branch Predictor Latencies,
International Symposium on Performance Analysis of Software and Systems (ISPASS), March 2006

12. Samantika Subramaniam, Gabriel H. Loh,
Store Vectors for Scalable Memory Dependence Prediction and Scheduling,
12th International Symposium on High-Performance Computer Architecture (HPCA), February 2006

13. Gabriel H. Loh,
A Simple Divide-and-Conquer Approach for Neural-Class Branch Prediction,
14th International Conference on Parallel Architectures and Compilation Techniques (PACT), September 2005

14. Gabriel H. Loh, Daniel A. Jimenez,
Reducing the Power and Complexity of Path-Based Neural Branch Prediction,
5th Workshop on Complexity Effective Design (WCED, held in conjunction with ISCA-32), June 2005

15. Peter G. Sassone, D. Scott Wills, Gabriel H. Loh,
Static Strands: Safely Collapsing Dependence Chains for Increasing Embedded Power Efficiency,
Conference on Languages, Compilers and Tools for Embedded Systems (LCTES), June 2005