STING::Research


Research Projects

3D Microarchitectures

See also: GT 3D Integration Research

Over the last several years, a large amount of research has gone into making vertically integrated chips feasible. By vertically stacking two or more silicon wafers, connected with a high-density, high-speed interconnect, it is now possible to combine multiple active device layers within a single IC. While a lot of fundamental research has shown that 3D chips can in fact be built, far less research has gone into figuring out what should be done with this new technology. In particular, we are interested in answering the question: what should a high-performance microprocessor look like in 3D?

Three-dimensional integrated circuits allow a time-warp for Moore's Law. By stacking two wafers, the transistor density can be doubled using today's technology. This provides a left-shift of the Moore's Law curve by 18 months. A stack of four wafers provides a 3-year shift. Unfortunately, this shift also applies to many of the Moore's Law "corollaries". For example, every doubling of transistor density also comes with a doubling of power density. While the cost of 3D ICs is far less than the cost of constructing a next-generation fab, 3D integration does add processing steps and expense. We also do not receive the traditional speed boost that accompanies each new process generation: we receive twice the transistors, but they are today's transistors, not the transistors available 18 months from now.
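
The arithmetic behind this shift can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in the paragraph above: density doubles every 18 months, and power density scales directly with transistor density. The function names are hypothetical and not part of any tool used in this project.

```python
import math

# Moore's Law doubling period assumed in the text above.
DOUBLING_PERIOD_MONTHS = 18


def moores_law_shift_months(num_layers: int) -> float:
    """Months of density scaling gained by stacking num_layers wafers."""
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    # Each doubling of the layer count is worth one doubling period.
    return math.log2(num_layers) * DOUBLING_PERIOD_MONTHS


def relative_power_density(num_layers: int) -> float:
    """Power density relative to one layer, assuming it tracks transistor density."""
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    return float(num_layers)


if __name__ == "__main__":
    for layers in (1, 2, 4):
        shift = moores_law_shift_months(layers)
        power = relative_power_density(layers)
        print(f"{layers} layer(s): {shift:.0f}-month shift, {power:.0f}x power density")
```

Under these assumptions, two layers give an 18-month shift and four layers give a 36-month (3-year) shift, with power density growing by the same factor as transistor density.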

Besides transistor density, the other major and potentially revolutionary benefit of 3D integration is the possibility of routing in the third dimension. Many people in both industry and academia have been worried about the rapid growth of RC delays relative to transistor speeds. 3D provides a way to reduce the long global routes by vertically stacking functional unit blocks (FUBs), or even folding individual FUBs. It is our hypothesis that blindly shoe-horning conventional planar microarchitectures into a 3D process will fail to realize the full potential of this new technology. Our research goal is to understand the new tradeoffs in a 3D world, and use that knowledge to find the ultimate 3D microarchitecture.
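
To give a rough feel for why folding a block shortens its longest wires, the following sketch uses a deliberately simplified geometric model. It assumes a square block whose area is split evenly across the layers, Manhattan (horizontal plus vertical) routing within a layer, and a fixed per-layer vertical hop length that defaults to zero. None of these assumptions or names come from the papers listed below; real floorplans and vias are more complicated.

```python
import math


def folded_side_length(area_mm2: float, num_layers: int) -> float:
    """Side of the square footprint when the block is split evenly across layers."""
    if area_mm2 <= 0 or num_layers < 1:
        raise ValueError("area must be positive and num_layers at least 1")
    return math.sqrt(area_mm2 / num_layers)


def worst_case_route_mm(area_mm2: float, num_layers: int,
                        via_length_mm: float = 0.0) -> float:
    """Corner-to-corner Manhattan distance plus vertical hops between layers."""
    side = folded_side_length(area_mm2, num_layers)
    # Two sides of the footprint, plus one vertical hop per layer boundary crossed.
    return 2.0 * side + (num_layers - 1) * via_length_mm


if __name__ == "__main__":
    for layers in (1, 2, 4):
        route = worst_case_route_mm(4.0, layers)
        print(f"{layers} layer(s): worst-case route {route:.2f} mm")
```

In this model the worst-case in-plane route shrinks with the square root of the layer count, so the benefit of folding depends on how short the vertical connections are relative to the planar wires they replace.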

Funding for this project has been generously provided by the Focus Center for Circuit & Systems Solutions, and the NSF.

3D Microarchitecture/Circuit Papers:
(PDF pre-prints available on the Publications page.)

1. Gabriel H. Loh, 3D-Stacked Memory Architectures for Multi-Core Processors, 35th ACM International Symposium on Computer Architecture (ISCA), June 2008
2. Gabriel H. Loh, A Modular 3D Processor for Flexible Product Design and Technology Migration, ACM International Conference on Computing Frontiers (CF), May 2008
3. Kiran Puttaswamy, Gabriel H. Loh, 3D-Integrated SRAM Components for High-Performance Microprocessors, IEEE Transactions on Computers (TC), 2008
4. Kiran Puttaswamy, Gabriel H. Loh, Scalability of 3D-Integrated Arithmetic Units in High-Performance Microprocessors, ACM Design Automation Conference (DAC), June 2007
5. Gabriel H. Loh, Yuan Xie, Bryan Black, Processor Design in Three-Dimensional Die-Stacking Technologies, IEEE Micro, May-June 2007
6. Kiran Puttaswamy, Gabriel H. Loh, Thermal Herding: Microarchitecture Techniques for Controlling HotSpots in High-Performance 3D-Integrated Processors, 13th International Symposium on High-Performance Computer Architecture (HPCA), February 2007
7. Michael Healy, Mario Vittes, Mongkol Ekpanyapong, Chinnakrishnan Ballapuram, Sung Kyu Lim, Hsien-Hsin S. Lee, Gabriel H. Loh, Multi-Objective Microarchitectural Floorplanning for 2D and 3D ICs, IEEE Transactions on Computer Aided Design (TCAD), January 2007
8. Bryan Black, Murali M. Annavaram, Edward Brekelbaum, John DeVale, Lei Jiang, Gabriel H. Loh, Don McCauley, Pat Morrow, Donald W. Nelson, Daniel Pantuso, Paul Reed, Jeff Rupley, Sadas Shankar, John Paul Shen, Clair Webb, Die Stacking (3D) Microarchitecture, 39th International Symposium on Microarchitecture (MICRO), December 2006
9. Yuan Xie, Gabriel H. Loh, Bryan Black, Kerry Bernstein, Design Space Exploration for 3D Architectures, ACM Journal of Emerging Technologies in Computing Systems (JETC), April 2006
10. Kiran Puttaswamy, Gabriel H. Loh, Thermal Analysis of a 3D Die-Stacked High-Performance Microprocessor, ACM/IEEE Great Lakes Symposium on VLSI (GLSVLSI), May 2006
11. Kiran Puttaswamy, Gabriel H. Loh, Dynamic Instruction Schedulers in a 3-Dimensional Integration Technology, ACM/IEEE Great Lakes Symposium on VLSI (GLSVLSI), May 2006
12. Kiran Puttaswamy, Gabriel H. Loh, The Impact of 3-Dimensional Integration on the Design of Arithmetic Units, IEEE International Symposium on Circuits and Systems (ISCAS), May 2006
13. Kiran Puttaswamy, Gabriel H. Loh, Implementing Register Files for High-Performance Microprocessors in a Die-Stacked (3D) Technology, IEEE International Symposium on VLSI (ISVLSI), March 2006
14. Kiran Puttaswamy, Gabriel H. Loh, Implementing Caches in a 3D Technology for High Performance Processors, International Conference on Computer Design (ICCD), October 2005



Multi-Core Resource Management

As the processor industry moves deeper down the multi-core path, the number of execution cores and threads per core will continue to increase. Many shared resources, however, such as the last-level cache (LLC), power budget, thermal budget, and various bus/interconnect bandwidths, will not increase at the same rate (or at all). Many of these resources are already critical to performance, and naive, unmanaged sharing of these resources between multiple entities (cores) can lead to poor per-thread performance, low overall system throughput, degraded fairness and unattainable quality of service, and poor power-performance efficiency.

In this research project, we are exploring the management of these shared resources in future multi-core and multi-threaded processors. We are attacking this problem from multiple directions including hardware-only mechanisms, processor-OS cooperative techniques, and application-level analysis and characterization.

Funding for this project has been generously provided by the NSF.

Multi-Core Resource Management Papers:
(PDF pre-prints available on the Publications page.)

1. Yuejian Xie, Gabriel H. Loh, Dynamic Classification of Program Memory Behaviors in CMPs, 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects (CMP-MSI, held in conjunction with ISCA-35), June 2008
2. Jonathan D. Kron, Brooks Prumo, Gabriel H. Loh, Double-DIP: Augmenting DIP with Adaptive Promotion Policies to Manage Shared L2 Caches, 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects (CMP-MSI, held in conjunction with ISCA-35), June 2008
3. Rahul Garde, Samantika Subramaniam, Gabriel H. Loh, Deconstructing the Inefficacy of Global Cache Replacement Policies, 7th Workshop on Duplicating, Deconstructing, and Debunking (WDDD, held in conjunction with ISCA-35), June 2008
4. Gabriel H. Loh, The Cost of Uncore in Throughput-Oriented Many-Core Processors, Workshop on Architectures and Languages for Throughput Applications (ALTA, held in conjunction with ISCA-35), June 2008



High-Performance High-Efficiency Microarchitectures

Modern microprocessors achieve high performance through a variety of techniques, but common to most of these known methods is that this performance comes at the cost of additional overhead. Examples of overhead include pipeline latches and control, discarded instructions due to branch mispredictions, execution of dynamically dead instructions, and support for execution bandwidths that are far higher than the average utilization. This overhead manifests itself in the form of increased energy per instruction, increased power per instruction, and increased overall complexity. The impact of these overheads will multiply as processor designs include chip multiprocessing and arrays of smaller cores.

The research objective of this project is to identify and understand the nature of the overheads in a processor's performance-enhancing mechanisms. From these insights, we will invent new technologies to achieve similar performance benefits while substantially reducing the associated overhead. The results will be processor microarchitecture technologies that are much more efficient than conventional designs. This efficiency can be used to decrease the peak power, lower the peak temperature, and increase the battery life of processors targeted for mobile markets. Alternatively, the efficiency gains can be traded for higher performance, or can enable an overly aggressive design to be brought back within its power budget.

Equipment and funding for this project have been generously provided by Intel.

HPHE uArch Papers:
(PDF pre-prints available on the Publications page.)

1. Samantika Subramaniam, Anne C. Bracy, Hong Wang, Gabriel H. Loh, Criticality-Based Optimizations for Efficient Load Processing, 15th International Symposium on High-Performance Computer Architecture (HPCA), February 2009
2. Mauricio Breternitz Jr., Gabriel H. Loh, Bryan Black, Jeffrey Rupley, Peter G. Sassone, Wesley Attrot, Youfeng Wu, A Segmented Bloom Filter Algorithm for Efficient Predictors, 20th IEEE International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), October 2008
3. Samantika Subramaniam, Milos Prvulovic, Gabriel H. Loh, PEEP: Exploiting Predictability of Memory Dependences in SMT Processors, 14th International Symposium on High-Performance Computer Architecture (HPCA), February 2008
4. Gabriel H. Loh, Daniel A. Jimenez, Modulo Path History for the Reduction of Pipeline Overheads in Path-Based Neural Branch Predictors, Springer International Journal of Parallel Programming (IJPP), April 2008
5. Peter G. Sassone, Jeff Rupley, Edward Brekelbaum, Gabriel H. Loh, Bryan Black, Matrix Scheduler Reloaded, 34th International Symposium on Computer Architecture (ISCA), June 2007
6. Peter G. Sassone, D. Scott Wills, Gabriel H. Loh, Static Strands: Safely Exposing Dependence Chains for Increasing Embedded Power Efficiency, ACM Transactions on Embedded Computing Systems (TECS), September 2007
7. Samantika Subramaniam, Gabriel H. Loh, Fire-and-Forget: Load/Store Scheduling with No Store Queue at All, 39th International Symposium on Microarchitecture (MICRO), December 2006
8. Ranjith Subramanian, Yannis Smaragdakis, Gabriel H. Loh, Adaptive Caches: Effective Shaping of Cache Behavior to Workloads, 39th International Symposium on Microarchitecture (MICRO), December 2006
9. Chinnakrishnan Ballapuram, Kiran Puttaswamy, Gabriel H. Loh, Hsien-Hsin S. Lee, Entropy-based Low Power Data TLB Design, ACM/IEEE Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES), October 2006
10. Daniel A. Jimenez, Gabriel H. Loh, Controlling the Power and Area of Neural Branch Predictors for Practical Implementation in High-Performance Processors, 18th IEEE International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), October 2006
11. Gabriel H. Loh, Revisiting the Performance Impact of Branch Predictor Latencies, International Symposium on Performance Analysis of Software and Systems (ISPASS), March 2006
12. Samantika Subramaniam, Gabriel H. Loh, Store Vectors for Scalable Memory Dependence Prediction and Scheduling, 12th International Symposium on High-Performance Computer Architecture (HPCA), February 2006
13. Gabriel H. Loh, A Simple Divide-and-Conquer Approach for Neural-Class Branch Prediction, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT), September 2005
14. Gabriel H. Loh, Daniel A. Jimenez, Reducing the Power and Complexity of Path-Based Neural Branch Prediction, 5th Workshop on Complexity Effective Design (WCED, held in conjunction with ISCA-32), June 2005
15. Peter G. Sassone, D. Scott Wills, Gabriel H. Loh, Static Strands: Safely Collapsing Dependence Chains for Increasing Embedded Power Efficiency, Conference on Languages, Compilers and Tools for Embedded Systems (LCTES), June 2005


Georgia Institute of Technology
College of Computing
Superscalar Technology INnovation Group, © 2008
Last modified 28 Oct '08