Current Research

My group is starting a new NSF CDS&E project entitled "SuperSTaRLU: STacked, AcceleRated Algorithms for Sparse Linear Systems". This page will be updated with more information as we proceed through this three-year project, but for the moment here is the official project summary:

Computing systems and their software have long made performance trade-offs due to an imbalance between fast processors and their slower memory hierarchies. Newly released 3D stacked memory technologies provide an opportunity to reduce this imbalance by offering higher memory bandwidths and novel ways of accessing memory. One of the most promising techniques for using 3D stacked memory is "memory-centric" computation, which moves computation as close as possible to main memory. However, there is little understanding of how best to use these new memory technologies in libraries and applications, even as this hardware is slated for integration into near-term exascale supercomputing systems. The goal of this project is to understand the techniques and approaches needed to fully utilize 3D stacked memories and to demonstrate a useful set of computational primitives that can serve as a template for accelerating large scientific codes with these new memory components.

This research considers the specific problem of how to use "memory-centric" processors effectively to implement sparse primitives as part of a library that supports a number of key scientific applications, including radiation transport, fluid flow, and fusion simulations. This library, SuperLU_DIST, is a sparse direct solver library designed for distributed-memory multicore systems that has previously been accelerated on both NVIDIA's graphics co-processors and Intel's Xeon Phi co-processor. While this prior work has yielded promising speedups, it has also revealed fundamental algorithmic performance bottlenecks related to memory data transfers. This research program will investigate whether these bottlenecks can be mitigated by using emerging memory-centric co-processors. Such co-processors, which include Micron's Hybrid Memory Cube (HMC) and High-Bandwidth Memory (HBM), combine 3D stacked memories and FPGAs to provide lower latency, higher bandwidth data transfer, and support for near-memory data processing. The project will use high-level languages like OpenCL to take advantage of such technologies and will employ a mix of algorithmic advances and software library development to improve application performance. Additionally, this work will lead to a new, open-source release of SuperLU called Super STacked, AcceleRated LU (SuperSTaRLU), which will be made available to application developers and will be demonstrated on one of the next-generation systems with memory-centric co-processors, such as NERSC's Cori.