CUDA Center of Excellence – Research Projects

Georgia Tech is engaged in a number of research, development, and educational activities that leverage GPU computing. These activities span the full range: applications, software development tools, system software, and architectures.

Projects currently under way in the CUDA Center of Excellence include:

NSF Keeneland Project – The Keeneland project is bringing large-scale heterogeneous computing with GPUs to the NSF open-science community. In the fall of 2010, Keeneland will deploy an initial delivery system with 360 NVIDIA Fermi GPUs, followed in 2012 by a final delivery system with nearly three times that capability.
Contact Jeffrey Vetter for more information.

GPU VSIPL – GPU VSIPL is an implementation of the Vector Signal Image Processing Library (VSIPL) API that uses CUDA GPUs to accelerate signal processing applications. VSIPL provides a high-level, portable programming interface for leveraging GPUs and other accelerators without low-level optimization expertise.
Contact Dan Campbell (dan.campbell@gtri.gatech.edu) for more information.
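
To give a sense of the low-level work the library hides, the sketch below hand-codes an elementwise vector multiply directly in CUDA; a VSIPL user would instead call the portable API and let GPU VSIPL supply an optimized version. The kernel name, sizes, and values are illustrative assumptions, not part of VSIPL.

    // Illustrative only: a hand-coded CUDA elementwise vector multiply,
    // the kind of low-level kernel that GPU VSIPL hides behind the
    // portable VSIPL API.
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void vmul(const float* a, const float* b, float* c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            c[i] = a[i] * b[i];
    }

    int main()
    {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);

        float* h_a = new float[n];
        float* h_b = new float[n];
        float* h_c = new float[n];
        for (int i = 0; i < n; ++i) { h_a[i] = 1.5f; h_b[i] = 2.0f; }

        float *d_a, *d_b, *d_c;
        cudaMalloc(&d_a, bytes);
        cudaMalloc(&d_b, bytes);
        cudaMalloc(&d_c, bytes);
        cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

        vmul<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);
        cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

        printf("c[0] = %f\n", h_c[0]);   // expect 3.0

        cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
        delete[] h_a; delete[] h_b; delete[] h_c;
        return 0;
    }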

Ocelot – Ocelot is a dynamic compilation framework designed to map the explicitly data-parallel execution model used by NVIDIA CUDA applications onto diverse multithreaded platforms. Ocelot includes a dynamic binary translator from the Parallel Thread eXecution (PTX) ISA to many-core processors that leverages the open-source Low Level Virtual Machine (LLVM) code generator to target x86 and other ISAs. The dynamic compiler can execute existing CUDA binaries without recompilation from source and supports switching between execution on an NVIDIA GPU and a many-core x86 CPU at runtime. It has been validated against over 130 applications taken from the CUDA SDK, the UIUC Parboil benchmarks, the Virginia Rodinia benchmarks, the GPU VSIPL signal and image processing library, the Thrust library, and several domain-specific applications.
Contact Sudhakar Yalamanchili for more information.
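
Since Ocelot consumes the PTX carried in ordinary CUDA binaries, a plain kernel such as the hypothetical one below is all it needs as input; compiling it with nvcc -ptx exposes the PTX representation that the translator then maps to a GPU or, via LLVM, to a many-core x86 CPU. The kernel itself is illustrative and is not part of Ocelot.

    // saxpy.cu -- a minimal kernel used here only to illustrate the PTX
    // that Ocelot consumes.  Generating the PTX with
    //     nvcc -ptx saxpy.cu -o saxpy.ptx
    // produces the explicitly data-parallel representation that the
    // dynamic translator retargets at runtime.
    __global__ void saxpy(int n, float a, const float* x, float* y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = a * x[i] + y[i];
    }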

Harmony – Harmony is a runtime-supported programming and execution model that provides: (1) semantics for intuitively managing parallelism, (2) dynamic mappings from compute-intensive kernels to heterogeneous processor resources, and (3) online monitoring and performance optimization services for heterogeneous many-core systems. The programming model is based on the identification of compute kernels, predicated kernel execution, and a managed shared address space. The execution model is based on dynamic detection and tracking of dependencies between compute kernels (enabled by the programming model), and a decoupling of kernel invocation by the application from kernel scheduling/execution on a core. The approach is inspired by solutions to instruction scheduling and management in out-of-order (OOO) superscalar processors, now adapted to schedule kernels on diverse cores. When integrated with Ocelot, the result is portable execution across a range of system configurations. Scalable performance is maintained via a two-step solution: producer/consumer dependencies are first inferred for a window of compute kernels that have yet to execute, and then used as constraints by a scheduler that attempts to minimize the execution time of the application while satisfying all dependencies.
Contact Sudhakar Yalamanchili for more information.
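
The dependency-tracking idea can be pictured with a small host-side sketch: each pending kernel declares the buffers it reads and writes, and producer/consumer edges are inferred over a window of kernels from read/write conflicts. All names and data structures below are hypothetical simplifications, not the Harmony API.

    // Hypothetical sketch of Harmony-style dependency inference over a
    // window of pending kernels: a later kernel depends on an earlier one
    // if their read/write sets conflict (RAW, WAW, or WAR).
    #include <cstdio>
    #include <set>
    #include <string>
    #include <vector>

    struct KernelTask {
        std::string name;
        std::set<std::string> reads;   // buffers the kernel reads
        std::set<std::string> writes;  // buffers the kernel writes
    };

    static bool intersects(const std::set<std::string>& a,
                           const std::set<std::string>& b)
    {
        for (const auto& x : a)
            if (b.count(x)) return true;
        return false;
    }

    // Kernel j, issued after kernel i, depends on i if i writes something
    // j touches, or j writes something i reads.
    static bool depends(const KernelTask& earlier, const KernelTask& later)
    {
        return intersects(earlier.writes, later.reads)  ||  // RAW
               intersects(earlier.writes, later.writes) ||  // WAW
               intersects(earlier.reads,  later.writes);    // WAR
    }

    int main()
    {
        // A window of kernels that have been invoked but not yet executed.
        std::vector<KernelTask> window = {
            {"generate", {},           {"A"}},
            {"filter",   {"A"},        {"B"}},
            {"scale",    {"C"},        {"C"}},   // independent of the first two
            {"reduce",   {"B", "C"},   {"out"}},
        };

        // Infer producer/consumer edges; a real scheduler would use these
        // as constraints when mapping kernels onto heterogeneous cores.
        for (size_t j = 0; j < window.size(); ++j)
            for (size_t i = 0; i < j; ++i)
                if (depends(window[i], window[j]))
                    printf("%s -> %s\n", window[i].name.c_str(),
                                         window[j].name.c_str());
        return 0;
    }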

Red Fox – The goal of this project is to harness the cost and performance advantages of GPUs for data-intensive computations in enterprise applications. Toward this end we are working with LogicBlox Inc. (LB), a company that specializes in enterprise-class applications for decision automation, analytics, and planning. This joint project is developing a compilation and execution environment that integrates the front-end from LB with the Harmony and Ocelot execution environment. The LB front-end is based on Datalog, a declarative language originally developed as a query language for deductive databases. The LB toolset is applied to data-intensive applications and currently executes on commodity clusters. The major components of Red Fox are the LB Datalog front-end, the Harmony run-time, and the Ocelot dynamic compiler. The integration has driven the development of a kernel intermediate representation that could lay the foundation for integrating other front-ends, while the compilation chain will incorporate domain-specific compiler and run-time optimizations. The first instantiation is focused on the implementation and optimization of relational algebra operators.
Contact Sudhakar Yalamanchili for more information.
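
As a hedged illustration of the relational algebra operators targeted by the first instantiation, the sketch below implements a simple selection (filter by predicate) over an integer column as a CUDA kernel; the column layout, predicate, and use of an atomic counter for compaction are assumptions made for the sketch, not Red Fox's design.

    // Illustrative CUDA implementation of a relational SELECT: keep the
    // rows of an integer column that satisfy a predicate (value > threshold).
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void select_gt(const int* column, int n, int threshold,
                              int* out, int* out_count)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n && column[i] > threshold) {
            int pos = atomicAdd(out_count, 1);  // unordered compaction
            out[pos] = column[i];
        }
    }

    int main()
    {
        const int n = 8;
        int h_col[n] = {3, 9, 1, 12, 7, 2, 15, 4};

        int *d_col, *d_out, *d_count;
        cudaMalloc(&d_col, n * sizeof(int));
        cudaMalloc(&d_out, n * sizeof(int));
        cudaMalloc(&d_count, sizeof(int));
        cudaMemcpy(d_col, h_col, n * sizeof(int), cudaMemcpyHostToDevice);
        cudaMemset(d_count, 0, sizeof(int));

        select_gt<<<1, 32>>>(d_col, n, 5, d_out, d_count);

        int h_count = 0;
        cudaMemcpy(&h_count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
        printf("%d rows passed the predicate\n", h_count);   // expect 4

        cudaFree(d_col); cudaFree(d_out); cudaFree(d_count);
        return 0;
    }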

CS 4225: Intro to High Performance Computing – The goal of this course is to provide an introduction to the algorithmic and software tools and techniques needed to implement effective, high-performing programs on modern parallel computing systems. The course emphasizes the recent history and current trends in parallel computer architectures and programming models (i.e., languages and libraries) for shared memory multicore/manycore architectures. Students are expected to complete several hands-on assignments.
Contact Rich Vuduc for more information.

Introduction to Parallel Computing – This is an ECE special-topics course (ECE 4083) planned to be offered in Spring 2011 (it is in the final stage of approval by the ECE undergraduate committee). It will cover GPU architectures, CUDA programming, and multi-threaded algorithms for GPUs.
Contact Bo Hong (bohong@gatech.edu) for more information.

CS 4803: DGC – This course will explore game console architectures, parallel processors, and GPU architectures. The course will provide background knowledge of programming for a game console architecture in order to better understand the hardware. It will cover the architectures of the Xbox 360, the PlayStation 3, NVIDIA GPUs, Intel's Larrabee, and the Nintendo DS (ARM processors). The course focuses on the microarchitecture level of both traditional architectures and game console architectures, and the details of how these processors work will be explored through programming assignments. The first four lab assignments will cover GPGPU programming (programming using graphics processors), and the following four will cover Nintendo DS programming to better understand embedded processors.
Contact Hyesoon Kim for more information.