David A. Bader
IEEE Fellow
AAAS Fellow
Professor
College of Computing
Georgia Tech
Atlanta, GA 30332


 
 
 



Graduate Research Assistantships

Graduate Research Assistantship positions for Ph.D. students are available in the areas of parallel and multicore algorithms, high-performance computing, computational science and engineering, large-scale optimization problems, and the application areas of computational biology and genomics. Several current projects are described below. Additional new projects are anticipated by the next Fall semester. Each research assistant will receive a competitive stipend plus paid tuition.

Applicants should complete an official application for graduate studies in either the "Computational Science and Engineering (College of Computing)" or the "Computer Science" graduate programs at Georgia Tech, and select the Computational Science and Engineering track.
APPLICATION DEADLINE: December 15.

For further information, please see Prof. David A. Bader's Laboratory at http://www.cc.gatech.edu/~bader

APPLICATION INSTRUCTIONS:

  1. The Georgia Tech graduate application is available online at http://www.gradadmiss.gatech.edu/
  2. On Page 1, question 14 (Program of Study), of the online application, click on "Search for Degree and Major", click on "Ph.D", and select "Computational Science and Engineering (College of Computing)" for Graduate Major, and select "GT-Atlanta" for the planned campus.
  3. On Page 4 (Georgia Tech computer science application), question 1, select "High Performance Computing" as your first-choice area of interest.
  4. In your statement on Page 4, please include this sentence: "I wish to be considered for a Graduate Research Assistantship under the direction of Professor David A. Bader."
  5. Please email Prof. David A. Bader ( ) with your First and Last name once you have submitted your online application and received an Order ID.
The following is a selection of active projects in my research group as of August 2010; new projects are added regularly:

PROJECT ECHELON: Extreme-scale Compute Hierarchies with Efficient Locality-Optimized Nodes

(Funded by DARPA Ubiquitous High Performance Computing (UHPC))

A team led by NVIDIA has been awarded a research grant of $25 million by the Defense Advanced Research Projects Agency (DARPA), the U.S. Defense Department's research and development arm, to address what the agency calls a "crisis in computing." The four-year research contract, awarded under DARPA's Ubiquitous High Performance Computing (UHPC) program, covers work to develop the GPU technologies required to build a new class of exascale supercomputers, which will be 1,000 times more powerful than today's fastest supercomputers. The team -- which also includes Cray Inc., Georgia Institute of Technology, Oak Ridge National Laboratory and five other top U.S. universities -- is being funded by DARPA to address the challenge that conventional computing architectures are reaching the practical limits of energy usage and will not meet the demands of exascale computing. The research team plans to develop new software and hardware technology to dramatically increase computing performance, programmability and reliability.

PROJECT CHASM: Challenge Applications and Scalable Metrics (CHASM) for Ubiquitous High Performance Computing

(Funded by DARPA Ubiquitous High Performance Computing (UHPC))

Advanced computing is the backbone of the Department of Defense and of critical strategic importance to our nation's defense. All DoD sensors, platforms and missions depend heavily on computer systems. To meet the escalating demands for greater processing performance, it is imperative that future computer system designs be developed to support new generations of advanced DoD systems and enable new computing application code. Targeting this crucial need, the Defense Advanced Research Projects Agency (DARPA) has initiated the Ubiquitous High Performance Computing (UHPC) program to create an innovative, revolutionary new generation of computing systems that overcomes the limitations of the current evolutionary approach. Georgia Institute of Technology was selected to lead CHASM, an Applications, Benchmarks and Metrics team, for evaluating the UHPC systems under development.

PROJECT STING: Graph Analytics for Streaming Data on Emerging Platforms

(Funded by DoD-sponsored Center for Adaptive Supercomputing Software for Multithreaded Architectures: CASS-MT)

The growth of graph-structured data sets is rapidly outpacing analysis tools. Social networks like Facebook are growing quickly, adding an average of 17 million users per month over the past year to a present total of 300 million users with 45 million messages posted per day. Communication systems like Twitter add 25 million messages per day with rich context linking messages, users, and topics. Even such “sedate” topics as protein analysis generate millions of updates per year. Each of these graphs already stresses analysis tools designed for static, unchanging graphs; simply repeating static analysis is insufficient for current graph data. We are developing tools to analyze streaming, dynamic graph data. These tools require adapting static analysis algorithms and developing new dynamic algorithms. To implement these algorithms efficiently, we are evaluating data structures and programming techniques in emerging development platforms like X10 and on new multithreaded hardware.
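
The difference between repeating a static analysis and maintaining a result dynamically can be illustrated with a small sketch. The example below is not taken from STING; it is a hypothetical illustration in Python that uses triangle counting as a stand-in analysis, and the class and function names are assumptions made for this example. The streaming counter touches only the neighborhoods of the two endpoints of each updated edge, while the static routine rescans the whole graph.

```python
from collections import defaultdict


class StreamingTriangleCounter:
    """Hypothetical sketch: keeps the triangle count of an undirected
    graph up to date as edges are inserted and deleted."""

    def __init__(self):
        self.adj = defaultdict(set)  # vertex -> set of neighbors
        self.triangles = 0

    def insert_edge(self, u, v):
        if u == v or v in self.adj[u]:
            return  # ignore self-loops and duplicate edges
        # Every common neighbor of u and v closes one new triangle.
        self.triangles += len(self.adj[u] & self.adj[v])
        self.adj[u].add(v)
        self.adj[v].add(u)

    def delete_edge(self, u, v):
        if v not in self.adj[u]:
            return  # edge is not present
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        # Every common neighbor of u and v loses one triangle.
        self.triangles -= len(self.adj[u] & self.adj[v])


def count_triangles_static(edges):
    """Baseline: recount all triangles from scratch over an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    total = 0
    for u in list(adj):
        for v in adj[u]:
            if u < v:
                total += len(adj[u] & adj[v])
    return total // 3  # each triangle is seen once per edge


stream = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]
counter = StreamingTriangleCounter()
for u, v in stream:
    counter.insert_edge(u, v)
count_after_inserts = counter.triangles
counter.delete_edge(2, 3)
count_after_delete = counter.triangles
```

In this sketch each update costs roughly the size of two neighbor sets, whereas the static recount visits every edge after every change.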

PROJECT GTFOLD: Combinatorial and Computational Methods for the Analysis, Prediction, and Design of Viral RNA Structures

(Funded by NIH)

The Human Genome Project and related efforts have generated enormous amounts of raw biological sequence data. However, understanding how biological sequences encode structural and functional information remains a fundamental scientific challenge. In particular, understanding and manipulating the base pairing, or secondary structure, of single-stranded RNA sequences is crucial to advancing knowledge about diseases caused by RNA viruses. The prediction of the correct secondary structures of large RNAs is one of the unsolved challenges of computational molecular biology. We are developing and extending a new parallel multicore and scalable RNA structure prediction program called GTfold. GTfold is one to two orders of magnitude faster than the de facto standard programs and achieves comparable accuracy of prediction. GTfold now optimally folds 11 picornaviral RNA sequences ranging from 7100 to 8200 nucleotides in 8 minutes, compared with the two months required by a previous study. With the paradigm shift to multicore chips and parallelism, we must extend and optimize GTfold to continue gaining performance with each new generation of systems.
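
To give a feel for why secondary structure prediction becomes expensive on long sequences, the sketch below implements the textbook Nussinov base-pair maximization recurrence in Python. This is a deliberately simplified stand-in and not GTfold's method, which is not described here; the function name and the `min_loop` parameter are assumptions made for this example. The triple loop over the sequence gives cubic running time, which is the kind of cost that motivates parallel implementations for sequences thousands of nucleotides long.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Hypothetical sketch: maximum number of nested base pairs in an
    RNA sequence, with at least `min_loop` unpaired bases in each hairpin."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: base j stays unpaired
            # case: base j pairs with some base k, leaving a loop
            # of at least min_loop bases between them
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


hairpin_pairs = nussinov_max_pairs("GGGAAAUCC")
```

Production folding programs score candidate structures with an energy model rather than a raw pair count, so the numbers from this sketch are illustrative only.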

PROJECT PETA-APPS: Petascale Simulation for Understanding Whole-Genome Evolution

(Funded by NSF PetaApps program)

The advent of high-throughput sequencing and the consequent reduction in cost of sequencing have produced an explosion in the amount of genomic data of all types. Making biological sense of this genomic data requires high-performance computing methods and an evolutionary perspective, whether you are trying to understand how new functional genes arise, why genes are organized into chromosomes, how species are connected through the Tree of Life, or why arrangements are subject to change. We have developed GRAPPA over many years to be the most accurate method for genome rearrangement analysis. GRAPPA is a massively parallel, state-of-the-art, freely available, open-source phylogeny reconstruction code that reconstructs evolutionary histories from thousands of organelle genomes. To tackle the growing scale of available data, GRAPPA is being extended with new petascale algorithms to scale to million-way parallelism and handle multi-chromosome nuclear genomes. Developing and deploying GRAPPA for petascale data is an exciting opportunity for algorithm development with real-world impact. (See also the CSE feature.)
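
As a small illustration of what genome rearrangement analysis measures, the sketch below computes a breakpoint count between two gene orders in Python. It is a hypothetical example and not GRAPPA's code: it assumes linear genomes with the same gene content, genes labeled by distinct positive integers, and a sign marking each gene's orientation. The function name is an assumption made for this example.

```python
def breakpoint_distance(genome_a, genome_b):
    """Hypothetical sketch: number of adjacencies in genome_a that do not
    occur in genome_b, for signed, linear gene orders."""
    if sorted(map(abs, genome_a)) != sorted(map(abs, genome_b)):
        raise ValueError("genomes must contain the same genes")
    # Frame both genomes with caps so the two ends count as adjacencies.
    cap = max(map(abs, genome_a), default=0) + 1
    a = [0] + list(genome_a) + [cap]
    b = [0] + list(genome_b) + [cap]
    adjacencies_b = set()
    for x, y in zip(b, b[1:]):
        adjacencies_b.add((x, y))
        adjacencies_b.add((-y, -x))  # same adjacency read on the other strand
    return sum((x, y) not in adjacencies_b for x, y in zip(a, a[1:]))


reference = [1, 2, 3, 4]
inverted = [1, -3, -2, 4]  # genes 2 and 3 reversed as a block
distance = breakpoint_distance(inverted, reference)
```

A single inversion of an interior block disturbs the two adjacencies at its boundaries, which is why comparing gene orders can reveal rearrangement events.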

PROJECT GALAXY: Dynamically Scaling Parallel Execution for Cloud-based Bioinformatics

(Funded by NIH)

Increasingly inexpensive high-throughput DNA sequencing holds great promise for biomedical research, but delivering upon this promise is challenging. Biomedical researchers are not experts in the complex computational platforms necessary to tackle the volumes of data. We address these problems by bringing together Galaxy, a system for making complex computational analysis accessible and reproducible, with “cloud computing”, an infrastructure model where computing resources are purchased on demand as needed, making it possible for investigators with no informatics expertise to perform data-intensive analysis using cloud resources. The Galaxy tool model and execution engine need to be extended to support the dynamically scaled parallel execution available in cloud resources. We are defining abstractions and reusable components to ease integrating existing and future tools. The landscape of analysis tools for NGS data is changing rapidly along with the cloud resources available, so these components must adapt quickly as new tools and best practices emerge.

PROJECT DOSA: Design Optimization Frameworks for High-Productivity Computing

(Funded by NSF)

High-performance computing (HPC) systems are taking a revolutionary step forward with complex architectural designs that require application programmers and compiler writers to perform the challenging task of optimizing the computation in order to achieve high performance. Recognizing the gap between processor and memory performance, several leading HPC vendors plan to incorporate into their next-generation systems innovative architectural features that alleviate this memory wall. These new architectural features include hardware accelerators (e.g., reconfigurable logic such as FPGAs, SIMD/vector processing units such as those in the IBM Cell, and graphics processing units (GPUs)), adaptable general-purpose processors, run-time performance advisors, capabilities for processing in the memory subsystem, and power optimizations. With these innovations, the multidimensional design space for optimizing applications is huge. Software must be sensitive to data layout, cache parameters, and data reuse, as well as dynamically changing resources, for highest performance. Our research goal is to design a dynamic application composition system that provides both a framework for optimizing computational science and engineering applications and their high-performance computing technologies and increased productivity.

PROJECT BURTON: Research Infrastructure for Multithreaded Computing Platforms

(Funded by NSF)

Computer scientists have long debated the merits of message-passing versus shared-memory architectures for parallel systems. Message passing with MPI on commodity (e.g., Linux) clusters dominates high-performance computing today and has a strong infrastructure to support development and research. The trend towards multicore processors changes the situation. The major processor developers all envision placing tens to hundreds of cores on a single die, each running multiple threads. To take advantage of this, the CS community must focus on how to develop efficient multithreaded programs in a globally addressable memory space. Multithreaded computing needs to quickly grow a support infrastructure comparable to that of MPI. As part of a community of diverse groups of researchers with extensive experience with shared-memory multithreading, we are developing the shared infrastructure needed for multicore, multithreaded research and development.

Future and ongoing interests

  • High-performance computing on manycore and multicore architectures
  • Rendering currently intractable problems feasible for researchers in bioinformatics, genomics, and other scientific areas through parallelism and advanced algorithms
  • Exploring trade-offs in performance, energy efficiency, and productivity in heterogeneous system architectures
  • Processing massive volumes of streaming data to provide low-latency analytic results

 

 
 

 
 

Last updated: November 29, 2010

 



