David A. Bader
IEEE Fellow
AAAS Fellow
College of Computing
Georgia Tech
Atlanta, GA 30332



Lab Focus

A fundamental problem in high performance computing is to design high-level, architecture independent, algorithms that execute efficiently on general purpose parallel machines. The purpose of this project is to advance our understanding of the main factors required for designing practical parallel algorithms and to develop techniques and data sets for experimentally validating the results. As a byproduct, we are developing portable parallel programs and data sets for a number of specific important problems arising in combinatorial computing, computational biology and genomics, and image processing.

Computational Biology and Genomics
Clusters of SMPs
Combinatorial Computing
Image Processing & Remote Sensing


Computational Biology and Genomics

A phylogeny is an evolutionary tree reconstructed from its leaves (each of which represents a different species) by using DNA sequence or gene data and a plausible model of evolution. Phylogenies are crucial tools in understanding biological evolution and biological mechanisms and their reconstruction has been one of the main algorithmic problems in computational biology. Reconstruction of the evolutionary history of a group of organisms has changed the face of biology and is being used increasingly in drug discovery, epidemiology, and genetic engineering. Unfortunately, such reconstructions typically involve solving difficult optimization problems, so that even moderately large datasets can require months to years of computation. In addition, almost all evolutionary reconstructions presently assume that the historical pattern is one of strict divergence that can be represented by a binary tree. This assumption is frequently violated, especially by plants which often hybridize readily and thus produce networks of relationships.

Recent Publications

PASQUAL: Parallel Techniques for Next Generation Genome Sequence Assembly
Rec-DCM-Eigen: Reconstructing a Less Parsimonious but More Accurate Tree in Shorter Time
A Tile-based Parallel Viterbi Algorithm for Biological Sequence Alignment on GPU with CUDA
On Accelerating Iterative Algorithms with CUDA: A Case Study on Conditional Random Fields Training Algorithm for Biological Sequence Alignment
Simulating Individual-Based Models of Epidemics in Hierarchical Networks
GTfold: A Scalable Multicore Code for RNA Secondary Structure Prediction
On the Design of High-Performance Algorithms for Aligning Multiple Protein Sequences on Mesh-Based Multiprocessor Architectures
A Graph-Theoretic Analysis of the Human Protein-Interaction Network Using Multi-core Parallel Algorithms
ExactMP: An Efficient Parallel Exact Solver for Phylogenetic Tree Reconstruction Using Maximum Parsimony
BioPerf: A Benchmark Suite to Evaluate High-Performance Computer Architecture on Bioinformatics Applications
Incorporating life sciences applications in the architectural optimizations of next-generation petaflop-system
BioSPLASH: A sample workload for bioinformatics and computational biology for optimizing next-generation high-performance computer systems
High-Performance Algorithm Engineering for Large-Scale Graph Problems and Computational Biology
Computational Biology and High-Performance Computing
Fast Character Optimization in Parsimony Phylogeny Reconstruction
Industrial Applications of High-Performance Computing for Phylogeny Reconstruction
A Linear-Time Algorithm for Computing Inversion Distance Between Signed Permutations with an Experimental Study
High-Performance Algorithm Engineering for Computational Phylogeny
A New Implementation and Detailed Study of Breakpoint Analysis


Cluster of SMP's

The future of high-performance computing relies on the efficient use of clusters with symmetric multiprocessor nodes and scalable interconnection networks. Hardware benchmark results reveal awesome performance rates for each component; however, few applications on SMP clusters ever reach a fraction of these peak speeds. Our focus is to develop a hybrid, hierarchical methodology that is a much closer abstraction of the underlying machine and is ideally suited to SMP clusters. The current deployment of teraflops and the future development of petaflops systems will certainly require the exploitation of similar advanced programming models.

Recent Publications

Efficient Data Migration to Conserve Energy in Streaming Media Storage Systems
A Waterfall Model to Achieve Energy Efficient Task Mapping for Large Scale GPU Cluster
A Framework for Measuring Supercomputer Productivity
A Novel FDTD Application Featuring OpenMP-MPI Hybrid Parallelization
Generalized Block Shift Network for Clusters
Cluster Computing: Applications
High-Performance Algorithms and Applications for SMP Clusters
Broadcast on Clusters of SMPs with Optimal Concurrency
Design and Analysis of the Alliance / University of New Mexico Roadrunner Linux SMP SuperCluster
SIMPLE: A Methodology for Programming High Performance Algorithms on Clusters of Symmetric Multiprocessors (SMPs)


Combinatorial & Scientific Computing

We are addressing a number of basic combinatorial computations that are commonly needed in various large scale applications. One such computation is sorting, a problem that has been studied extensively in the literature because of its many important applications and because of its intrinsic theoretical significance. Our research has produced the fastest known high-level, practical algorithms for many combinatorial problems such as sorting, personalized communication, selection, list ranking, and data redistribution, on general purpose parallel machines.

Recent Publications

Detecting Insider Threats in a Real Corporate Database of Computer Usage Activity
Energy-Efficient Scheduling for Best-Effort Interactive Services to Achieve High Response Quality
Investigating Graph Algorithms in the BSP Model on the Cray XMT
Multithreaded Community Monitoring for Massive Streaming Graph Data
Task-based Parallel Breadth-First Search in Heterogeneous Environments
STINGER: High Performance Data Structure for Streaming Graphs
GraphCT: Multithreaded Algorithms for Massive Graph Analysis
Enhancing Cache Coherent Architectures with Access Patterns for Embedded Manycore Systems
A Fast Algorithm for Streaming Betweenness Centrality
GPU Merge Path -- A GPU Merging Algorithm
Scalable Multi-threaded Community Detection in Social Networks
Analysis of Streaming Social Networks and Graphs on Multicore Architectures
Tracking Structure of Streaming Social Networks
Parallel Community Detection for Massive Graphs
GPUMemSort: A High Performance Graphic Co-processors Sorting Algorithm for Large Scale In-Memory Data
Massive Social Network Analysis: Mining Twitter for Social Good
Scalable Graph Exploration on Multicore Processors
Massive Streaming Data Analytics: A Case Study with Clustering Coefficients
Large Scale Complex Network Analysis using the Hybrid Combination of a MapReduce cluster and a Highly Multithreaded System
Evaluating Cell/B.E Software Cache for ClustalW
Generalizing k-Betweenness Centrality Using Short Paths and a Parallel Multithreaded Implementation
A Partition-Merge based Cache-Conscious Parallel Sorting Algorithm for CMP with Shared Cache
Faster FAST: Multicore Acceleration of Streaming Financial Data
Computing Discrete Transforms on the Cell Broadband Engine
Compact Graph Representations and Parallel Connectivity Algorithms for Massive Dynamic Network Analysis
Understanding the Design Trade-offs among Current Multicore Systems for Numerical Computations
A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets
An Efficient Transactional Memory Algorithm for Computing Minimum Spanning Forest of Sparse Graphs
A Prediction Based CMP Cache Migration Policy
On the Design of Fast Pseudo-Random Number Generators for the Cell Broadband Engine and an Application to Risk Analysis
Optimizing JPEG2000 Still Image Encoding on the Cell Broadband Engine
Financial Modeling on the Cell Broadband Engine
High Performance MPEG-2 Software Decoder on the Cell Broadband Engine
SNAP, Small-world Network Analysis and Partitioning: an open-source parallel graph framework for the exploration of large-scale networks
Techniques for Designing Efficient Parallel Graph Algorithms for SMPs and Multicore Processors
High Performance Combinatorial Algorithm Design on the Cell Broadband Engine Processor
Approximating Betweenness Centrality
FFTC: Fastest Fourier Transform for the IBM Cell Broadband Engine
DOSA: Design Optimizer for Scientific Applications
Advanced Shortest Paths Algorithms on a Massively-Multithreaded Architecture
SWARM: A Parallel Programming Framework for Multi-Core Processors
On the Design and Analysis of Irregular Algorithms on the Cell Processor: A case study on list ranking
Dynamic Load Balancing in Distributed Systems in the Presence of Delays: A Regeneration-Theory Approach
Parallel Shortest Path Algorithms for Solving Large-Scale Instances
Parallel Algorithms for Evaluating Centrality Indices in Real-world Networks
Designing Multithreaded Algorithms for Breadth-First Search and st-connectivity on the Cray MTA-2
Performance Analysis of Parallel Programs Via Message-Passing Graph Traversal
Design and Implementation of the HPCS Graph Analysis Benchmark on Symmetric Multiprocessors
An Empirical Analysis of Parallel Random Permutation Algorithms on SMPs
A Cache-Aware Parallel Implementation of the Push-Relabel Network Flow Algorithm and Experimental Evaluation of the Gap Relabeling Heuristic
An Experimental Study of Parallel Biconnected Components Algorithms on Symmetric Multiprocessors (SMPs)
On the Architectural Requirements for Efficient Execution of Graph Algorithms
Parallel Algorithm Design for Branch and Bound
Designing Irregular Parallel Algorithms With Mutual Exclusion and Lock-free Protocols
A Parallel State Assignment Algorithm for Finite State Machines
The Euler Tour Technique and Parallel Rooted Spanning Tree
Fast Shared-Memory Algorithms for Computing the Minimum Spanning Forest of Sparse Graphs
A Fast, Parallel Spanning Tree Algorithm for Symmetric Multiprocessors (SMPs)
A New Parallel Algorithm for Planarity Testing
High-Performance Algorithm Engineering for Parallel Computation
Evaluating Arithmetic Expressions using Tree Contraction: A Fast and Scalable Parallel Implementation for Symmetric Multiprocessors (SMPs)
Using PRAM Algorithms on a Uniform-Memory-Access Shared-Memory Architecture
An Improved Randomized Selection Algorithm With an Experimental Study
A Practical Parallel Algorithm for Cycle Detection in Partitioned Digraphs
A Randomized Parallel Sorting Algorithm With an Experimental Study
A New Deterministic Parallel Sorting Algorithm With an Experimental Evaluation
Parallel Algorithms for Personalized Communication and Sorting With an Experimental Study (Extended Abstract)
Practical Parallel Algorithms for Personalized Communication and Integer Sorting
Practical Parallel Algorithms for Dynamic Data Redistribution, Median Finding, and Selection


Image Processing & Remote Sensing

Image processing applications are well-suited to high performance computing techniques for several reasons: uniform grid layouts, spatial locality, and highly regular kernels. Images used for analysis are produced from a variety of applications, for example, remote sensing, image understanding, face recognition, detection of surface defects in industrial manufacturing, military target recognition, etc. Some of the processing is low-level, such as image calibration or enhancement, while other analysis is intermediate- or high-level, such as segmenting an image into objects or regions and classifying each. We have developed a high performance suite of practical algorithms for image processing that seem to outperform all other known implementations on the same platforms.

Recent Publications

Kronos: A Java-Based Software System for the Processing and Retrieval of Large Scale AVHRR Data Sets
A Hierarchical Data Archiving and Processing System to Generate Custom Tailored Products from AVHRR Data
High Performance Computing Algorithms for Land Cover Dynamics Using Remote Sensing Data
Parallel Algorithms for Image Enhancement and Segmentation by Region Growing with an Experimental Study
Parallel Algorithms for Image Histogramming and Connected Components with an Experimental Study
Scalable Parallel Algorithms for Texture Synthesis and Compression



Last updated: June 3, 2013


Computational Biology

Parallel Computing