
PaCE - Software for Clustering DNA Sequences

Overview

PaCE is a software tool for clustering large collections of Expressed Sequence Tags (ESTs) by sequence similarity. It is designed to run on parallel computers and is significantly faster than existing software, enabling fast and accurate clustering of the input sequences. Its memory requirement scales linearly with the size of the input, so it can cluster significantly larger data sets than current serial software can.

PaCE's parallelism makes clustering fast. For example, the software clustered 168,200 Arabidopsis thaliana ESTs in 15 minutes and 420,694 rat ESTs in 47 minutes. Although PaCE was originally developed to cluster ESTs, it is generic enough to cluster other types of DNA sequences, such as genomic sequences. PaCE has been successfully applied to maize genomic data (currently comprising more than 830,000 sequences) for genome assembly. PaCE supports a set of parameters that give the user fine control over the quality of the output clustering. Combined with its fast execution, this makes it practical to run PaCE several times with different parameter settings, giving scientists a better tool for analyzing sequence data.

Potential Applications and Users

Bioinformatics research
Biotechnology
Pharmaceutical applications

PaCE is currently used by various academic, non-profit, and corporate organizations.

Publications

Anantharaman Kalyanaraman, Srinivas Aluru, Volker Brendel, Suresh Kothari. Space and time efficient parallel algorithms and software for EST clustering. IEEE Transactions on Parallel and Distributed Systems, 14(12):1209-1221, 2003.
Anantharaman Kalyanaraman, Suresh Kothari, Volker Brendel, Srinivas Aluru. Efficient clustering of large EST data sets on parallel computers. Nucleic Acids Research, 31(11):2963-2964, 2003.

System Requirements

PaCE is implemented in C using MPI. It runs on parallel computers with a Unix-based operating system whose nodes are connected by a network. PaCE has been tested successfully with the MPICH implementation of MPI on a Linux cluster connected by Ethernet/Myrinet. Although we have not tested it on SMP nodes ourselves, some users currently run PaCE on them.

Download

The software is currently available for licensing. If you are a U.S. government agency or a non-profit educational institution, you may use this product royalty-free, provided you agree to the license terms.

All other interested parties should contact licensing@iastate.edu or the contact listed below.

Contact

Iowa State University Office of Intellectual Property & Technology Transfer, 310 Lab of Mechanics, Ames, IA 50011.

Contact:	Todd Headley
Tel:	(515) 294-4470
Fax:	(515) 294-0778
E-mail:	theadley@iastate.edu

Current information on other Iowa State University technologies is available online at http://www.techtransfer.iastate.edu/.
