David A. Bader
IEEE Fellow
AAAS Fellow
Professor
College of Computing
Georgia Tech
Atlanta, GA 30332


 
 

 

FFTC: Fastest Fourier Transform for the IBM Cell Broadband Engine

The Fast Fourier Transform (FFT) is of primary importance and a fundamental kernel in many computationally intensive scientific applications. In this paper we investigate its performance on the Sony-Toshiba-IBM Cell Broadband Engine, a heterogeneous multicore chip architected for intensive gaming applications and high performance computing. The Cell processor consists of a traditional microprocessor (called the PPE) that controls eight SIMD co-processing units called synergistic processor elements (SPEs). We exploit the architectural features of the Cell processor to design an efficient parallel implementation of Fast Fourier Transform (FFT). While there have been several attempts to develop a fast implementation of FFT on the Cell, none have been able to achieve high performance for input series with several thousand complex points. We use an iterative out-of-place approach to design our parallel implementation of FFT with 1K to 16K complex input samples and attain a single precision performance of 18.6 GFLOP/s on the Cell. Our implementation beats FFTW on Cell by several GFLOP/s for these input sizes and outperforms Intel Duo Core (Woodcrest) for inputs of greater than 2K samples. To our knowledge we have the fastest FFT for this range of complex inputs.

Publication History

Versions of this paper appeared as:
  1. D.A. Bader and V. Agarwal, ``FFTC: Fastest Fourier Transform for the IBM Cell Broadband Engine (Abstract),'' Eleventh Annual High Performance Embedded Computing Workshop (HPEC), Lexington, MA, September 18-20, 2007.
  2. D.A. Bader and V. Agarwal, ``FFTC: Fastest Fourier Transform for the IBM Cell Broadband Engine,'' 14th IEEE International Conference on High Performance Computing (HiPC), Springer-Verlag LNCS 4873, 172-184, Goa, India, December 18-21, 2007.

Download this report in Adobe PDF


 
 

Last updated: October 25, 2007

 




Computational Biology



Parallel Computing



Combinatorics