Richard Vuduc
"Science is always wrong. It never solves a problem without
creating ten more." --George Bernard Shaw
My research in high-performance computing (HPC) systems addresses fundamental
questions of how to tune, analyze, and debug software automatically for
complex and rapidly evolving architectures. I am broadly interested in architecture,
compilers, statistical machine learning, and computational science.
Education
Jan '04
Ph.D., Computer Science, University of California, Berkeley.
May '97
B.S., with Honors, Computer Science, Cornell University.
Ph.D. Dissertation
title
Automatic performance tuning of sparse matrix kernels.
advisors
Profs. James Demmel and Katherine Yelick
summary
I developed an automated system to generate highly efficient, platform-adapted implementations
of sparse matrix operations ("kernels"), which are frequent bottlenecks in diverse applications in
scientific computing, economic modeling, and information retrieval. While these kernels typically
run at 10% or less of peak machine speed, my implementations, automatically tuned using empirical
modeling and search techniques, can achieve up to 31% of peak and run up to 4× faster. I have
implemented these ideas in Oski, an open-source library.
Research Experience
Nov '04–present
Postdoctoral Scholar, Lawrence Livermore National Lab, California.
Built customized, performance-enhancing transformations for U.S. Department of Energy
applications. Developed an empirical tuning compiler framework within Rose, an open-source
source-to-source compiler for C and C++. Implemented JitterBug, a tool to aid MPI application
debugging (best paper at PADTAD/ISSTA 2006).
Jan–Oct '04
Postdoctoral Scholar, University of California, Berkeley.
Completed the open-source implementation of Oski, including an experimental distributed
memory implementation based on the PETSc framework.
Jan '98–Dec '03
Graduate Student Researcher, University of California, Berkeley.
Conducted basic research on performance modeling and automatic tuning of sparse matrix kernels
(4 best paper/presentation awards).
Jun '94–Jun '97
Research Intern, Institute for Defense Analyses, Virginia.
Developed simulations for basic science applications in superfluorescence in gamma-ray lasers,
thermal and magnetic techniques for mine-detection, and frost formation.
Awards
2006
Best paper at Parallel and Distributed Testing and Debugging (PADTAD), at ISSTA
2004
Best paper at International Conference on Parallel Processing (ICPP)
2002
Finalist, Best student paper at Supercomputing (SC)
2002
Best student paper at Performance Optimization of High-Level Languages and Libraries
(POHLL) Workshop, at the International Conference on Supercomputing (ICS)
2002
Best student presentation at POHLL/ICS
College of Computing, Georgia Institute of Technology, Atlanta, Georgia, 30332, USA · Tel +1 404.385.3355 · richie at vuduc.org · vuduc.org
2000
Best presentation at Feedback-directed Dynamic Optimization workshop, at MICRO
1998
Outstanding Graduate Student Instructor Award, U.C. Berkeley
1997
Cornell Tradition Fellowship, Cornell University
Teaching Experience
Sep–Dec '97
Teaching Assistant, University of California, Berkeley, Computer Science.
Received a campus-wide Outstanding Graduate Student Instructor Award.
Aug '96–May '97
Teaching Assistant, Cornell University, Dept. of Physics.
Taught introductory engineering physics courses.
Aug–Dec '95
Teaching Assistant, Cornell University, Dept. of Mathematics.
Taught first-year calculus and linear algebra courses.
Professional Experience
Jun '98–Dec '99
Chief Technology Officer, Snailgram Greetings, Inc., Oakland, California.
Implemented scalable back-end infrastructure for customized commercial card printing.
Jun–Aug '93
Software Engineer, Office of Naval Research, Arlington, Virginia.
Developed network administration tools.
Jun '91–Jun '92
Software Engineer (Intern), Privac, Inc., Falls Church, Virginia.
Built systems software for a novel supercomputer prototype.
Publications
(Refereed except where noted.)
-- Autotuning libraries --
[1]
Sam Williams, Lenny Oliker, Richard Vuduc, John Shalf, Katherine Yelick, and James
Demmel. "Optimization of sparse matrix-vector multiply on emerging multicore platforms."
In Proc. Supercomputing, Reno, NV, USA, November 2007. (accepted).
[2]
Rajesh Nishtala, Richard Vuduc, James W. Demmel, and Katherine A. Yelick. "When
cache blocking sparse matrix vector multiply works and why." Applicable Algebra in
Engineering, Communication, and Computing: Special Issue on Computational Linear Algebra
and Sparse Matrix Computations, March 2007.
[3]
Richard Vuduc and Hyun-Jin Moon. "Fast sparse matrix vector multiplication by
exploiting variable block structure." In Proc. Int'l Conf. on High-Performance Computing and
Communications, LNCS 3726, pages 807–816, Sorrento, Italy, September 2005.
[4]
Richard Vuduc, James W. Demmel, and Katherine A. Yelick. "OSKI: A library of
automatically tuned sparse matrix kernels." In Proc. SciDAC 2005, Journal of Physics: Conference
Series, San Francisco, CA, USA, June 2005. Institute of Physics Publishing.
[5]
James Demmel, Jack Dongarra, Victor Eijkhout, Erika Fuentes, Antoine Petitet, Richard
Vuduc, R. Clint Whaley, and Katherine Yelick. "Self-adapting linear algebra algorithms
and software." In Proc. IEEE: Special Issue on Program Generation, Optimization, and
Adaptation, February 2005.
[6]
Eun-Jin Im, Katherine Yelick, and Richard Vuduc. "SPARSITY: An optimization
framework for sparse matrix kernels." Int'l J. of High Performance Computing Applications,
18(1):135–158, 2004.
[7]
Richard Vuduc, Attila Gyulassy, James W. Demmel, and Katherine A. Yelick. "Memory
hierarchy optimizations and bounds for sparse A^T A x." In Proc. ICCS Workshop on Parallel
Linear Algebra, volume LNCS, Melbourne, Australia, June 2003. Springer.
[8]
Best presentation; Best student paper
Richard Vuduc, Shoaib Kamil, Jen Hsu, Rajesh Nishtala, James W. Demmel, and
Katherine A. Yelick. "Automatic performance tuning and analysis of sparse triangular solve."
In ICS Workshop on Performance Optimization via High-Level Languages and Libraries
(POHLL), New York, USA, June 2002.
[9]
Richard Vuduc and James Demmel. "Code generators for automatic tuning of numerical
kernels: Experiences with FFTW." In Proc. Workshop on Semantics, Application, and
Implementation of Code Generators (SAIG), volume 1924 of LNCS, Montreal, Canada,
September 2000. Springer-Verlag.
-- Compiler-based autotuning --
[10]
Qing Yi, Keith Seymour, Haihang You, Richard Vuduc, and Dan Quinlan. "POET:
Parameterized Optimizations for Empirical Tuning." In IPDPS Workshop on Performance
Optimization of High-Level Languages and Libraries (POHLL), Long Beach, CA, USA,
March 2007.
[11]
Dan Quinlan, Markus Schordan, Richard Vuduc, and Qing Yi. "Annotating user-defined
abstractions for optimization." In IPDPS Workshop on Performance Optimization of
High-Level Languages and Libraries (POHLL), April 2006.
[12]
Yuan Zhao, Qing Yi, Ken Kennedy, Dan Quinlan, and Richard Vuduc. "Parameterizing
loop fusion for automated empirical tuning." Technical Report UCRL-TR-217808,
Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, December
2005. (submitted).
-- Analytical and statistical performance modeling --
[13]
Best paper
Benjamin C. Lee, Richard Vuduc, James Demmel, and Katherine Yelick. "Performance
models for evaluation and automatic tuning of symmetric sparse matrix-vector multiply."
In Proc. Int'l Conf. on Parallel Processing, Montreal, Canada, August 2004.
[14]
Richard Vuduc, James Demmel, and Jeff Bilmes. "Statistical models for empirical
search-based performance tuning." Int'l J. of High Performance Computing Applications,
18(1):65–94, 2004.
[15]
Finalist, Best student paper
Richard Vuduc, James W. Demmel, Katherine A. Yelick, Shoaib Kamil, Rajesh Nishtala,
and Benjamin Lee. "Performance optimizations and bounds for sparse matrix-vector
multiply." In Proc. Supercomputing, Baltimore, MD, USA, November 2002.
[16]
Richard Vuduc, James W. Demmel, and Jeff A. Bilmes. "Statistical models for automatic
performance tuning." In Proc. Int'l Conf. on Computational Science, volume 2073 of LNCS,
pages 117–126, San Francisco, CA, May 2001. Springer.
[17]
Best presentation
Richard Vuduc, James Demmel, and Jeff Bilmes. "Statistical modeling of feedback data in
an automatic tuning system." In MICRO-33: Third ACM Workshop on Feedback-Directed
Dynamic Optimization, Monterey, CA, December 2000.
-- Testing and debugging --
[18]
Dan Quinlan, Richard Vuduc, and Ghassan Misherghi. "Techniques for specifying bug
patterns." In Proc. Parallel and Distributed Testing and Debugging (PADTAD), London,
England, July 2007. (to appear).
[19]
Thomas Panas, Tom Epperly, Dan Quinlan, Andreas Sæbjørnsen, and Richard Vuduc.
"Communicating software architecture using a unified single-view visualization." In
Proc. 12th Int'l Conf. on Engineering of Complex Computer Systems (ICECCS),
Auckland, New Zealand, July 2007. (to appear).
[20]
Thomas Panas, Dan Quinlan, and Richard Vuduc. "Analyzing and visualizing whole
program architectures." In Proc. Int'l Conf. on Software Engineering, 3rd Workshop on
Aerospace Software Engineering (AeroSE), Minneapolis, MN, USA, May 2007.
[21]
Thomas Panas, Dan Quinlan, and Richard Vuduc. "Tool support for inspecting the
code quality of HPC applications." In Proc. Int'l Conf. on Software Engineering, 3rd
Workshop on Software Engineering for High-Performance Computing Applications
(SE-HPC), Minneapolis, MN, USA, May 2007.
[22]
Best paper
Richard Vuduc, Martin Schulz, Dan Quinlan, and Bronis de Supinski. "Improving
distributed memory applications testing by message perturbation." In Proc. Int'l Symp. on
Software Testing and Analysis (ISSTA), 4th Workshop on Parallel and Distributed Systems:
Testing and Debugging (PADTAD-IV), Portland, ME, USA, July 2006.
[23]
Dan Quinlan, Richard Vuduc, Thomas Panas, Jochen Härdtlein, and Andreas Sæbjørnsen.
"Support for whole-program analysis and verification of the One-Definition Rule in C++."
In Proc. Static Analysis Summit, Gaithersburg, MD, USA, June 2006. National Institute
of Standards and Technology Special Publication.
[24]
Dan Quinlan, Shmuel Ur, and Richard Vuduc. "An extensible open-source compiler
infrastructure for testing." In Proc. IBM Haifa Verification Conf., volume LNCS 3875,
pages 116–133, Haifa, Israel, November 2005.
-- Additional papers --
[26]
Rajesh Nishtala, Richard Vuduc, James W. Demmel, and Katherine A. Yelick.
"Performance modeling and analysis of cache blocking in sparse matrix-vector multiply." Technical
Report UCB/CSD-04-1335, U.C. Berkeley, June 2004. (unrefereed).
[27]
Benjamin C. Lee, Richard Vuduc, James W. Demmel, Katherine A. Yelick, Michael
de Lorimier, and Lue Zhong. "Performance optimizations and bounds for sparse symmetric
matrix-multiple vector multiply." Technical Report UCB/CSD-03-1297, U.C. Berkeley,
November 2003. (unrefereed).
[28]
Richard Vuduc, Attila Gyulassy, James W. Demmel, and Katherine A. Yelick. "Memory
hierarchy optimizations and performance bounds for sparse A^T A x." Technical Report
UCB/CSD-03-1232, U.C. Berkeley, February 2003. (unrefereed).
[29]
Danyel Fisher, Kris Hildrum, Jason Hong, Mark Newman, Megan Thomas, and Richard
Vuduc. "SWAMI: A framework for collaborative filtering algorithm development and
evaluation." In Proc. SIGIR, Athens, Greece, July 2000.
[30]
E. Jason Riedy and Richard Vuduc. "Microbenchmarking the Tera MTA." Technical
report, University of California, Berkeley, May 1999. (unpublished manuscript).
[31]
Bohdan Balko, Irvin Kay, Richard Vuduc, and John Neuberger. "Recovery of
superfluorescence in inhomogeneously broadened systems through rapid relaxation." Physical Review
B, 55(12079), 1997.
[32]
Bohdan Balko, Irvin Kay, J.D. Silk, and Richard Vuduc. "Superfluorescence (sf) in the
presence of inhomogeneous broadening and relaxation." Hyperfine Interactions: Special
Issue on the Gamma-Ray Laser, June 1997.
[33]
Bohdan Balko, Irvin Kay, Richard Vuduc, and John Neuberger. "An investigation of the
possible enhancement of nuclear superfluorescence." In Proc. Lasers '95, page 308, 1996.
Invited Talks, Lectures, and Tutorials
2006
Keynote, International Workshop on Automatic Performance Tuning (iWAPT), Tōkyō, Japan
2006
Kyōto University, Kyōto, Japan
2006
High-Performance Computing Seminar, Pomona College
2006
Bay Area Scientific Computing Day, Livermore, California
2005
University of Rome, "Tor Vergata," Italy
2005
Tutorial on "The Rose C/C++ source-to-source translator" at the Conference on Parallel
Architectures and Compilation Techniques (PACT)
Professional Activities
-- Societies --
Member
Society for Industrial and Applied Mathematics (SIAM)
Member
Association for Computing Machinery (ACM)
-- Refereeing --
2000
Reviewer, Principles of Programming Languages (POPL)
2001
Reviewer, International Conference on Computational Science (ICCS)
2001
Reviewer, Journal of Functional Programming (JFP)
2002
Reviewer, International Journal of High Performance Computing Applications (HPCA)
2003
Reviewer, Programming Language Design and Implementation (PLDI)
2003–4
Reviewer, Symposium on Parallel Algorithms and Architectures (SPAA 2003, 2004)
2004
Reviewer, Proceedings of the IEEE
2004
Reviewer, Parallel Processing Letters
2005
Reviewer, Combinatorial Scientific Computing
2005
Poster Committee, Supercomputing (SC)
2006
Program Committee, Performance Optimization of High-Level Languages and Libraries
(POHLL)
2006
Reviewer, Euro-Par (2006)
2006
Co-organizer, Mini-symposium on Adaptive Tools and Frameworks for High Performance
Numerical Computations, SIAM Parallel Processing
2006
Reviewer, Network and Parallel Computing (NPC)
2007
Program Committee, Workshop on Statistical and Machine learning applied to ARchitecture
and Compilation (SMART)
2007
Program Committee, POHLL
2007
Program Committee, Int'l. Workshop on Automatic Performance Tuning (iWAPT)
2007
Reviewer, International Conference on Supercomputing (ICS)
2007
Reviewer, SC
2007
Reviewer, Int'l Conference on High-Performance Computing and Communications (HPCC)
References
James W. Demmel
demmel@cs.berkeley.edu
Professor, Computer Science; Professor, Mathematics
737 Soda Hall
Computer Science Division
Dept. of Electrical Engineering and Computer Science
University of California, Berkeley
Berkeley, California 94720-1776
Tel: +1 510.643.5386
Katherine A. Yelick
yelick@cs.berkeley.edu
Professor, Computer Science
777 Soda Hall
Computer Science Division
Dept. of Electrical Engineering and Computer Science
University of California, Berkeley
Berkeley, California 94720-1776
Tel: +1 510.642.8900
Jeff A. Bilmes
bilmes@ee.washington.edu
Professor, Department of Electrical Engineering; Adjunct Professor, Department of Linguistics
University of Washington, Seattle
418 EE/CS Bldg, Box 352500
Seattle, Washington 98195-2500
Tel: +1 206.221.5236
Daniel Quinlan
dquinlan@llnl.gov
Research Scientist, Center for Applied Scientific Computing
Lawrence Livermore National Laboratory
P.O. Box 808, L-550
Livermore, California 94551
Tel: +1 925.423.2668