Marat Dukhan

I am a Ph.D. student in Computational Science and Engineering at Georgia Tech's College of Computing. My research interests are high-performance computing, data analysis, and the interplay between them.


Conference and Journal Publications (peer-reviewed)

PyHPC 2015, November 15, 2015

Marat Dukhan "PeachPy meets Opcodes: Direct Machine Code Generation from Python"

International Journal of High-Performance Computing Applications, July 2, 2015

Edmond Chow, Xing Liu, Sanchit Misra, Marat Dukhan, Mikhail Smelyanskiy, Jeff R. Hammond, Yunfei Du, Xiang-Ke Liao, Pradeep Dubey "Scaling up Hartree–Fock calculations on Tianhe-2"

SPAA 2015, June 13–15, 2015

Oded Green, Marat Dukhan, Richard Vuduc "Branch-Avoiding Graph Algorithms"

HPSL 2015, February 7, 2015

Marat Dukhan, Robert Guthrie, Robertson Taylor, Richard Vuduc "Furious.js: a Model for Offloading Compute-Intensive JavaScript Applications"

IPDPS 2014, May 19–23, 2014

Jee Choi, Marat Dukhan, Xing Liu, Richard Vuduc "Algorithmic time, energy, and power on candidate HPC compute building blocks"

PyHPC 2013, November 18, 2013

Marat Dukhan "PeachPy: A Python Framework for Developing High-Performance Assembly Kernels"

PPAM 2013, September 10, 2013

Marat Dukhan and Richard Vuduc "Methods for high-throughput computation of elementary functions"

SC 2012, November 10–16, 2012

William B. March, Kenneth Czechowski, Marat Dukhan, Thomas Benson, Dongryeol Lee, Andrew J. Connolly, Richard Vuduc, Edmond Chow, and Alexander G. Gray "Optimizing the Computation of N-Point Correlations on Large-Scale Astronomical Data"

Poster Presentations (peer-reviewed)

Hot Chips 25, August 25–28, 2013

Marat Dukhan "What a Fast FPU Means for Algorithms: A Story of Vector Elementary Functions"

Other Presentations (not peer-reviewed)

ATIP 2015 workshop, November 16, 2015

Marat Dukhan, Nicolas Vasilache, Soumith Chintala, Richard Vuduc "FFT-based convolutional neural networks for wide-SIMD multi-core CPUs"

BLIS Retreat 2015 workshop, September 28–29, 2015

Marat Dukhan "Porting BLIS micro-kernels to PeachPy"

Atlanta Go Meetup, September 16, 2015

Marat Dukhan "Accelerating Data Processing in Go with SIMD Instructions"

BLIS Retreat 2014 workshop, September 25–26, 2014

Marat Dukhan "BLIS for the Web"

The 1st BLIS Retreat workshop, September 5–6, 2013

Marat Dukhan "Developing low-level assembly kernels with Peach-Py"

Teaching

Spring 2014

Fall 2012

CSE 6230 — High Performance Computing: Tools and Applications

Assisted Prof. Richard Vuduc and taught the class for three full weeks

Fall 2013

CSE 6230 — High Performance Computing: Tools and Applications

Assisted Prof. Richard Vuduc and taught the class for four full weeks

Software

Furious.js JavaScript Library

JavaScript library for hardware-accelerated scientific computing
  • JavaScript tensor library similar to NumPy/SciPy
  • Asynchronous computations without excessive callbacks
  • Transparent offload of computations to Portable Native Client, Web Workers, WebCL, or a cloud server (via WebSockets)

PeachPy: Portable Efficient Assembly Code-generation in High-level Python

Python framework for automating the development of high-performance assembly kernels
  • Supports automatic register allocation
  • Provides stack frame management, including re-aligning the stack frame as needed
  • Generates versions of a function for different calling conventions from the same source (e.g., for both the Microsoft x64 ABI and the System V x86-64 ABI)
  • Allows constants to be defined where they are used (just like in high-level languages)
  • Tracks the instruction set extensions used in each function
  • Can multiplex multiple instruction streams (helpful for software pipelining)

Yeppp! High-performance library

Provides a collection of low-level functions optimized for modern processors
  • Library functions have multiple implementations, optimized for different processor architectures
  • The optimal function is chosen at run-time depending on the processor
  • Detects processor microarchitecture and instruction set extensions
  • Provides portable access to CPU cycle counter and high-resolution system timer
  • Available for Windows, Linux, Mac OS X, and Android
  • Supports x86, x86-64, ARM, MIPS, and PowerPC architectures
  • C- and C++-compatible header files, plus bindings for FORTRAN, Java, and .NET/Mono
  • BSD license

Yeppp! CPUID for Android

Shows detailed information about the mobile CPU:
  • CPU architecture (ARM, x86, or MIPS)
  • CPU vendor (e.g. ARM, Qualcomm, Intel, MIPS)
  • CPU microarchitecture (e.g. ARM11, Cortex-A9, Atom, XBurst)
  • Minimum and maximum frequency
  • Number of logical cores
  • Supported instruction set extensions (e.g. NEON, VFPv4, SSSE3, MIPS3D)
  • Sizes of the level-1, level-2, and level-3 caches

Education

2011 – present Georgia Institute of Technology, College of Computing
Candidate for Ph.D. in Computational Science and Engineering
  • Research advisor: Richard Vuduc
2009 – 2011 New Economic School in Moscow
M.A. in Economics
  • Master's thesis: "Regime Switching Autoregression and Volatility Jumps"
2005 – 2009 Moscow Institute of Physics and Technology
B.Sc. in Applied Mathematics and Physics
  • Bachelor's thesis: "Entropy Coding Optimization for H.264 Video Codec"

Contact Information