General Information
Details
My current research interest is in systems support and resource management for distributed machine learning frameworks and applications. Specifically, I am working on distributed systems and scheduling algorithms for soft real-time machine learning inference and for co-scheduling ML inference with online training. This builds on the body of research and development at Carnegie Mellon on modeling, designing, and developing abstractions, primitives, algorithms, and systems for a general resource management framework with support for static and dynamic heterogeneity, hard and soft placement constraints, time-varying resource capacity guarantees, and combinatorial constraints in heterogeneous resource contexts. Cost- and latency-efficient resource management is fundamental to commoditizing machine learning.
I focus my teaching on Operating and Distributed Systems at the undergraduate level, and Systems for Machine Learning (SysML) at the graduate level. In both classes, the emphasis is on developing fundamental design principles, clearly defining success metrics, and understanding tradeoff spaces.
[EuroSys26] Maya: Optimizing Deep Learning Training Workloads using GPU Runtime Emulation 
Srihas Yarlagadda*, Amey Agrawal*, Elton Pinto, Hakesh Darapaneni, Mitali Meratwal, Shivam Mittal, Pranavi Bajjuri, Srinivas Sridharan, Alexey Tumanov
To appear in Proc. of the 21st ACM European Conference on Computer Systems (EuroSys'26), April 2026.
[CCS25] VillainNet: Targeted Poisoning Attacks Against SuperNets Along the Accuracy-Latency Pareto Frontier 
D. Oygenblik, A. Vemulapalli, A. Agrawal, D. Sanyal, Alexey Tumanov, Brendan Saltaformaggio
In Proc. of the 32nd ACM Conference on Computer and Communications Security (ACM CCS 2025).
[OpSysRev25] EMPIRIC: Exploring Missing Pieces in KV Cache Compression for Reducing Computation, Storage, and Latency in Long-Context LLM Inference 
Payman Behnam, Yaosheng Fu, Ritchie Zhao, Po-An Tsai, Zhiding Yu, Alexey Tumanov
ACM SIGOPS Operating Systems Review, Volume 59, Issue 2, July 2025.
[OpSysRev25] Efficient LLM Inference via Chunked Prefills 
Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, Ramachandran Ramjee
ACM SIGOPS Operating Systems Review, Volume 59, Issue 2, July 2025.
[OpSysRev25] Toward Weight Sharing Paradigm for Efficient AI: Training and Inference Serving 
Payman Behnam, Alind Khare, Dhruv Garg, Alexey Tumanov
ACM SIGOPS Operating Systems Review, Volume 59, Issue 2, July 2025.
[ICML25] RocketKV: Accelerating Long-Context LLM Inference via Two-stage KV Cache Compression 
Payman Behnam*, Yaosheng Fu*, Ritchie Zhao, Po-An Tsai, Zhiding Yu, Alexey Tumanov
In Proc. of 42nd International Conference on Machine Learning (ICML'25), Vancouver, 2025.
[NSDI25] SuperServe: Fine-grained Inference Serving for Unpredictable Workloads 
Alind Khare, Dhruv Garg, Sukrit Kalra, Snigdha Grandhi, Ion Stoica, Alexey Tumanov
In Proc. of the 22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI'25), Philadelphia, USA, 2025.
[TMLR25] ∇QDARTS: Quantization as an Elastic Dimension to Differentiable Neural Architecture Search 
Payman Behnam*, Uday Kamal*, Sanjana Vijay Ganesh, Zhaoyi Li, Michael Andrew Jurado, Alind Khare, Igor Fedorov, Gaowen Liu, Alexey Tumanov
Transactions on Machine Learning Research (TMLR), 2025.
[EuroMLSys25] Client availability in Federated Learning: It matters! 
Dhruv Garg, Debopam Sanyal, Myungjin Lee, Alexey Tumanov, Ada Gavrilovska
In Proc. of the 5th Workshop on Machine Learning and Systems (EuroMLSys'25), co-located with EuroSys'25, Rotterdam, Netherlands, March 31, 2025.
[SoCC24] Inshrinkerator: Compressing Deep Learning Training Checkpoints via Dynamic Quantization
Amey Agrawal, Sameer Reddy, Satwik Bhattamishra, Venkata Prabhakara Sarath Nookala, Vidushi Vashishth, Kexin Rong, Alexey Tumanov
Accepted to the 15th ACM Symposium on Cloud Computing (SoCC'24), Nov 2024.
[TMLR24] PLUM: Improving Inference Efficiency By Leveraging Repetition-Sparsity Trade-Off
Sachit Kuhar, Yash Jain, Alexey Tumanov
Transactions on Machine Learning Research (TMLR), 2024.
[ECCV24] DεpS: Delayed ε-Shrinking for Faster Once-For-All Training
Aditya Annavajjala*, Alind Khare*, Animesh Agrawal, Igor Fedorov, Hugo Latapie, Myungjin Lee, Alexey Tumanov
In Proc. of 18th European Conference on Computer Vision (ECCV 2024), Milano, Italy, Oct 2024.
[ECCV24] SuperFedNAS: Cost-Efficient Federated Neural Architecture Search for On-Device Inference
Alind Khare, Animesh Agrawal, Aditya Annavajjala, Payman Behnam, Hugo Latapie, Myungjin Lee, Alexey Tumanov
In Proc. of 18th European Conference on Computer Vision (ECCV 2024), Milano, Italy, Oct 2024.
[OSDI24] Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav Gulavani, Alexey Tumanov, Ramachandran Ramjee
In Proc. of the 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI'24), Santa Clara, USA, July 2024.
[MLSYS24] Vidur: A Large-scale Simulation Framework for LLM Inference
Amey Agrawal, Nitin Kedia, Jayashree Mohan, Ashish Panwar, Nipun Kwatra, Bhargav S. Gulavani, Ramachandran Ramjee, Alexey Tumanov
In Proc. of 7th Annual Conference on Machine Learning Systems (MLSys'24), Santa Clara, USA, May 2024.
[IPDPS24] Harmonica: Hybrid Accelerator to Overcome Imperfections of Mixed-signal DNN Accelerators
P. Behnam, U. Kamal, A. Shafiee, Alexey Tumanov, Saibal Mukhopadhyay
In Proc. of the 38th IEEE International Parallel and Distributed Processing Symposium (IPDPS), USA, 2024.
[ML4H23] TransEHR: Self-Supervised Transformer for Clinical Time Series Data
Yanbo Xu, Shangqing Xu, Manav Ramprasad, Alexey Tumanov, Chao Zhang
In Proc. of Machine Learning for Health (ML4H'23), Dec 10, 2023.
[IEEEMicro23] Hardware-Software Co-design for Real-time Latency-accuracy Navigation in TinyML Applications
Payman Behnam*, Jianming Tong*, Alind Khare, Yangyu Chen, Pranav Gadikar, Abhimanyu Bambhaniya, Tushar Krishna, Alexey Tumanov
IEEE Micro Special Issue on TinyML, 2023.
[arXiv23] Pareto-Secure Machine Learning (PSML): Fingerprinting and Securing Inference Serving Systems
Debopam Sanyal, Jui-Tse Hung, Manav Agrawal, Prahlad Jasti, Shahab Nikkhoo, Somesh Jha, Tianhao Wang, Sibin Mohan, Alexey Tumanov
arXiv, Aug 2023.
[arXiv23] DynaQuant: Compressing Deep Learning Training Checkpoints via Dynamic Quantization
Amey Agrawal, Sameer Reddy, Satwik Bhattamishra, Venkata Prabhakara Sarath Nookala, Vidushi Vashishth, Kexin Rong, Alexey Tumanov
arXiv, Jun 2023.
[MLSYS23] SubGraph Stationary Hardware-Software Inference Co-design
Payman Behnam*, Jianming Tong*, Alind Khare, Yangyu Chen, Yue Pan, Pranav Gadikar, Abhimanyu Bambhaniya, Tushar Krishna, Alexey Tumanov
In Proc. of 6th Conference on Machine Learning and Systems (MLSys'23), Jun 2023.
[ODIW@MLSYS23] Signed-Binary Networks: Improving Efficiency of Binary Networks by Exploiting Sparsity
Sachit Kuhar, Alexey Tumanov, Judy Hoffman
In Proc. of 3rd On-Device Intelligence Workshop, Machine Learning and Systems (MLSys'23), Jun 2023.
[NeurIPS22] UnfoldML: A Cost-Aware 2-D Dynamic Prediction Pipeline for Multi-Stage Classification
Yanbo Xu, Alind Khare, Glenn Matlin, Monish Ramadoss, Rishi Kamaleswaran, Chao Zhang, Alexey Tumanov
In Proc. of the 36th Conference on Neural Information Processing Systems (NeurIPS), Nov 2022.
[SOCC22] ESCHER: Expressive Scheduling with Ephemeral Resources
Romil Bhardwaj, Alexey Tumanov, Stephanie Wang, Richard Liaw, Philipp Moritz, Robert Nishihara, Ion Stoica
In Proc. of 13th ACM Symposium on Cloud Computing (SoCC), Nov 2022.
[ICCD22] CoDG-ReRAM: An Algorithm-Hardware Co-design to Accelerate Semi-Structured GNNs on ReRAM
Yixuan Luo*, Payman Behnam*, Kiran Thorat, Zhuo Liu, Hongwu Peng, Shaoyi Huang, Shu Zhou, Omer Khan, Alexey Tumanov, Caiwen Ding, Tong Geng
In Proc. of the 40th IEEE International Conference on Computer Design (ICCD), Oct 23-26, 2022.
[ACSMD@ISCA22] Enabling Real-time DNN Switching via Weight-Sharing
Jianming Tong, Yangyu Chen, Yue Pan, Abhimanyu Bambhaniya, Alind Khare, Taekyung Heo, Alexey Tumanov, Tushar Krishna
In Proc. of 2nd Architecture, Compiler, and System Support for Multi-model DNN Workloads Workshop at ISCA'22 (ACSMD@ISCA'22), June 2022.
[EuroPar22] Automatic Parallelization of Python Programs for Distributed Heterogeneous Computing
Jun Shirako, Akihiro Hayashi, Sri Raj Paul, Alexey Tumanov, Vivek Sarkar
In Proc. of 28th International European Conference on Parallel and Distributed Computing (EuroPar'22), Aug 2022.
[ICLR21] CompOFA – Compound Once-For-All Networks for Faster Multi-Platform Deployment
Manas Sahni, Shreya Varshini, Alind Khare, Alexey Tumanov
In Proc. of International Conference on Learning Representations (ICLR'21), May 4 2021.
[EuroSys21] Rubberband: Cloud-based Hyperparameter Tuning
R. Liaw, U. Misra, L. Dunlap, J. Gonzalez, I. Stoica, Alexey Tumanov, K. Kandasamy, R. Bhardwaj
In Proc. of EuroSys'21, Apr 26-29, 2021.
[SOCC20] InferLine: Latency-Aware Provisioning and Scaling for Prediction Serving Pipelines
Daniel Crankshaw, Gur-Eyal Sela, Xiangxi Mo, Corey Zumar, Ion Stoica, Joseph E. Gonzalez, Alexey Tumanov
In Proc. of Symposium on Cloud Computing (SoCC'20), Nov 2020.
[PVLDB20] Cloudburst: Stateful Functions-as-a-Service
Vikram Sreekanti, Chenggang Wu, Charles Lin, Johann Schleier-Smith, Joseph Gonzalez, Joseph Hellerstein, Alexey Tumanov
In Proc. of PVLDB, 13(11):2438-2452, July 2020.
[KDD20] HOLMES: Health OnLine Model Ensemble Serving for Deep Learning Models in Intensive Care Units
Shenda Hong, Yanbo Xu, Alind Khare, Satria Priambada, Kevin Maher, Alaa Aljiffry, Jimeng Sun, Alexey Tumanov
In Proc. of Knowledge Discovery and Data Mining (KDD'20), Aug 2020.