General Information
Details
My research focuses on data-centric machine learning, emphasizing often-overlooked aspects of data, such as sourcing, selection, annotation, and validation, that critically affect the reliability and usability of ML systems. I combine theoretical and experimental methods to develop conceptual insights with practical relevance. Within data-centric machine learning, I am especially interested in (1) Active Learning and Experimental Design (methods for selecting which data to collect supervision for) and (2) statistical aspects of data algorithms and data-centric ML (e.g., noise, domains, concept shift).
My teaching interests center on machine learning, including introductory ML courses, theoretical ML courses, and data-centric ML. While I enjoy teaching technical concepts through definitions, results, and examples, I am most interested in giving students a big-picture understanding, which is difficult to gain from books and online resources. I find that adopting a growth mindset myself is a necessary step toward effective teaching, and I regularly solicit feedback on my methods to maximize student learning and engagement.
Myopic Bayesian Decision Theory for Batch Active Learning with Partial Batch Label Sampling
Kangping Hu, Stephen Mussmann
https://arxiv.org/abs/2510.09877
Sum Estimation via Vector Similarity Search
Stephen Mussmann, Mehul Smriti Raje, Kavya Tumkur, Oumayma Messoussi, Cyprien Hachem, Seby Jacob
https://arxiv.org/abs/2601.11765
VOCALExplore: Pay-as-You-Go Video Data Exploration and Model Building
Maureen Daum, Enhao Zhang, Dong He, Stephen Mussmann, Brandon Haynes, Ranjay Krishna, Magdalena Balazinska
VLDB 2024
https://www.vldb.org/pvldb/vol16/p4188-daum.pdf
LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning
Jifan Zhang*, Yifang Chen*, Gregory Canal, Arnav Das, Gantavya Bhatt, Stephen Mussmann, Yinglun Zhu, Simon Shaolei Du, Kevin Jamieson, Robert D Nowak
DMLR 2024
https://arxiv.org/pdf/2306.09910
Constants Matter: The Performance Gains of Active Learning
Stephen Mussmann, Sanjoy Dasgupta
ICML 2022
https://proceedings.mlr.press/v162/mussmann22a.html