General Information
My central research focus is on building interpretable, efficient, and performant statistical and computational machine learning algorithms to uncover, understand, and utilize the regular yet random latent phenomena underlying human-produced cultural artifacts, including but not limited to natural language, architecture, philosophy, and knowledge. To conduct this interdisciplinary mode of inquiry, I have adopted a two-pronged approach that pursues two distinct yet mutually beneficial research thrusts:
A) research on fundamental technical aspects of probabilistic generative models and their properties, and
B) developing scalable, interpretable statistical methods for research in history and the humanities.
The first thrust’s focus on machine learning (ML) fundamentals directly feeds into better algorithms for insightful analysis of human artifacts. Conversely, the focus of the second thrust presents novel and meaningful challenges for building explainable and trustworthy machine learning approaches.
I strongly believe that the biggest factor behind effective learning is the individual’s intrinsic motivation and joy of discovery. Therefore, I aim to create a learning environment such that my teaching goes beyond the classroom and excites the students about the subject. I aim to make the classroom experience rewarding by focusing on interactive discussion, doing extensive whiteboard work, and providing illuminating examples that build intuition behind complex topics.
Regarding research advising, I take special joy in discussing research with students and helping them become successful scientific collaborators. For graduate students, I focus on developing their skill to ask critical and impactful questions related to their topic of interest. With undergraduate students, I tend to be more task-oriented and detailed as they develop comfort handling the uncertainty associated with research.
I teach and develop undergraduate and graduate NLP courses, a new course, "ML for History and Humanities", and a VIP course.
1) Aly Lidayan, Jakob Bjorner, Satvik Golechha, Kartik Goyal, Alane Suhr. ABBEL: LLM Agents through Belief Bottlenecks Expressed in Language. Preprint. September 2025. https://arxiv.org/abs/2512.20111
2) Nghia Le, Alan Ritter, Kartik Goyal. Semantic Differentiation for Tackling Challenges in Watermarking Low-Entropy Constrained Generation Outputs. Preprint. September 2025. https://www.arxiv.org/abs/2601.11629
3) Tomohiro Sawada, Kartik Goyal. Train It and Forget It: Merge Lists are Unnecessary for BPE Inference in Language Models. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025. https://arxiv.org/abs/2508.06621
4) Yunxiang Yan, Tomohiro Sawada, Kartik Goyal. Cascaded Information Disclosure for Generalized Evaluation of Problem Solving Capabilities. Proceedings of the Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (AACL-IJCNLP), 2025. https://arxiv.org/abs/2507.23776
5) Davis Yoshida, Kartik Goyal, and Kevin Gimpel. MAP's not dead yet: Uncovering true language model modes by conditioning away degeneracy. Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2024. https://aclanthology.org/2024.acl-long.855/ [Outstanding SAC paper award]
6) Caroline Craig, Kartik Goyal, Gregory Crane, Farnoosh Shamsian, and David A. Smith. Testing the Limits of Neural Sentence Alignment Models on Classical Greek and Latin Texts and Translations. Proceedings of Computational Humanities Research, 2023. https://ceur-ws.org/Vol-3558/paper6193.pdf