Active Learning: Nina Balcan Shores Up Foundations of Her Field

Friday, January 14, 2011


At the intellectual crossroads of machine learning, algorithmic game theory and optimization, there are signposts asking a few foundational questions: What’s the best method for gathering and using available information? How should a system adapt to change? And what’s the best way to interact with a new environment?

Nina Balcan wants to provide answers to those questions.

An assistant professor in the School of Computer Science, Balcan is a member of the school’s theory group, but don’t make the mistake of thinking her research lacks practical application. Indeed, her work is driven by it.

“By providing solid foundations to many of the modern protocols in areas like machine learning, my research directly impacts many application areas, such as computational biology,” Balcan says.

Another example of an application area is spam detection, a field in which several learning protocols can be used to detect whether that new email is really from a Facebook friend or from someone phishing for personal data. “Supervised learning” techniques can take a batch of labeled data—emails that are identified as spam or not spam—and learn enough characteristics to determine whether future emails are spam. But what about unlabeled data, or a mix of labeled and unlabeled data, which must be handled through “semi-supervised learning”?

“Going through and labeling a bunch of emails as spam or not spam is time-consuming and inefficient,” says Balcan. “Semi-supervised learning can provide a huge advantage over supervised learning by using information from unlabeled data as well. Another potentially even better approach is ‘active learning,’ where the computer takes a lot of emails but asks the user only for labels of a few, specially chosen messages, which are selected using a computer algorithm.

“This can save the user a substantial amount of effort,” she says. “More generally, in my work I look at these questions from a foundational perspective, trying to develop an understanding of when the different kinds of learning protocols would help and why.”
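To make the idea of “specially chosen messages” concrete, here is a minimal Python sketch of one textbook form of active learning: binary search for a threshold. It assumes, hypothetically, that each email has already been reduced to a single spam score and that an email is spam exactly when its score is at or above some unknown cutoff, with a user who never mislabels. The names active_learn_threshold and ask_user are illustrative only, and this is not presented as Balcan’s own algorithm. Under those assumptions, the learner needs only about log2(n) labels from the user instead of n.

```python
import random
from typing import Callable, List, Tuple


def active_learn_threshold(
    scores: List[float],
    ask_user: Callable[[float], bool],
) -> Tuple[float, int]:
    """Binary-search a pool of unlabeled spam scores for the spam cutoff.

    Hypothetical setting: an email is spam exactly when its score is at or
    above some unknown threshold, and ask_user always answers correctly.
    Returns the smallest pool score labeled spam (inf if none is spam)
    and the number of labels requested from the user.
    """
    pool = sorted(scores)
    # Invariant: pool[:lo] are known not-spam, pool[hi:] are known spam.
    lo, hi = 0, len(pool)
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if ask_user(pool[mid]):  # user says "spam": everything above is spam too
            hi = mid
        else:                    # user says "not spam": everything below is too
            lo = mid + 1
    threshold = pool[lo] if lo < len(pool) else float("inf")
    return threshold, queries


# Demo on a simulated pool of 1,000 unlabeled emails (synthetic scores).
rng = random.Random(0)
emails = [rng.random() for _ in range(1000)]
true_cutoff = 0.7  # hidden from the learner; only ask_user "knows" it
learned, asked = active_learn_threshold(emails, lambda s: s >= true_cutoff)
print(f"learned cutoff {learned:.3f} after {asked} labels")
```

On a pool of 1,000 emails the loop requests at most 10 labels, whereas labeling every email for supervised learning would take 1,000. Real spam filters use many features and face noisy labels, where such savings are much harder to guarantee, which is exactly where understanding when a learning protocol helps, and why, becomes important.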

Growing up in Romania, Balcan at a young age had to find a suitable method for her own learning, as her precocious math skills surpassed the resources at hand. She enrolled at the University of Bucharest in 1996, majoring in both computer science and mathematics. By 2002, with bachelor’s and master’s degrees in hand, she left for the United States to study computer science at Carnegie Mellon University under Avrim Blum. Six years later she not only had her Ph.D. but also CMU’s coveted Distinguished Dissertation Award for her dissertation, “New Theoretical Frameworks for Machine Learning.”

“Nina’s work was highly recommended by Avrim Blum, who is one of the world’s leading experts on machine learning and game theory,” says Distinguished Professor Santosh Vempala of the School of Computer Science, who sat on Balcan’s thesis committee. “She has amazing energy and the vision to choose her own research directions, which often turn out to be novel for the field.”

Not long after arriving at Georgia Tech, Balcan received an NSF CAREER Award to pursue her research. She is affiliated with Vempala’s Algorithms & Randomness Center and ThinkTank (ARC), which pulls together faculty from across Georgia Tech to propose algorithmic solutions to vexing scientific problems. Balcan is a perfect fit for ARC, given her intellectual home at the convergence of three different disciplines, which she pursues not only in her research but also in her teaching.

In fall 2010 Balcan taught a course in machine learning, game theory and optimization. It included students from five different Georgia Tech schools, including Math, Industrial & Systems Engineering (ISyE) and Electrical & Computer Engineering (as well as, of course, Computer Science). All that was required was a level of “mathematical maturity” on the students’ part, Balcan says.

“For example, ISyE students have a background in optimization, and they’re very curious to see how machine learning can benefit optimization,” Balcan says. “The focus was to show how insights in one area can inform other areas, but I made sure to cover the basics in each area.”

“Nina is really interdisciplinary in her record and her taste,” Vempala says. “She has collaborated with learning theorists, statisticians, signals and systems folks, game theorists, complexity theorists, biologists and others. Her work is interesting and relevant to a broad spectrum of disciplines while also developing fundamental theoretical computer science—qualities that make her a great asset to ARC.”