Ph.D. Thesis Proposal: Ravi Ganti

Add to Calendar
Date:
November 7, 2012 2:00 pm - 3:30 pm
Location:
KACB 1315

Ph.D. Thesis Proposal Announcement
Title: New Formulations for Problems in Sequential Analysis

Ravi Ganti
Computational Science and Engineering
College of Computing
Georgia Institute of Technology

Date: November 7th, 2012
Time: 2:00-3:30 PM
Location: Klaus 1315

Committee:

  • Dr. Alexander Gray(Advisor, School of Computational Science and Engineering, Georgia Tech)
  • Dr. Maria-Florina Balcan (School of Computer Science, Georgia Tech)
  • Dr. Le Song (School of Computational Science and Engineering, Georgia Tech)
  • Dr. Alexander Rakhlin (Department of Statistics, University of Pennsylvania, The Wharton School)
  • Dr. Tong Zhang (Department of Statistics, Department of Statistics, Rutgers University)

Abstract:
In this thesis we propose to investigate two problems in sequential analysis, namely active learning and model selection.

Passive supervised learning setup assumes that a lot of labeled data is available that can be fed to the learning algorithm for a certain  learning task. Obtaining labeled data can be both time-consuming, and expensive. Hence an important problem is to learn well with lesser amounts of labeled data. The framework of active learning has shown promising results in being able to do as well as standard passive learning but while utilizing lesser number of training labels. In the active learning setup the learner is allowed to query points of its choice, and a labeling oracle then labels these chosen points.

In the first part, inspired by ideas from sequential analysis, we propose different algorithms for active learning (AL) in the pool based setting. Building upon our previous work, and utilizing ideas from the literature on multi-armed bandits we propose a new algorithm called VC-UPAL, which uses ideas inspired by exploration-exploitation in multi-armed bandits. Our hope is that this will allow us to prove better label complexity guarantees, and also provide good empirical performance. We next view AL from the lens of model aggregation. Doing this allows us to actively aggregate simple models to obtain a complex model. Towards this objective we propose a simple sequential model aggregation algorithm called MD-AMA.

On the opposite end of the spectrum are domains, such as astronomy, where plenty of labeled data is available but computational resources come at a premium. In the second part, we propose to investigate the problem of model selection under computational constraints. We propose a new framework called CAMS which takes into account not only the estimation, and approximation error, but also the cost of computing a model in a model class. Inspired by ideas from multi-armed bandits we propose a new algorithm called CAMS-Stream to choose a model which optimally trades-off both the statistical error of the obtained model and computational complexity of learning the model.

http://www.cc.gatech.edu/~gmravi/