- You could choose to do a small project (if you prefer the homework-oriented grading scheme): this might involve conducting a small experiment or reading a couple of papers and presenting the main ideas. The end result should be a 3-5 page report and a 10-15 minute presentation.
- Alternatively, you could choose to do a larger project (if you prefer the project-oriented grading scheme): this might involve conducting a novel experiment, thinking about a concrete open theoretical question, thinking about how to formalize an interesting new topic, or trying to relate several problems. The end result should be a 10-15 page report and a 40-45 minute presentation.

- M.F. Balcan and N. Harvey. Submodular Functions: Learnability, Structure, and Optimization. STOC 2011.
- M.F. Balcan, E. Blais, A. Blum, and L. Yang. Active Property Testing. FOCS 2012.
- M.F. Balcan, A. Blum, J. Hartline, and Y. Mansour. Mechanism Design via Machine Learning. FOCS 2005.
- A. Blum and Y. Mansour. Learning, Regret Minimization, and Equilibria. Book chapter in Algorithmic Game Theory, Noam Nisan, Tim Roughgarden, Eva Tardos, and Vijay Vazirani, eds.
- Y. Chen and J. Wortman Vaughan. Connections Between Markets and Learning. ACM SIGecom Exchanges 2010.

- M.F. Balcan, A. Blum, S. Fine, and Y. Mansour. Distributed Learning, Communication Complexity and Privacy. COLT 2012.
- H. Daume III, J. M. Phillips, A. Saha, and S. Venkatasubramanian. Distributed Learning, Communication Complexity and Privacy. ALT 2012.
- M.F. Balcan, S. Ehrlich, and Y. Liang. Distributed Clustering on Graphs. NIPS 2013.

- M.F. Balcan and A. Blum. A Discriminative Model for Semi-Supervised Learning. Journal of the ACM, 2010.
- M.F. Balcan, A. Blum, and Y. Mansour. Exploiting Ontology Structures and Unlabeled Data for Learning. ICML 2013.
- X. Zhu. Semi-Supervised Learning. Encyclopedia of Machine Learning.
- A. Carlson, J. Betteridge, R. C. Wang, E. R. Hruschka Jr., and T. M. Mitchell. Coupled Semi-Supervised Learning for Information Extraction. International Conference on Web Search and Data Mining (WSDM), 2010.
- L. Xu, M. White, and D. Schuurmans. Optimal Reverse Prediction. Twenty-Sixth International Conference on Machine Learning (ICML), 2009.
- X. Zhu, Z. Ghahramani, and J. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. The 20th International Conference on Machine Learning (ICML) 2003.
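To make the last reference concrete, here is a minimal numpy sketch (illustrative code, not the authors' own) of the harmonic-function solution of Zhu, Ghahramani, and Lafferty: labels on a similarity graph are propagated to the unlabeled nodes by solving a linear system in the graph Laplacian.

```python
import numpy as np

def harmonic_label_propagation(W, labeled_idx, labels):
    """Semi-supervised label propagation via the harmonic solution
    f_U = (D_UU - W_UU)^{-1} W_UL f_L  (Zhu, Ghahramani, Lafferty 2003).

    W           : (n, n) symmetric nonnegative similarity matrix
    labeled_idx : indices of the labeled nodes
    labels      : their labels in [0, 1]
    Returns soft labels in [0, 1] for every node."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W                # graph Laplacian D - W
    u = np.setdiff1d(np.arange(n), labeled_idx)   # unlabeled nodes
    f = np.zeros(n)
    f[labeled_idx] = labels
    # Harmonic solution: minimizes sum_ij W_ij (f_i - f_j)^2 with f clamped
    # to the given labels on the labeled nodes.
    f[u] = np.linalg.solve(L[np.ix_(u, u)], W[np.ix_(u, labeled_idx)] @ f[labeled_idx])
    return f

# Path graph 0-1-2-3 with node 0 labeled 0 and node 3 labeled 1:
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
f = harmonic_label_propagation(W, np.array([0, 3]), np.array([0.0, 1.0]))
```

On this path graph the harmonic solution interpolates linearly between the two labeled endpoints, giving f = [0, 1/3, 2/3, 1].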

- S. Dasgupta. Coarse sample complexity bounds for active learning. Advances in Neural Information Processing Systems (NIPS), 2005.
- M.F. Balcan, A. Beygelzimer, and J. Langford. Agnostic active learning. JCSS 2009 (originally in ICML 2006).
- A. Beygelzimer, S. Dasgupta, and J. Langford. Importance-weighted active learning. ICML 2009.
- M.F. Balcan, S. Hanneke, and J. Wortman. The True Sample Complexity of Active Learning. Machine Learning Journal 2010.
- D. Hsu. Algorithms for Active Learning. PhD thesis, UCSD, 2010.
- V. Koltchinskii. Rademacher Complexities and Bounding the Excess Risk in Active Learning. Journal of Machine Learning Research 2010.
- Y. Wiener and R. El-Yaniv. Agnostic Selective Classification. NIPS 2011.
- S. Hanneke. Rates of Convergence in Active Learning. The Annals of Statistics 2011.
- N. Ailon, R. Begleiter, and E. Ezra. Active learning using smooth relative regret approximations with applications. COLT 2012.
- M.F. Balcan and S. Hanneke. Robust Interactive Learning. COLT 2012.
- M.F. Balcan and P. Long. Active and Passive Learning of Linear Separators under Log-concave Distributions. COLT 2013.
- See also the NIPS 2009 Workshop on Adaptive Sensing, Active Learning, and Experimental Design: Theory, Methods, and Applications.
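For intuition about the algorithmic side of these papers, here is a small numpy sketch of pool-based active learning by margin-based uncertainty sampling: fit on the current labeled set, then spend each label query on the unlabeled point closest to the decision boundary. This is a simplified heuristic in the spirit of the margin-based analyses above, not a faithful implementation of any one algorithm from the list; all names are illustrative.

```python
import numpy as np

def fit_linear(X, y, epochs=200, lr=0.5):
    """Tiny logistic-regression learner via gradient descent (no intercept);
    labels y in {0, 1}."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-X @ w))           # predicted probabilities
        w += lr * X.T @ (y - p) / len(y)       # gradient ascent on log-likelihood
    return w

def uncertainty_sampling(X, oracle, seed_idx, budget):
    """Pool-based active learning: repeatedly fit on the labeled set and
    query the unlabeled point nearest the decision boundary (min |<w, x>|)."""
    labeled = list(seed_idx)
    for _ in range(budget):
        w = fit_linear(X[labeled], oracle(labeled))
        unlabeled = np.setdiff1d(np.arange(len(X)), labeled)
        labeled.append(int(unlabeled[np.argmin(np.abs(X[unlabeled] @ w))]))
    return fit_linear(X[labeled], oracle(labeled)), labeled

# Synthetic pool whose true label is the sign of the first coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(float)
oracle = lambda idx: y[np.asarray(idx)]        # the labeler we pay per query
w, queried = uncertainty_sampling(X, oracle, seed_idx=[0, 1], budget=10)
```

The point of the exercise: the queried examples concentrate near the true decision boundary, which is exactly where labels are informative; the theory papers above quantify when this yields an exponential saving in labels over passive learning.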

- P. M. Long and R. A. Servedio. Learning large-margin halfspaces with more malicious noise. NIPS 2011.
- P. Awasthi, A. Blum, and O. Sheffet. Improved Guarantees for Agnostic Learning of Disjunctions. COLT 2010.
- A. Kalai, A. Klivans, Y. Mansour, and R. Servedio. Agnostically Learning Halfspaces. FOCS 2005.
- W. S. Lee, P. L. Bartlett, and R. C. Williamson. Efficient agnostic learning of neural networks with bounded fan-in. IEEE Trans Info Theory, 1996.
- M. Kearns and M. Li. Learning in the Presence of Malicious Errors. STOC 1988.

- P. Awasthi and M.F. Balcan. Center Based Clustering: A Foundational Perspective. Book Chapter in Handbook of Cluster Analysis (to appear).
- S. Vassilvitskii and S. Venkatasubramanian. New Developments In The Theory Of Clustering. Tutorial at KDD 2010.
- M.F. Balcan, C. Borgs, M. Braverman, J. Chayes, and S. Teng. Finding Endogenously Formed Communities. SODA 2013.
- D. Hsu and S. M. Kakade. Learning Mixtures of Spherical Gaussians: Moment Methods and Spectral Decompositions. ITCS 2013.
- R. Kannan, H. Salmasian, and S. Vempala. The Spectral Method for General Mixture Models. COLT 2005.
- D. Achlioptas and F. McSherry. On Spectral Learning of Mixtures of Distributions. COLT 2005.
- M.F. Balcan, A. Blum, and S. Vempala. A Discriminative Framework for Clustering via Similarity Functions. STOC 2008. See also full version.
- M.F. Balcan and P. Gupta. Robust Hierarchical Clustering. COLT 2010.
- See also the NIPS 2009 Workshop on Clustering: Science or Art? Towards Principled Approaches.
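Many of the papers above analyze center-based objectives such as k-means. As a reference point, here is a minimal numpy implementation of Lloyd's algorithm, the standard local-search heuristic for that objective (illustrative code, not taken from any of the papers):

```python
import numpy as np

def lloyds_kmeans(X, k, iters=50, seed=0):
    """Lloyd's algorithm for the k-means objective: alternately assign each
    point to its nearest center, then move each center to the mean of its
    assigned points. Converges to a local optimum, not necessarily a global one."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)                       # assignment step
        for j in range(k):                               # update step
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)
    cost = d2[np.arange(len(X)), assign].sum()           # k-means objective value
    return centers, assign, cost

# Two well-separated Gaussian blobs; Lloyd's recovers them.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, size=(50, 2)),
               rng.normal(5, 0.1, size=(50, 2))])
centers, assign, cost = lloyds_kmeans(X, k=2)
```

The theoretical work above asks when such heuristics provably succeed, e.g. under separation or stability assumptions on the input.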

- A. Daniely, S. Sabato, and S. Shalev-Shwartz. Multiclass Learning Approaches: A Theoretical Comparison with Implications. NIPS 2012.
- A. Daniely, S. Sabato, S. Ben-David, and S. Shalev-Shwartz. Multiclass Learnability and the ERM Principle. COLT 2011.

- P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 2006.
- T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. The Annals of Statistics, 2004.
- I. Steinwart. How to compare different loss functions and their risks. Constructive Approximation, 2007.

- I. Mukherjee and R. E. Schapire. A theory of multiclass boosting. Journal of Machine Learning Research 2013.
- P. M. Long and R. A. Servedio. Random classification noise defeats all convex potential boosters. Machine Learning, 2010.
- V. Feldman. Distribution-Specific Agnostic Boosting. Innovations in Computer Science (ICS), 2010.
- A. Tauman Kalai and R. Servedio. Boosting in the Presence of Noise. JCSS 2005.
- S. Dasgupta and P. M. Long. Boosting with diverse base classifiers. COLT 2003.
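As background for these papers, here is a compact numpy implementation of AdaBoost with decision stumps, the basic Freund-Schapire algorithm that the noise-tolerance results above take as their starting point (illustrative code):

```python
import numpy as np

def adaboost_stumps(X, y, rounds=10):
    """AdaBoost with exhaustively chosen decision stumps as weak learners;
    labels y must be in {-1, +1}."""
    n = len(y)
    D = np.full(n, 1.0 / n)                    # distribution over examples
    ensemble = []
    for _ in range(rounds):
        best = None
        for j in range(X.shape[1]):            # search all stumps: feature,
            for t in np.unique(X[:, j]):       # threshold, and orientation
                for s in (1, -1):
                    pred = s * np.where(X[:, j] <= t, 1, -1)
                    err = D[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, s)
        err, j, t, s = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)  # weight of this weak learner
        pred = s * np.where(X[:, j] <= t, 1, -1)
        D *= np.exp(-alpha * y * pred)         # up-weight the mistakes
        D /= D.sum()
        ensemble.append((j, t, s, alpha))
    return ensemble

def boosted_predict(ensemble, X):
    score = sum(a * s * np.where(X[:, j] <= t, 1, -1) for j, t, s, a in ensemble)
    return np.sign(score)

# A 1-d "interval" target that no single stump can express:
X = np.array([[0.], [1.], [2.], [3.]])
y = np.array([-1., 1., 1., -1.])
H = adaboost_stumps(X, y)
```

The exponential reweighting line is precisely the convex-potential behavior that Long and Servedio show is brittle under random classification noise.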

- M. Warmuth and S. V. N. Vishwanathan. Leaving the Span. COLT 2005.
- M. F. Balcan, A. Blum, and N. Srebro. A Theory of Learning with Similarity Functions. Machine Learning Journal, 2008.
- N. Srebro and S. Ben-David. Learning Bounds for Support Vector Machines with Learned Kernels. 19th Annual Conference on Learning Theory (COLT), 2006.
- G. Lanckriet, N. Cristianini, P. Bartlett, and L. El Ghaoui. Learning the Kernel Matrix with Semidefinite Programming. Journal of Machine Learning Research 2004.
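A toy numpy illustration of the landmark idea behind learning with similarity functions, in the spirit of Balcan-Blum-Srebro (the code and the choice of Gaussian similarity are illustrative, not from the paper): map each example to its vector of similarities to a set of landmark points, and learn an ordinary linear separator in that space.

```python
import numpy as np

def similarity_features(X, landmarks, sim):
    """Represent each example by its similarities to a list of landmark
    points; a good similarity function then admits a good linear separator
    in this feature space."""
    return np.array([[sim(x, z) for z in landmarks] for x in X])

gauss = lambda x, z: np.exp(-np.sum((x - z) ** 2))   # illustrative similarity

# XOR labels are not linearly separable in the plane...
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1., 1., 1., -1.])
# ...but they are in the similarity space (landmarks = the points themselves):
F = similarity_features(X, X, gauss)
w = np.linalg.solve(F, y)       # an exact linear separator in that space
```

Since F is a full-rank Gaussian similarity matrix, F @ w reproduces the labels exactly, so the sign of the linear score classifies all four points correctly; the theory above replaces this interpolation step with learning from samples and quantifies how "good" the similarity must be.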

**PAC-Bayes bounds, shell-bounds, other methods of obtaining
confidence bounds.** Some papers:

- D. McAllester. Simplified PAC-Bayesian Margin Bounds. COLT 2003.
- A. Ambroladze, E. Parrado-Hernandez, and J. Shawe-Taylor. Tighter PAC-Bayes Bounds. NIPS 2006.
- J. Langford and D. McAllester. Computable Shell Decomposition Bounds. COLT 2000.
- S. Mendelson and P. Philips. On the Importance of Small Coordinate Projection. Journal of Machine Learning Research 2004.
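For orientation, one common form of the PAC-Bayes bound that these papers refine (the constants and log factors vary from paper to paper): fix a prior P over hypotheses before seeing the sample S of size m; then with probability at least 1 - delta, simultaneously for every posterior Q,

```latex
\mathrm{err}_D(Q) \;\le\; \widehat{\mathrm{err}}_S(Q)
  \;+\; \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln(m/\delta)}{2(m-1)}}
```

Here err_D(Q) and êrr_S(Q) are the true and empirical error of the Gibbs classifier that predicts with a random hypothesis drawn from Q, and KL is the Kullback-Leibler divergence; shell bounds instead localize the confidence interval to hypotheses at a given empirical error level.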

**Learning in Graphical Models (Bayes Nets)**