CSE6740/CS7641/ISYE6740: Machine Learning I
Fall 2012
Lecture Time
Tuesday and Thursday 1:35–2:55pm in Klaus 2447 (starting Aug 21st)
Course Description
Machine learning studies the question "how can we build computer programs that automatically improve their performance through experience?" This includes learning to perform many types of tasks based on many types of experience. For example, it includes robots learning to navigate better based on experience gained by roaming their environments, medical decision aids that learn to predict which therapies work best for which diseases based on data mining of historical health records, and speech recognition systems that learn to better understand your speech based on experience listening to you.
The course is designed to answer the most fundamental questions about machine learning: How can we conceptually organize the large collection of available methods? What are the most important methods to know about, and why? How can we answer the question 'is this method better than that one' using asymptotic theory? How can we answer the question 'is this method better than that one' for a specific dataset of interest? What can we say about the errors our method will make on future data? What's the 'right' objective function? What does it mean to be statistically rigorous? Should I be a Bayesian? What computer science ideas can make ML methods tractable on modern large or complex datasets? What are the open questions?
This course is designed to give PhD students a thorough grounding in the methods, theory, mathematics and algorithms needed to do research and applications in machine learning. The course covers topics from machine learning, classical statistics, data mining, Bayesian statistics and information theory. Students entering the class with a preexisting working knowledge of probability, statistics and algorithms will be at an advantage, but the class has been designed so that anyone with a strong numerate background can catch up and fully participate.
If you are not prepared for a mathematically rigorous and intensive machine learning class, I suggest taking the Data and Visual Analytics course, CSE 6242, in the Spring.
Textbooks
 Pattern Recognition and Machine Learning, Chris Bishop
 The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Trevor Hastie, Robert Tibshirani, Jerome Friedman.
 Machine Learning, Tom Mitchell
Grading
The requirements of this course consist of participating in lectures, a final exam, 5 problem sets, and a project. This is a PhD-level class, and the most important thing for us is that by the end of this class students understand the basic methodologies in machine learning and are able to use them to solve real problems of modest complexity. The grading breakdown is as follows:
 Homework (5 assignments, 60%)
 Final exam (20%)
 Final project (20%)
Late Homework Policy
You will be allowed 3 total late days without penalty for the entire semester. For instance, you may be late by 1 day on three different homeworks or late by 3 days on one homework. Each late day corresponds to 24 hours or part thereof. Once those days are used, you will be penalized according to the policy below:
 Homework is worth full credit at the beginning of class on the due date.
 It is worth half credit for the next 48 hours.
 It is worth zero credit after that.
Exams
The final exam will be open book and open notes. Computers will not be allowed.
People
Instructor: Le Song, Klaus 1340, Office Hours: Thursday 3–4pm
Guest Lecturer: TBD
TA Office Hours: Monday 3–4pm and Thursday 3–4pm
TA: Seungyeon Kim, Klaus 1305
TA: Tran Quoc Long, Klaus 1305
TA: Parikshit Ram, Klaus 1305
Class Assistant: Michael Terrell, Klaus 1321
Mailing List
Discussion forum: https://groups.google.com/d/forum/cse6740fall2012
Mailing list: cse6740fall2012@googlegroups.com
Email to contact TA: cse6740.fall2012@gmail.com
Syllabus and Schedule
Date  Lecture & Topics  Readings & Useful Links  Handouts 

Tue 8/21 
Lecture 1: Introduction

Slides 

Introduction to Functional Approximation: Density Estimation (e.g. maximum likelihood principle, overfitting, Bayesian versus frequentist estimation), Classification Theory, Optimal Classifier, Nonparametric Methods & Instance-based Learning (e.g. Bayesian decision rule, Bayes error, Parzen and nearest neighbor density estimation, k-nearest neighbor (kNN) classifier)  
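As a quick illustration of the kNN classifier listed in this unit, here is a minimal sketch using only numpy; the toy dataset and all names are invented for this example, not course code:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    # Euclidean distance from the query point to every training point
    dists = np.linalg.norm(X_train - x, axis=1)
    # labels of the k closest training points
    nearest = y_train[np.argsort(dists)[:k]]
    # majority vote among the k neighbors
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]

# toy 2-D dataset: two class-0 points near the origin, two class-1 points near (1, 1)
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.95, 0.9]), k=3))  # predicts class 1
```

Note that kNN defers all computation to query time: there is no training step beyond storing the data, which is why it is called instance-based learning.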
Thu 8/23 
Lecture 2: Classification Theory, Optimal Classifier

Slides 

Linear Function Learning: Naive Bayes classifiers, Linear Regression, Logistic Regression, Discriminative vs. Generative models  
Tue 8/28 
Lecture 3: Generative classifiers

 Slides 
Thu 8/30 
Lecture 4: Discriminative classifiers

 Slides 
Tue 9/4 
Lecture 5: Discriminative classifiers


Slides 
Non-Linear Models and Model Selection: Decision Trees, Neural Networks, Support Vector Machines, Kernel Methods, Boosting  
Thu 9/6 
Lecture 6: Complex discriminative function learning


Slides 
Tue 9/11 
Lecture 7: Neural Networks


Slides 
Thu 9/13 
Lecture 8: Support Vector Machine


Slides 
Tue 9/18 
Lecture 9: Boosting


Slides 
Theory and Practice in Supervised Learning: Sample complexity, PAC learning, Error bounds, VC-dimension, Margin-based bounds, Overfitting, Cross validation, Model selection  
Thu 9/20 
Lecture 10: Learning Theory I


Slides 
Tue 9/25 
Lecture 11: Learning Theory II


Slides 
Thu 9/27 
Lecture 12: Practical issues in supervised learning


Slides 
Unsupervised Learning and Structured Models: K-means, Expectation Maximization (EM) for training Mixtures of Gaussians, HMMs: Forward-Backward, Viterbi  
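The K-means clustering covered in this unit can be sketched as Lloyd's algorithm, alternating an assignment step and an update step; the toy data and initialization below are illustrative assumptions, not course code:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    # initialize centers at k distinct random data points
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assignment step: give each point the label of its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # update step: recompute each center as the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# toy 1-D data with two well-separated groups
X = np.array([[0.0], [0.2], [5.0], [5.2]])
centers, labels = kmeans(X, k=2)
```

EM for a mixture of Gaussians generalizes this picture: hard assignments become posterior responsibilities, and the update step becomes weighted maximum likelihood.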
Tue 10/2 
Lecture 13: Introduction to Unsupervised Learning

Slides 
Thu 10/4 
Lecture 14: Mixture model


Slides 
Tue 10/9 
Lecture 15: Hidden Markov Models


Slides 
Graphical Models: Representation, Inference, Learning, Message passing algorithm  
Tue 10/23 
Lecture 16: Directed Graphical Models

Slides 

Thu 10/25 
Lecture 17: Undirected Graphical Models

Slides 

Tue 10/30 
Lecture 18: Directed vs Undirected Graphical Models

Slides 

Thu 11/1 
Lecture 19: Inference in Graphical Models

Slides 

Tue 11/6 
Lecture 20: Parameter Learning in Graphical Models

Slides 

Exploratory Data Analysis, Dimensionality reduction (PCA, SVD), Feature extraction  
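To make the PCA/SVD connection in this unit concrete, here is a minimal sketch that computes principal components via the SVD of the centered data matrix; the toy data and variable names are illustrative assumptions:

```python
import numpy as np

def pca(X, n_components):
    # center each feature so the SVD captures variance around the mean
    Xc = X - X.mean(axis=0)
    # rows of Vt are the right singular vectors = principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # project the centered data onto the top directions
    scores = Xc @ components.T
    return components, scores

# toy 2-D dataset
X = np.array([[2.0, 0.0], [0.0, 2.0], [3.0, 1.0], [1.0, 3.0]])
components, scores = pca(X, 1)
```

The singular values S give the standard deviation captured along each direction (up to a factor of sqrt(n)), which is what scree plots for choosing the number of components are based on.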
Thu 11/8 
Lecture 21: Graph-theoretic Methods for Clustering

Slides 

Tue 11/13 
Lecture 22: Dimensionality reduction I


Slides 
Thu 11/15 
Lecture 23: Dimensionality reduction II


Slides 
Learning to make decisions: Markov decision processes, Reinforcement learning 
Additional Materials:
Basic probability and statistics: lecture 1, lecture 2, notes
Multivariate Gaussians. lecture