
CSE6740/CS7641/ISYE6740: Machine Learning I

Fall 2012

Lecture Time

Tuesday and Thursday 1:35 - 2:55pm in Klaus 2447 (starting Aug 21st)

Course Description

Machine learning studies the question "how can we build computer programs that automatically improve their performance through experience?" This includes learning to perform many types of tasks based on many types of experience. For example, it includes robots learning to better navigate based on experience gained by roaming their environments, medical decision aids that learn to predict which therapies work best for which diseases based on data mining of historical health records, and speech recognition systems that learn to better understand your speech based on experience listening to you.

The course is designed to answer the most fundamental questions about machine learning: How can we conceptually organize the large collection of available methods? What are the most important methods to know about, and why? How can we answer the question 'is this method better than that one' using asymptotic theory? How can we answer the question 'is this method better than that one' for a specific dataset of interest? What can we say about the errors our method will make on future data? What's the 'right' objective function? What does it mean to be statistically rigorous? Should I be a Bayesian? What computer science ideas can make ML methods tractable on modern large or complex datasets? What are the open questions?

This course is designed to give PhD students a thorough grounding in the methods, theory, mathematics and algorithms needed to do research and applications in machine learning. The course covers topics from machine learning, classical statistics, data mining, Bayesian statistics and information theory. Students entering the class with a pre-existing working knowledge of probability, statistics and algorithms will be at an advantage, but the class has been designed so that anyone with a strong numerate background can catch up and fully participate.

If you are not prepared for a mathematically rigorous and intensive machine learning class, I suggest taking the Data and Visual Analytics course (CSE 6242) in the Spring.



The requirements of this course consist of participating in lectures, a final exam, 5 problem sets, and a project. This is a PhD-level class, and the most important thing for us is that by the end of this class students understand the basic methodologies in machine learning and are able to use them to solve real problems of modest complexity. The grading breakdown is the following:

Late Homework Policy

You will be allowed 3 total late days without penalty for the entire semester. For instance, you may be late by 1 day on three different homeworks or late by 3 days on one homework. Each late day corresponds to 24 hours or part thereof. Once those days are used, you will be penalized according to the policy below:

You must turn in all of the 5 homeworks, even if for zero credit, in order to pass the course.


The final exam will be open book and open notes. Computers will not be allowed.


Instructor: Le Song, Klaus 1340, Office Hours: Thursday 3-4pm

Guest Lecturer: TBD

TA Office Hours: Monday 3-4pm and Thursday 3-4pm

TA: Seungyeon Kim, Klaus 1305

TA: Tran Quoc Long, Klaus 1305

TA: Parikshit Ram, Klaus 1305

Class Assistant: Michael Terrell, Klaus 1321

Mailing List

Discussion forum: https://groups.google.com/d/forum/cse6740fall2012

Mailing list: cse6740fall2012@googlegroups.com

Email to contact TA: cse6740.fall2012@gmail.com

Syllabus and Schedule

Date | Lecture & Topics | Readings & Useful Links
Tue 8/21 Lecture 1: Introduction
  • What is Machine Learning?
  • Applications of Machine Learning
  • Basic Machine Learning Models
  • Logistics

  • Science special issue on data science
  • Nature special issue on big data
  • Reading: Bishop book: Chapters 1, 2
  • Andrew Moore's tutorials

  • Slides
    Introduction to Functional Approximation: Density Estimation (e.g., maximum likelihood principle, overfitting, Bayesian versus frequentist estimates), Classification Theory, Optimal Classifier, Nonparametric Methods & Instance-based Learning (e.g., Bayesian decision rule, Bayes error, Parzen and nearest neighbor density estimation, k-nearest neighbor (kNN) classifier)
    Thu 8/23 Lecture 2: Classification Theory, Optimal Classifier
    • Nonparametric methods & Instance-based learning
      • Bayesian decision rule
      • Parzen and nearest neighbor density estimation
      • K-nearest neighbor (kNN) classifier

  • Bishop book: Chapter 2.5
  • Fukunaga book (Introduction to Statistical Pattern Recognition)

  • Slides
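
To make the kNN classifier concrete, here is a minimal sketch in plain Python (Euclidean distance, majority vote). The toy data and function name are illustrative, not from the course materials:

```python
# Minimal k-nearest-neighbor sketch: classify a query point by majority
# vote among its k closest training points (Euclidean distance).
from collections import Counter
import math

def knn_predict(train, labels, query, k=3):
    # Sort training points by distance to the query, then vote among the top k.
    dists = sorted((math.dist(x, query), y) for x, y in zip(train, labels))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Two well-separated 2-D clusters (illustrative toy data).
train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train, labels, (0.5, 0.5)))  # prints: a
```

Note that kNN has no training phase at all: the "model" is simply the stored data, which is why it is called instance-based learning.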
    Linear Function Learning: Naive Bayes classifiers, Linear Regression, Logistic Regression, Discriminative vs. Generative models
    Tue 8/28 Lecture 3: Generative classifiers
    • Naive Bayes classifiers with discrete and continuous (Gaussian) features

  • Reading:

  • Slides
    Thu 8/30 Lecture 4: Discriminative classifiers:
    • Linear regression
    • Its probabilistic interpretation
    • Applet

  • Reading:
    • Mitchell book chapter 8.3
    • Bishop book, Chapter 3

    Tue 9/4 Lecture 5: Discriminative classifiers:
    • Logistic regression
    • Relationship to Naive Bayes
    • Applet

  • Reading:

  • Slides
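
As a concrete illustration of logistic regression trained by gradient descent (Lectures 4-5), here is a minimal plain-Python sketch; the toy data, learning rate, and function names are illustrative assumptions:

```python
# Logistic regression via batch gradient descent on the log loss.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=500):
    # The bias term is handled as one extra weight on a constant feature.
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            xb = list(xi) + [1.0]                       # append bias feature
            err = sigmoid(sum(a * b for a, b in zip(w, xb))) - yi
            for j, xj in enumerate(xb):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict_prob(w, x):
    xb = list(x) + [1.0]
    return sigmoid(sum(a * b for a, b in zip(w, xb)))

X = [(0.0,), (1.0,), (4.0,), (5.0,)]   # toy 1-D inputs
y = [0, 0, 1, 1]
w = train_logistic(X, y)
```

After training, the learned boundary sits between the two clusters, so `predict_prob` is below 0.5 near x=0 and above 0.5 near x=5.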
    Non-Linear Models and model selection: Decision trees, Neural networks, Support Vector Machines, Kernel Methods, Boosting
    Thu 9/6 Lecture 6: Complex discriminative function learning
    • Decision tree learning
    • Applet

  • Reading:
    • Mitchell book chapter 3 on decision trees
    • Bishop book, Section 1.6
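
Decision tree learning chooses splits by information gain, the reduction in label entropy after partitioning the data. A minimal sketch (the toy labels and function names are illustrative):

```python
# Information gain = entropy(parent) - weighted entropy of the child groups.
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a label multiset, in bits.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    # Entropy reduction from splitting `labels` into the partition `groups`.
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - remainder

labels = ["+", "+", "+", "-", "-", "-"]
# A split that separates the classes perfectly gains the full 1 bit:
print(information_gain(labels, [["+", "+", "+"], ["-", "-", "-"]]))  # 1.0
```

A trivial split that keeps everything in one group gains 0 bits, which is why the greedy tree-growing procedure prefers attributes that purify the child nodes.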

    Tue 9/11 Lecture 7: Neural Networks
    • Non-linear regression, classifiers
    • Gradient descent
    • Discovered representations at hidden layers
    • Applet

  • Reading:
    • Bishop, Chap 5
    • Mitchell, Chap 4

    Thu 9/13 Lecture 8: Support Vector Machine
    • Duality
    • Kernel Methods
    • Convex Optimization

  • Reading:

  • Slides
    Tue 9/18 Lecture 9: Boosting
    • Combination of classifiers
    • Adaboost
    • Applet

  • Reading:

  • Slides
    Theory and Practice in Supervised Learning: Sample complexity, PAC learning, Error bounds, VC-dimension, Margin-based bounds, Overfitting, Cross validation, Model selection
    Thu 9/20 Lecture 10: Learning Theory I
    • Sample complexity
    • Hypothesis space
    • Version space
    • PAC learning theory [Applet]
    • Agnostic learning

  • Reading:
    • Mitchell book, Chap 7

    Tue 9/25 Lecture 11: Learning Theory II
    • VC dimension and Agnostic learning
    • Overfitting and PAC bounds
    • Structural Risk Minimization

  • Reading:
    • Mitchell book, Chap 7

    Thu 9/27 Lecture 12: Practical issues in supervised learning
    • Overfitting
    • Bias and variance decomposition
    • Cross-validation
    • Regularization

  • Reading:
    • Mitchell book, Chap 5&6
    • Bishop, Chap 1&2

    Unsupervised Learning and Structured Models: K-means, Expectation Maximization (EM) for training Mixture of Gaussians, HMMs: Forwards-Backwards, Viterbi
    Tue 10/2 Lecture 13: Introduction to Unsupervised Learning
  • Reading:
    • Bishop, Chap 9

    Thu 10/4 Lecture 14: Mixture model
    • Expectation-Maximization algorithm
    • [Applet]

  • Reading:
    • Bishop, Chap 9
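
The EM algorithm for a mixture of Gaussians alternates an E-step (compute each point's responsibility under each component) with an M-step (re-estimate weights, means, and variances from those responsibilities). A minimal 1-D, two-component sketch in plain Python; the toy data and initialization are illustrative assumptions:

```python
# EM for a 1-D mixture of two Gaussians.
import math

def gauss(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, iters=50):
    mu = [min(data), max(data)]        # initialize means at the data extremes
    var = [1.0, 1.0]
    mix = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [mix[k] * gauss(x, mu[k], var[k]) for k in range(2)]
            s = p[0] + p[1]
            resp.append([p[0] / s, p[1] / s])
        # M-step: re-estimate mixing weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mix[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            var[k] = max(var[k], 1e-6)   # guard against variance collapse
    return mu, var, mix

data = [-0.5, -0.2, 0.0, 0.1, 0.4, 4.6, 4.9, 5.0, 5.2, 5.5]   # toy 1-D data
mu, var, mix = em_two_gaussians(data)
```

On this well-separated toy data the recovered means land near the two cluster centers (about 0 and 5); k-means can be viewed as the hard-assignment limit of this procedure.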

    Tue 10/9 Lecture 15: Hidden Markov Models
    • Representation: discrete and continuous observations
    • Inference: the forward-backward algorithm
    • Learning: the Baum-Welch (EM) algorithm

  • Reading:
    • Bishop, Chap 13
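
The forward algorithm computes the likelihood of an observation sequence by propagating alpha values through time, summing over hidden paths in O(T K^2) rather than enumerating all K^T of them. A minimal sketch for a discrete-observation HMM; the toy parameters are illustrative:

```python
# Forward algorithm: P(observation sequence) under a discrete HMM.
def forward(obs, start, trans, emit):
    # alpha[j] = P(observations so far, current hidden state = j)
    alpha = [start[j] * emit[j][obs[0]] for j in range(len(start))]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(len(alpha))) * emit[j][o]
                 for j in range(len(start))]
    return sum(alpha)    # total likelihood of the observation sequence

# Toy 2-state HMM; observation symbols are 0 and 1.
start = [0.6, 0.4]                    # initial state distribution
trans = [[0.7, 0.3], [0.4, 0.6]]      # trans[i][j] = P(next state j | state i)
emit  = [[0.8, 0.2], [0.1, 0.9]]      # emit[i][o]  = P(observation o | state i)
p = forward([0, 1, 0], start, trans, emit)
```

A quick sanity check: the likelihoods of all possible length-1 sequences must sum to 1. (In practice one works in log space or rescales alpha to avoid underflow on long sequences.)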

    Graphical Models: Representation, Inference, Learning, Message passing algorithm
    Tue 10/23 Lecture 16: Directed Graphical Models
    • Representation
    • Conditional Independence Structure

  • Reading:

  • Slides
    Thu 10/25 Lecture 17: Undirected Graphical Models
    • Representation
    • Conditional Independence Structure

  • Reading:

  • Slides
    Tue 10/30 Lecture 18: Directed vs Undirected Graphical Models
    • Representation
    • Moralization
    • Triangulation

  • Reading:

  • Slides
    Thu 11/1 Lecture 19: Inference in Graphical Models
    • Message Passing algorithm
    • Loopy belief propagation

  • Reading:

  • Slides
    Tue 11/6 Lecture 20: Parameter Learning in Graphical Models
    • Learning Bayes Networks
    • Learning Undirected Graphical Models
    • Maximum Likelihood Estimation

  • Reading:

  • Slides
    Exploratory Data Analysis, Dimensionality reduction (PCA, SVD), Feature extraction
    Thu 11/8 Lecture 21: Graph-theoretic Methods for Clustering
    • Normalized cut
    • Spectral clustering

  • Reading:
    • On Spectral Clustering: Analysis and an Algorithm, Andrew Y. Ng, Michael Jordan, and Yair Weiss. In NIPS 14, 2002. [pdf]
    • Normalized Cuts and Image Segmentation, Jianbo Shi and Jitendra Malik, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888-905, August 2000. [pdf]

    Tue 11/13 Lecture 22: Dimensionality reduction I
    • Principal Component Analysis [Applet]
    • Singular value decomposition
    • Latent Semantic Indexing (LSI)

  • Reading:

  • Slides
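
For 2-D data, PCA can be illustrated end-to-end with the closed-form eigendecomposition of the 2x2 sample covariance matrix (in general one would use the SVD of the centered data matrix). A minimal sketch; the toy data and function name are illustrative:

```python
# PCA for 2-D points via the 2x2 covariance matrix's closed-form eigenpairs.
import math

def pca_2d(points):
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    tr, det = sxx + syy, sxx * syy - sxy ** 2
    lam1 = (tr + math.sqrt(tr * tr - 4 * det)) / 2   # largest eigenvalue
    lam2 = tr - lam1                                 # trace = sum of eigenvalues
    # Unit eigenvector for lam1 (assumes sxy != 0).
    norm = math.hypot(sxy, lam1 - sxx)
    v1 = (sxy / norm, (lam1 - sxx) / norm)
    return (lam1, lam2), v1

points = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 5)]   # nearly collinear toy data
(lam1, lam2), v1 = pca_2d(points)
```

Because the points lie almost on a line, nearly all the variance (lam1) falls along the first principal direction v1, so projecting onto v1 reduces the data to one dimension with little loss.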
    Thu 11/15 Lecture 23: Dimensionality reduction II
    • Probabilistic Latent Semantic Indexing
    • Topic models and natural language analysis using graphical models

  • Reading:

  • Slides
    Learning to make decisions: Markov decision processes, Reinforcement learning

    Additional Materials:

    Basic probability and statistics. lecture 1, lecture 2, notes

    Multivariate Gaussians. lecture