CSE6740/CS7641/ISYE6740

Machine Learning I: Computational Data Analysis

Fall 2013


Lecture Time

Tuesday and Thursday 1:35 - 2:55pm in Instructional Center 1234 (starting Aug 20)

Course Description

Machine learning studies the question "how can we build computer programs that automatically improve their performance through experience?" This includes learning to perform many types of tasks based on many types of experience. For example, it includes robots learning to better navigate based on experience gained by roaming their environments, medical decision aids that learn to predict which therapies work best for which diseases based on data mining of historical health records, and speech recognition systems that learn to better understand your speech based on experience listening to you.

The course is designed to answer the most fundamental questions about machine learning: How can we conceptually organize the large collection of available methods? What are the most important methods to know about, and why? How can we answer the question 'is this method better than that one' using asymptotic theory? How can we answer the question 'is this method better than that one' for a specific dataset of interest? What can we say about the errors our method will make on future data? What's the 'right' objective function? What does it mean to be statistically rigorous? Should I be a Bayesian? What computer science ideas can make ML methods tractable on modern large or complex datasets? What are the open questions?

This course is designed to give students a thorough grounding in the concepts, methods and algorithms needed to do research and applications in machine learning. The course covers topics from machine learning, classical statistics, data mining, Bayesian statistics and information theory. Students entering the class with a pre-existing working knowledge of probability, statistics and algorithms will be at an advantage.

If you are not prepared for a mathematically rigorous and intensive class on machine learning, I suggest you take Introductory Machine Learning (CS 4641) or Data and Visual Analytics (CSE 6242). If you already have extensive experience in machine learning or have taken some online courses in machine learning, I suggest you take a more theory-oriented class: Advanced Machine Learning (ML 8803) or Machine Learning Theory (CS 7545).

Textbooks

Grading

The requirements of this course consist of participation in lectures, midterm and final exams, and 7 problem sets. The most important thing for us is that by the end of this class students understand the basic methodologies in machine learning and are able to use them to solve real problems of modest complexity. The grading breakdown is the following

Advanced students can use a project (and project report) option to replace the two exam components. The project has to be mutually agreed upon by the student and the lecturer. Note that the project option is much more challenging than the exams, but also more fun.

Late Homework Policy

No late homework is accepted: a homework turned in after the deadline is worth zero credit. You must turn in all of the homeworks, even if for zero credit, in order to pass the course.

Exams

The exams will be open book and open notes. Internet usage will not be allowed.

People

Instructor: Le Song, Klaus 1340, Office Hours: The half hour right after each lecture

Guest Lecturer: Bistra Dilkina

TA Office Hours: TBD

TA: Joonseok Lee, Klaus 1315, 1-2pm Monday

TA: Zhen Wang, Klaus 1315, 1-2pm Monday

TA: Nan Du, Klaus 1315, 2-3pm Friday

TA contact email: mlcda2013@gmail.com

Class Assistant: Mimi Haley, Klaus 1321

Mailing List

Discussion forum: https://groups.google.com/d/forum/cse6740fall2013

Mailing list: cse6740fall2013@googlegroups.com

Syllabus and Schedule

Date Lecture & Topics Readings & Useful Links Handouts
Tue 8/20
  • Introduction
  • Applications
  • A simple example
  • Logistics
Slides
Unsupervised Machine Learning Techniques (Data Exploration)
Thu 8/22
  • Dimensionality reduction
  • Principal component analysis
  • Singular value decomposition
Slides
Codes
Tue 8/27
  • Nonlinear dimensionality reduction
Slides
Codes
Thu 8/29
  • Clustering
  • K-means clustering
  • Hierarchical clustering
Slides
Codes
Tue 9/3
  • Clustering nodes in graphs
  • On Spectral Clustering: Analysis and an algorithm [pdf]
  • Normalized Cuts and Image Segmentation [pdf]
Slides
Codes
Thu 9/5
  • Density Estimation
  • Histogram
  • Kernel density estimator
Slides
Codes
Tue 9/10
  • Gaussian mixture model
  • Expectation-Maximization algorithm
Slides
Codes
Thu 9/12
  • Novelty detection
  • Abnormality detection
Slides
Codes
Supervised Machine Learning Techniques (Predictive Models)
Tue 9/17
  • Bayesian decision rule
  • Generative classifier
  • Naive Bayes classifier
Slides
Codes
Thu 9/19
  • Maximum likelihood estimation
  • Discriminative classifiers
  • Logistic regression
Slides
Tue 9/24
  • Support vector machine
  • Convex optimization
Slides
Assignment 3
Thu 9/26
  • Decision tree
  • Information theory
  • Feature selection
Slides
Tue 10/1
  • Neural networks
  • Backpropagation algorithm
Slides
*** Thu 10/3, Midterm Review ***
Tue 10/8
  • Combine classifiers
  • Boosting
Slides
Assignment 4
Thu 10/10
  • Ridge regression
  • Regularization
  • Probabilistic interpretation
Slides
*** 10/12-10/15, Fall 2013 Student Recess ***
*** 10/17, Midterm Exam ***
Tue 10/22
  • Guest Lecture I
  • Computational sustainability
  • Maximum entropy estimation
Slides
Thu 10/24
  • Guest Lecture II
  • Network estimation
  • Submodular optimization
Slides
Tue 10/29
  • Overfitting
  • Bias and variance decomposition
  • Cross-validation
Slides
Advanced topics (Complex Models)
Thu 10/31
  • Kernel Methods I
Slides
Tue 11/5
  • Kernel Methods II
Slides
Assignment 5
Thu 11/7
  • Hidden Markov Models I
Slides
Tue 11/12
  • Hidden Markov Models II
Slides
Thu 11/14
  • Markov Random Fields I
Slides
Tue 11/19
  • Markov Random Fields II
Slides
Assignment 6
Thu 11/21
  • Conditional Random Fields
Slides
Tue 11/26
  • Reinforcement Learning
Slides
*** 11/28-11/29 Thanksgiving Break ***
*** 12/3 Class Review ***
*** 12/9 Final Exam ***

Additional Materials:

Basic probability and statistics. notes1, notes2, notes3

Multivariate Gaussians. notes

Review of linear algebra by Zico Kolter. notes

Matrix Cookbook. notes

Matlab Python cheatsheet. notes