CS 7641 Machine Learning

CSE/ISYE 6740 Computational Data Analysis

Fall 2014


Lecture Time

Tuesday and Thursday 1:35 - 2:55pm (starting Aug 19)

Lecture Location

Clough 152

Course Description

Machine learning studies the question "how can we build computer programs that automatically improve their performance through experience?" This includes learning to perform many types of tasks based on many types of experience. For example, it includes robots learning to better navigate based on experience gained by roaming their environments, medical decision aids that learn to predict which therapies work best for which diseases based on data mining of historical health records, and speech recognition systems that learn to better understand your speech based on experience listening to you.

The course is designed to answer the most fundamental questions about machine learning: How can we conceptually organize the large collection of available methods? What are the most important methods to know about, and why? How can we answer the question 'is this method better than that one' with some theoretical guidance or for a specific dataset of interest? What can we say about the errors our method will make on future data? What's the 'right' objective function? What does it mean to be statistically rigorous? Should I be a Bayesian? What computer science ideas can make ML methods tractable on modern large or complex datasets? What are the open questions?

This course is designed to give students a thorough grounding in the concepts, methods and algorithms needed to do research and applications in machine learning. The course covers topics from machine learning, classical statistics, data mining, Bayesian statistics and information theory. Students entering the class with a pre-existing working knowledge of probability, statistics, linear algebra and algorithms will be at an advantage.

If you are not prepared for a mathematically rigorous and intensive machine learning class, I suggest you take Introductory Machine Learning (CS 4641) or Data and Visual Analytics (CSE 6242). If you already have extensive experience in machine learning or have taken online courses in machine learning, I suggest you take a more theory-oriented class: Advanced Machine Learning (ML 8803) or Machine Learning Theory (CS 7545).

Textbooks

Grading

The requirements of this course consist of participating in lectures, midterm and final exams, and 4 assignments. The most important thing for us is that by the end of this class students understand the basic methodologies in machine learning and are able to use them to solve real problems of modest complexity. The grading breakdown is as follows.

A background test will be conducted in the second lecture to check your basic knowledge of probability and statistics, linear algebra, and Matlab. Participation credit will be given for guest lectures, seminar attendance (to be announced), and class feedback.

Homework Policy

Exams

The exams will be held in class and will be open book and open notes. No electronic devices will be allowed.

People

Instructor:
Le Song, Klaus 1340, Office Hours: Fri 4-5pm

Teaching Assistants:
Joonseok Lee, Bo Xie, Amir Hossein Afsharinejad, Shuang Li, Kaushik Patnaik
E-mail: cdaml2014@gmail.com

TA office hours:
Session 1 (Joonseok Lee & Shuang Li): Mon 2-3pm, KACB 1315
Session 2 (Bo Xie & Amir Afsharinejad & Kaushik Patnaik): Wed 2-3pm, KACB 1315

Guest Lecturers: Byron Boots, Jimeng Sun, Fuxin Li

Discussion forum

We encourage you to hold discussions on the Piazza discussion forum. Note that it is mainly used for peer discussion among students. If you have a question for the instructor or TAs, please email them directly at cdaml2014@gmail.com.

Syllabus and Schedule

Date Lecture & Topics Readings & Useful Links
Introduction and Background
Tue 8/19
  • Introduction
  • Applications
  • Core Techniques
  • A Simple Example
  • Logistics
  • Slides
Thu 8/21
  • Background knowledge test
  • Basic probability and statistics
  • Linear algebra
  • Matlab
Tue 8/26
  • PRML 2.1 - 2.3
Thu 8/28
  • Basic probability and statistics
  • Linear algebra
  • Recitations by TAs
  • Slides
  • PRML 2.1 - 2.3
Unsupervised Machine Learning Techniques (Data Exploration)
Tue 9/2
  • Clustering
  • K-means clustering
  • Hierarchical clustering
  • PRML 9.1, ESL 14.3
  • K-means clustering [Applet]
  • Hierarchical clustering [Applet]
Thu 9/4
  • Clustering nodes in graphs
  • ESL 14.5
  • On Spectral Clustering: Analysis and an algorithm [pdf]
  • Normalized Cuts and Image Segmentation [pdf]
Tue 9/9
  • Dimensionality reduction
  • Principal component analysis
  • Singular value decomposition
Thu 9/11
  • Dimensionality reduction for manifold data
Tue 9/16
  • Density Estimation
  • Histogram
  • Kernel density estimator
  • PRML 2.5.1, ESL 6.6.1
  • Silverman book chapter [pdf]
  • Applet
Thu 9/18
  • Gaussian mixture model
  • Expectation-Maximization algorithm
Tue 9/23
  • Feature Selection
  • Abnormality detection
  • Novelty detection
Thu 9/25
  • Introduction to Convex Optimization
Supervised Machine Learning Techniques (Predictive Models)
Tue 9/30
  • Nearest neighbor classifier
  • Bayesian decision rule
  • Naive Bayes classifier
Thu 10/2
  • Maximum likelihood estimation
  • Discriminative classifiers
  • Logistic regression
Tue 10/7
  • Support vector machine
  • Convex optimization
*** Thu 10/9, Midterm Review ***
*** 10/11-10/14, Fall 2014 Student Recess ***
*** 10/16, Midterm Exam (in class, tentative) ***
Tue 10/21
  • Guest Lecture 2: Jimeng Sun
  • ML Applications to healthcare informatics
Thu 10/23
  • Guest Lecture 3: TBD
  • TBD
Tue 10/28
  • Neural networks
  • Backpropagation algorithm
  • PRML 5.1 - 5.3, ESL 11.1 - 11.8
  • Applet
Thu 10/30
  • Combining classifiers
  • Boosting
Tue 11/4
  • Ridge regression
  • Regularization
  • Probabilistic interpretation
  • PRML 3.1, 3.3, ESL 3.4
  • Applet
Thu 11/6
  • Overfitting
  • Bias and variance decomposition
  • Cross-validation
  • PRML 3.2, ESL 7.2 - 7.3, 7.10
Advanced topics (Complex Models)
Tue 11/11
  • Kernel Methods
  • PRML 6.1 - 6.3, ESL 6.1 - 6.9
Thu 11/13
  • Hidden Markov Models
  • PRML 13.1 - 13.2
Tue 11/18
  • Graphical Models
  • Markov Random Fields
  • Topic Modeling
Thu 11/20
  • Social Network Analysis
  • Information diffusion
Tue 11/25
  • Collaborative Filtering
  • Lecture and demo by Joonseok Lee
*** 11/27-11/28 Thanksgiving Break ***
*** 12/2 Class Review ***
*** 12/11 Final Exam (2:50 - 5:40pm, same classroom) ***

Additional Materials:

Basic probability and statistics. notes1, notes2, notes3

Multivariate Gaussians. notes

Review of linear algebra by Zico Kolter. notes

Matrix Cookbook. notes

Matlab Python cheatsheet. notes

Andrew Moore's tutorials