CS 7641 Machine Learning
CSE/ISYE 6740 Computational Data Analysis
Fall 2014
Lecture Time
Tuesday and Thursday 1:35 - 2:55pm (starting Aug 19)
Lecture Location
Clough 152
Course Description
Machine learning studies the question "how can we build computer programs that automatically improve their performance through experience?" This includes learning to perform many types of tasks based on many types of experience. For example, it includes robots learning to better navigate based on experience gained by roaming their environments, medical decision aids that learn to predict which therapies work best for which diseases based on data mining of historical health records, and speech recognition systems that learn to better understand your speech based on experience listening to you.
The course is designed to answer the most fundamental questions about machine learning: How can we conceptually organize the large collection of available methods? What are the most important methods to know about, and why? How can we answer the question 'is this method better than that one' with some theoretical guidance or for a specific dataset of interest? What can we say about the errors our method will make on future data? What's the 'right' objective function? What does it mean to be statistically rigorous? Should I be a Bayesian? What computer science ideas can make ML methods tractable on modern large or complex datasets? What are the open questions?
This course is designed to give students a thorough grounding in the concepts, methods and algorithms needed to do research and applications in machine learning. The course covers topics from machine learning, classical statistics, data mining, Bayesian statistics and information theory. Students entering the class with a preexisting working knowledge of probability, statistics, linear algebra and algorithms will be at an advantage.
If you are not prepared for a mathematically rigorous and intensive machine learning class, I suggest you take Introductory Machine Learning (CS 4641) or Data and Visual Analytics (CSE 6242). If you already have extensive experience in machine learning or have taken some online courses in machine learning, I suggest you take a more theory-oriented class: Advanced Machine Learning (ML 8803) or Machine Learning Theory (CS 7545).
Textbooks
 Pattern Recognition and Machine Learning, Chris Bishop
 The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Trevor Hastie, Robert Tibshirani, Jerome Friedman.
Grading
The requirements of this course consist of participating in lectures, midterm and final exams, and 4 assignments. The most important thing for us is that by the end of this class students understand the basic methodologies in machine learning and are able to use them to solve real problems of modest complexity. The grading breakdown is as follows:
 Homework (4 assignments, 50%)
 Midterm exam (20%)
 Final exam (20%)
 Background test (4%)
 Participation (6%)
 If you score below 40% on the background test, the class may be too difficult for you and you should consider taking it next time when you are better prepared. If you stay, you will lose the 4% background test credit.
 If you score between 40% and 79% on the background test and you decide to take the class, you are required to attend the mandatory recitation session in order to get the 4% credit.
 If you score above 79% on the background test, you will automatically get the 4% credit and you are not required to attend the recitation session.
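As a rough illustration of how the percentages above combine, the following sketch computes an overall score from the listed weights and the background test rules. It is not the official grading procedure: the function names, the 0-100 scale for each component, and the treatment of exactly 40% and 79% as falling in the middle band are assumptions, and the LaTeX extra credit is not modeled.

```python
# Weights copied from the grading breakdown (percent of the final grade).
WEIGHTS = {
    "homework": 50,
    "midterm": 20,
    "final": 20,
    "background_test": 4,
    "participation": 6,
}


def background_credit(test_score, attended_recitation):
    """Return the background test credit (0 or 4 points).

    test_score is a percentage in [0, 100]. Treating 40 and 79 as part of
    the middle band is an assumption; the syllabus does not say.
    """
    if test_score < 40:
        return 0  # credit is lost if the student stays in the class
    if test_score <= 79:
        # credit depends on attending the mandatory recitation session
        return WEIGHTS["background_test"] if attended_recitation else 0
    return WEIGHTS["background_test"]  # credit is automatic above 79%


def course_score(homework, midterm, final, participation,
                 test_score, attended_recitation=False):
    """Combine component percentages (each 0-100) into an overall percentage."""
    weighted = (
        homework * WEIGHTS["homework"]
        + midterm * WEIGHTS["midterm"]
        + final * WEIGHTS["final"]
        + participation * WEIGHTS["participation"]
    ) / 100
    return weighted + background_credit(test_score, attended_recitation)


if __name__ == "__main__":
    # Hypothetical student: 60% on the background test, attended recitation.
    print(course_score(80, 70, 90, 100, test_score=60, attended_recitation=True))
```

Keeping the background test credit in its own function mirrors the three cases listed above, so the rule can be checked separately from the weighted sum.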
Homework Policy
 Homework should be submitted before the deadline set in TSquare. It is worth zero credit after the deadline.
 No late submissions will be accepted through email, and we do not guarantee replies to such emails.
 We strongly encourage you to use LaTeX for your submission. We will give 10 extra credits for submissions typed in LaTeX or a word processor, as we understand it takes longer. Unreadable handwriting is subject to zero credit.
 Any kind of academic misconduct is subject to an F grade as well as reporting to the Dean of Students. All answers and code should be prepared by yourself. If you refer to any material, it should be properly cited.
Exams
The exams will be open book and open notes in class. No electronic devices will be allowed.
People
Instructor:
Le Song, Klaus 1340, Office Hours: Fri 4-5pm
Teaching Assistants:
Joonseok Lee,
Bo Xie,
Amir Hossein Afsharinejad,
Shuang Li, Kaushik Patnaik
Email: cdaml2014@gmail.com
TA office hours:
Session 1 (Joonseok Lee & Shuang Li): Mon 2-3pm, KACB 1315
Session 2 (Bo Xie & Amir Afsharinejad & Kaushik Patnaik): Wed 2-3pm, KACB 1315
Guest Lecturers: Byron Boots, Jimeng Sun, Fuxin Li
Discussion forum
We encourage you to use the Piazza discussion forum. Note that it is mainly intended for peer discussion among students. If you have a question for the instructor or TAs, please email them directly at cdaml2014@gmail.com.
Syllabus and Schedule
Date  Lecture & Topics  Readings & Useful Links  

Introduction and Backgrounds  
Tue 8/19 


Thu 8/21 



Tue 8/26 



Thu 8/28 



Unsupervised Machine Learning Techniques (Data Exploration)  
Tue 9/2 


Thu 9/4 


Tue 9/9 


Thu 9/11 



Tue 9/16


Thu 9/18



Tue 9/23


Thu 9/25 


Supervised Machine Learning Techniques (Predictive Models)  
Tue 9/30 



Thu 10/2 



Tue 10/7 



*** Thu 10/9, Midterm Review ***  
*** 10/11-10/14, Fall 2014 Student Recess ***
*** 10/16, Midterm Exam (in class, tentative) ***  
Tue 10/21 



Thu 10/23 



Tue 10/28



Thu 10/30



Tue 11/4



Thu 11/6



Advanced topics (Complex Models)  
Tue 11/11



Thu 11/13 



Tue 11/18 


Thu 11/20 


Tue 11/25 


*** 11/27-11/28 Thanksgiving Break ***
*** 12/2 Class Review ***  
*** 12/11 Final Exam (2:50 - 5:40pm, same classroom) ***
Additional Materials:
Basic probability and statistics. notes1, notes2, notes3
Multivariate Gaussians. notes
Review of linear algebra by Zico Kolter. notes
Matrix Cookbook. notes
Matlab Python cheatsheet. notes