Spring, 2019
 

Mondays / Wednesdays, 3:00-4:15pm, Bunger Henry 380

Instructor:
Byron Boots
email: 'bboots' 'at' 'cc' 'dot' 'gatech' 'dot' 'edu'
office: College of Computing Building (CCB) 318
office hours: Mondays / Wednesdays 4:30-5:30 (after class, CCB 318)

Teaching Assistants:
Nolan Wagener
email: 'nolan' 'dot' 'wagener' 'at' 'gatech' 'dot' 'edu'
office hours: Tuesdays 10:00-11:00am and Fridays 2:00-3:00pm

Xinyan Yan
email: 'xinyan' 'dot' 'yan' 'at' 'gatech' 'dot' 'edu'
office hours: Tuesdays 10:00-11:00am and Fridays 2:00-3:00pm

A growing number of state-of-the-art systems, including field robots, acrobatic aerial vehicles, walking robots, and the leading computer Go player, rely upon machine learning techniques to make decisions. The machine learning problems in these domains represent a fundamental departure from traditional classification and regression problems. The learner must contend with: a) the effect of their own actions on the world; b) sequential decision making and credit assignment; and c) the tradeoffs between exploration and exploitation. In the past ten years, the understanding of these problems has developed dramatically. One key to the advance of learning methods has been a tight integration with optimization techniques, and we will focus on this throughout the course.

This course is directed to graduate students interested in developing adaptive software that interacts with the world. Although much of the material will be driven by applications within robotics, anyone interested in applying learning to planning and control, or in building complex adaptive systems, is welcome.

Textbooks:

Additional readings will be posted in the schedule below.

Prerequisites
 

This is an advanced course, so familiarity with basic ideas from probability, machine learning, and decision making/control will be helpful. Because the course is project driven, prototyping skills in languages such as C, C++, Python, and Matlab will also be important. Creative thought and enthusiasm are required.

Schedule
Date | Due | Topic | Reading Material | HW
--- | --- | --- | --- | ---
01/07/19 | | Overview | Georgia Tech Honor Code | Sign Up for Piazza
01/09/19 | | No Class (Instructor Travel) | |
01/14/19 | | MDPs, Value Iteration | Notes on Markov Decision Problems; MDP Slides -- Dan Klein |
01/16/19 | | Value Iteration (continued) | Notes on Markov Decision Problems; How to Design Good Tetris Players; Probabilistic Robotics, Chapter 14 | HW1
01/21/19 | | No Class (Martin Luther King Day) | |
01/23/19 | | The Linear Quadratic Regulator | Notes on Linear Quadratic Regulators; LQR Slides -- Pieter Abbeel; RL for Helicopter Flight |
01/28/19 | | Tracking with LQRs, Iterative LQR | Notes on Linear Quadratic Regulators; Sequential Compositions of Behaviors; Speeding Up Dynamic Programming; LQR Trees |
01/30/19 | | Policy Iteration | Notes on Policy Iteration; Policy Iteration Slides -- Dan Klein |
02/04/19 | | Fitted Q-Iteration | Notes on Approximate Dynamic Programming; Learning to Drive a Real Car; Generalization in RL; Stable Function Approximation |
02/06/19 | | Approximate Policy Iteration | Notes on Approximate Dynamic Programming; API Survey |
02/11/19 | | No Class (Instructor Travel) | |
02/13/19 | HW1 | No Class (Instructor Travel) | |
02/18/19 | | TD Learning, SARSA, Q-Learning | Notes on TD, Q-Learning; Sutton & Barto: Ch. 6; RL Slides -- Dan Klein; Deep Q-Learning |
02/20/19 | | Brute Force Simulation-Based Policy Search: Cross Entropy, Nelder Mead | Notes on Black Box Optimization; Nelder Mead -- Wikipedia; PEGASUS; CEM; Optimization Stories | HW2
02/25/19 | | Backpropagation | Notes on Backpropagation; Deep Learning: Ch. 6; Blog Post on the Adjoint Method; Fluid Control |
02/27/19 | | Policy Gradients | Notes on Policy Gradients; Sutton & Barto: Ch. 13; REINFORCE; Policy Gradient Slides -- Levine |
03/04/19 | | Natural Policy Gradient | Notes on Policy Gradients; Natural Policy Gradient; Covariant Policy Search; Natural Actor Critic; Trust Region Policy Optimization; Actor Critic Slides -- Levine |
03/06/19 | | Iterative Learning Control | Notes on Iterative Learning Control; Using Inaccurate Models in RL |
03/11/19 | | Response Surface Methods | Notes on Response Surface Methods; Response Surface Methods; Properties of Gaussians; Gaussian Processes: Ch. 1 & Ch. 2; Automatic Gait Optimization |
03/13/19 | Project Proposal | Midterm Review | Midterm Study Guide |
03/18/19 | | No Class (Spring Break) | |
03/20/19 | | No Class (Spring Break) | |
03/25/19 | | Midterm Exam | |
03/27/19 | HW2 | Deterministic Algorithms, A* | Slides on A* Search; A* Search -- Hart et al.; D* Search -- Stentz |
04/01/19 | | Randomized Motion Planners | Slides on Randomized Motion Planners; Probabilistic Roadmaps -- Kavraki et al.; RRT -- LaValle; RRT* -- Karaman and Frazzoli |
04/03/19 | | Dynamic Motion Primitives | |
04/08/19 | | Imitation Learning, Dataset Aggregation | Notes on Imitation Learning; Slides on Dataset Aggregation; DAgger -- Ross et al.; Agnostic System ID -- Ross and Bagnell; AggreVaTeD -- Sun et al.; Agile Autonomous Driving -- Pan et al. | HW3
04/10/19 | | Thoughts on Machine Learning and Robotics | |
04/15/19 | | No Class (Instructor Travel) | |
04/17/19 | | No Class (Instructor Travel) | |
04/22/19 | HW3 | Inverse Optimal Control / Inverse RL | Notes on Imitation Learning; Learning to Search -- Ratliff et al.; Maximum Entropy IOC -- Ziebart et al. |
04/26/19, 2:50pm-5:40pm | Final Report | During Final Exam Timeslot: Project Demos/Presentations | |

Announcements and Resources will be posted via the Georgia Tech Canvas system.

Grading
 

Final grades will be based on course projects (30%), homework assignments (50%), the midterm (15%), and class participation (5%).
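
As a rough illustration of how these weights combine, the sketch below computes a weighted final percentage. The weights come from the sentence above; the function name, the 0-100 score scale, and the example scores are hypothetical, and the sketch ignores any extra credit.

```python
# Weights taken from the grading breakdown above.
WEIGHTS = {
    "projects": 0.30,
    "homework": 0.50,
    "midterm": 0.15,
    "participation": 0.05,
}


def final_grade(scores):
    """Return the weighted final percentage.

    `scores` maps each component name to a percentage in [0, 100].
    This is an illustrative helper, not an official grade calculator.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one student.
example = {
    "projects": 90.0,
    "homework": 80.0,
    "midterm": 70.0,
    "participation": 100.0,
}
print(f"Final grade: {final_grade(example):.1f}%")
```

Because homework carries half the weight, a ten-point change in the homework average moves the final grade by five points, more than any other component.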

Typesetting your homework solutions in LaTeX is strongly encouraged (you will receive 10 extra credit points). Unreadable handwriting is subject to zero credit.

Late homework policy: Assignments are due at the beginning of class on their due date. You will be allowed 3 total late days without penalty for the entire semester. Please use these wisely, and plan ahead for conferences, travel, deadlines, etc. Once those days are used, you will be penalized according to the following policy:

  • Homework is worth full credit at the beginning of class on the due date.
  • It is worth half credit for the next 48 hours.
  • It is worth zero credit after that.
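
One possible reading of the late policy can be sketched in code. The 3 late days and the 48-hour half-credit window come from the policy above. The rest is an assumption made only for illustration: that a late day covers 24 hours, that partial days round up, and that late days are spent only when they cover the entire delay. The function name is hypothetical, and the instructor's actual bookkeeping may differ.

```python
import math

FREE_LATE_DAYS = 3             # total for the semester, from the policy above
HALF_CREDIT_WINDOW_HOURS = 48  # half-credit window, from the policy above


def homework_credit(hours_late, late_days_left):
    """Return (credit_fraction, late_days_left_afterwards).

    Assumptions (not stated in the syllabus): one late day covers 24 hours,
    partial days round up, and late days are only spent when the remaining
    balance covers the whole delay.
    """
    if hours_late <= 0:
        return 1.0, late_days_left  # on time: full credit
    days_needed = math.ceil(hours_late / 24)
    if days_needed <= late_days_left:
        return 1.0, late_days_left - days_needed  # covered by late days
    if hours_late <= HALF_CREDIT_WINDOW_HOURS:
        return 0.5, late_days_left  # within 48 hours: half credit
    return 0.0, late_days_left      # later than 48 hours: no credit


# Hypothetical submissions: (hours late, late days remaining).
for hours, days in [(0, FREE_LATE_DAYS), (30, FREE_LATE_DAYS), (30, 0), (49, 0)]:
    print(hours, days, homework_credit(hours, days))
```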

Collaboration on homework: This class abides by the Georgia Tech Honor Code. Unless otherwise specified, homework must be done individually and each student must hand in their own assignment. It is acceptable, however, for students to collaborate in figuring out answers and helping each other understand the underlying concepts. When collaborating, the "whiteboard policy" is in effect: You may discuss assignments on a whiteboard, but at the end of a discussion the whiteboard must be erased, and you must not transcribe or take with you anything that has been written on the board during your discussion. You must be able to reproduce the results solely on your own after any such discussion. Finally, you must write the names of the students you collaborated with on each homework.

Audit policy: If you wish to audit the course, you must either:

  • Do two homework assignments, or
  • Do the course project.

Disclaimer: I reserve the right to modify any of these plans as need be during the course of the class; however, I won't do anything capriciously, anything I do change won't be too drastic, and you'll be informed as far in advance as possible.

Projects
 

The course project is an opportunity for you to deeply explore one (or several) of the techniques covered in class and apply them to a robotics problem that is of interest to you. Since the projects require a substantial amount of work, you may form groups of up to three students. The research topic is up to you, as long as it makes use of adaptive control or RL methods.

Project proposals: Your proposal should be 1-3 pages, and it should introduce the problem you are trying to solve, the approach you will take, and also address the following questions:

  • What are some impacts of this research?
  • What is novel about the approach you are taking?
  • How do learning and/or probabilistic inference techniques play a key role?
  • What is your metric for success?
  • What are key technical issues you will have to confront? Are there any other big challenges?
  • What software or datasets will you use?
  • What is your timeline? Include specific targets for the progress report.

Note on current research: You may use your current research as a course project, as long as you explore a new area of the problem and do not reuse previous results. Your proposal should clearly state what novel part you will be tackling in your course project.

Final presentations: You’ll present your findings to the class at the end of the semester. The presentation must follow these rules:

  • No more than 5 minutes! There will be a hard cutoff.
  • No more than 4 slides, excluding the title slide.
  • Every group member must speak.
  • You are welcome to use your own computer, but you must send me a copy of the slides in advance.
  • Don't "decorate" your slides with equations. If there is an equation, I expect you to explain every variable.
  • Don't read your slides / show lots of text. Brief, salient points.

Final Report: The final report will consist of one deliverable:

  1. Written report: This is the detailed report of your approach and findings. You should re-state the problem you are solving and your approach, and summarize your results. The report should be no longer than a NeurIPS paper in size (8 pages including figures and tables), but a shorter and more concrete report is preferred.

Acknowledgements
 

Assignments, lectures, and ideas on this syllabus are partially adapted from Drew Bagnell's course at Carnegie Mellon University. I would like to thank Drew for helpful discussions and access to his course materials.