CS 4803DL/7643 Deep Learning

Spring 2019, TR 3:00 - 4:15 pm, Instr Center 111

Course Information

This is an exciting time to be studying (Deep) Machine Learning, or Representation Learning, or for lack of a better term, simply Deep Learning!

Deep Learning is rapidly emerging as one of the most successful and widely applicable sets of techniques across a range of domains (vision, language, speech, reasoning, robotics, AI in general), leading to some pretty significant commercial success and exciting new directions that may previously have seemed out of reach.

This course will introduce students to the basics of Neural Networks (NNs) and expose them to some cutting-edge research. It is structured in modules (background, Convolutional NNs, Recurrent NNs, Deep Reinforcement Learning, Deep Structured Prediction). Modules will be presented via instructor lectures and reinforced with assignments that teach theoretical and practical aspects. The course will also include a project which will allow students to explore an area of Deep Learning that interests them in more depth.

Note: The material in this course is largely courtesy of Dhruv Batra, who teaches this course in the Fall semesters.

Instructor: Zsolt Kira (Office Hours: Monday 11:00am-12:30pm CCB 222)
Teaching Assistants: Min-Hung (Steve) Chen (Office Hours: Wednesdays 1:30-3:00pm)
Miao Liu (Office Hours: Thursdays 1-2:30pm)
Anishi Mehta (Office Hours: Tuesdays 10-11:30am)
Sreenivasan Angarai Chandrasekar (Office Hours: Fridays 10-11:30am)

TA office hours will be in CCB 345 and instructor office hours will be in CCB 222 unless otherwise noted on Piazza.
Class meets: Tuesday, Thursday 3:00 - 4:15 pm, Instructional Center 111
Piazza: piazza.com/gatech/spring2019/cs7643a (main channel for communication!)
Gradescope: gradescope.com/courses/35387 (main channel for grade submission/distribution)
Canvas: gatech.instructure.com/courses/41895
Contact: Please post your questions on Piazza (note you can send private messages too). If it is personal and you cannot post on Piazza, please email the instructor.

Assignments

Assignment 0: PS0 (math), HW0 (coding)

Assignment 1: PS1 (math), HW1 (coding)

Assignment 2: PS2 (math), HW2 (coding)

Assignment 3: PS3 (math), HW3 (coding)

Project Check-in: Webpage (including template)

Schedule

Date	Topic	Optional Reading
W1: Jan 8	Class Administrivia. PS0/HW0 out. PS0 (math) due (Jan 09). Slides (pdf)	LeCun et al., Nature '15; Shannon, 1956; DL book: Linear Algebra background; DL book: Probability background; DL book: ML Background
W1: Jan 10	Image Classification and k-NN. Slides, Supervised Learning notes, k-NN notes
W2: Jan 15	Linear Classifiers, Loss Functions (guest lecture by Peter Anderson). Slides
W2: Jan 17	Regularization, Neural Networks. Slides. HW0 (coding) due (Jan 18).	DL book: Deep Feedforward Nets; DL book: Regularization for DL
W3: Jan 22	Optimization, Computational Flow Graphs, and Backprop. Slides, Gradients notes, Backprop notes	DL book: Optimization for Training Deep Models
W3: Jan 24	Training Neural Networks 1 (guest lecture by Peter Anderson). Slides
W4: Jan 29	No class (weather concern). PS1/HW1 out
W4: Jan 31	Convolutional Neural Networks (CNNs) (guest lecture by Peter Anderson). Slides, Gradients notes	DL book: Convolutional Networks
W5: Feb 5	Convolutional Neural Networks (CNNs) Part 2. Slides	Matrix calculus for deep learning
W5: Feb 7	Convolutional Neural Networks (CNNs) Part 3. Slides, Backprop in Conv Layers (notes)
W6: Feb 12	Finish CNNs Part 3; Forward-mode vs. Reverse-mode Auto-diff. Slides	Automatic Differentiation Survey, Baydin et al.
W6: Feb 14	Convolutional Neural Networks (CNNs) Part 4: Modern CNN architectures. Segmentation and Detection CNNs (and Other Pixel-level Prediction); Different Architectures. Slides (updated 02/14). PS1/HW1 due (Feb 18).
W7: Feb 19	(Continued) Modern CNN architectures. Segmentation and Detection CNNs (and Other Pixel-level Prediction); Different Architectures. Slides. PS2/HW2 out.	Fully Convolutional Networks for Semantic Segmentation
W7: Feb 21	3D CNNs (PointNet, PointNet++, SPLATNet, etc.) (guest lecture by Zhile Ren). Slides
W8: Feb 26	Visualizing CNNs. Slides	Methods for Interpreting and Understanding Deep Neural Networks; Network Dissection: Quantifying Interpretability of Deep Visual Representations
W8: Feb 28	Group discussions. PS2/HW2 due (March 4).
W9: Mar 5	Training Neural Networks 2. Slides. PS3/HW3 out.
W9: Mar 7	Recurrent Neural Networks. Proposal submission due. Slides, Notes	DL book: Sequential Modeling and Recurrent Neural Networks (RNNs)
W10: Mar 12	Recurrent Neural Networks 2 (LSTMs, RNNs + CNNs). Slides (updated)	Show and Tell: A Neural Image Caption Generator; Show, Attend and Tell
W10: Mar 14	Language Models, Unsupervised Learning and Generative Modeling. PS3/HW3 due (March 19)
W11: Mar 19	Spring Break
W11: Mar 21	Spring Break
W12: Mar 26	Unsupervised learning (guest lecture by Yen-Chang Hsu). Slides (updated)
W12: Mar 28	Generative Adversarial Networks (GANs). Slides	NIPS 2016 Tutorial: Generative Adversarial Networks
W13: Apr 2	Variational Autoencoders (VAEs). Slides	Tutorial on Variational Autoencoders
W13: Apr 4	Reinforcement Learning (RL) Background. Slides
W14: Apr 9	Deep RL continued. Slides
W14: Apr 11	Beyond Supervised Learning: Transfer learning, few-shot learning, and meta-learning. Project progress check-in report due (April 11th). Slides
W15: Apr 16	Finish few-shot learning. Slides. Incorporating Structure: Graph Convolutional Networks. Slides	Graph Convolutional Networks; Semi-Supervised Classification with Graph Convolutional Networks
W15: Apr 18	Fusion, fairness, and wrapping up. Slides
W16: Apr 23	Poster presentations. Final project webpage due (April 30).

Grading

80% Assignments (4 homeworks and problem sets)
20% Final Project
5% (potential bonus) Class Participation
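A minimal sketch of how the weights above might combine into a course grade. It assumes each component is scored on a 0-100 scale and that participation adds up to 5 bonus points on top of the 80/20 split; the page does not say whether the bonus is capped, so none is applied here. `course_grade` is a hypothetical helper, not the official grading formula.

```python
def course_grade(assignments_pct: float, project_pct: float,
                 participation_pct: float = 0.0) -> float:
    """Return a course grade in percent (hypothetical helper, not the official formula).

    Assumes each input is on a 0-100 scale: assignments weigh 80%, the final
    project 20%, and participation adds up to 5 bonus points on top.
    """
    base = 0.80 * assignments_pct + 0.20 * project_pct
    bonus = 0.05 * participation_pct  # uncapped here; the syllabus does not specify a cap
    return base + bonus


# Example: 85% on assignments, 90% on the project, full participation credit.
print(course_grade(85, 90, 100))  # 91.0
```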

Late policy for deliverables

No penalties for medical reasons or emergencies. Please see the GT Catalog for rules about contacting the Office of the Dean of Students.
Every student has 7 free late days (7 x 24-hour chunks) for this course.
After all free late days are used up, the penalty is 25% for each additional late day. Note that you cannot use late days for project submissions.
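The late-day arithmetic above can be sketched as follows. This is a hypothetical helper, not the course's grading script, and it rests on several assumptions the policy does not spell out: a partial 24-hour chunk counts as a full late day, free days are spent in submission order, and each day beyond the free allowance deducts 25% of the earned score linearly (floored at zero). Project submissions, which cannot use late days, are not modeled.

```python
import math


def late_days_used(hours_late: float) -> int:
    """Convert hours past the deadline into late days (24-hour chunks).

    Assumption: any partial chunk counts as a full late day.
    """
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def apply_late_policy(scores, hours_late, free_days=7, penalty_per_day=0.25):
    """Apply the late policy to assignment scores listed in submission order.

    Hypothetical helper (not the course's grading script). Assumes free late
    days are consumed in submission order and that each late day beyond the
    free allowance deducts 25% of the earned score, floored at zero.
    Returns the adjusted scores and the number of free late days left.
    """
    remaining = free_days
    adjusted = []
    for score, hours in zip(scores, hours_late):
        days = late_days_used(hours)
        covered = min(days, remaining)  # free days spent on this submission
        remaining -= covered
        extra = days - covered          # days that incur the penalty
        factor = max(0.0, 1.0 - penalty_per_day * extra)
        adjusted.append(score * factor)
    return adjusted, remaining


# Example: 50 h late (3 free days), 100 h late (4 free + 1 penalized), 30 h late (2 penalized).
print(apply_late_policy([100, 90, 80], [50, 100, 30]))  # ([100.0, 67.5, 40.0], 0)
```

Under these assumptions, a student who burns all 7 free days early pays the full 25% per day on every later submission, which is why the sketch tracks the remaining allowance across assignments rather than per assignment.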

Prerequisites

CS 4803DL/7643 should not be your first exposure to machine learning. Ideally, you need:

Intro-level Machine Learning
- CS 3600 for the undergraduate section and CS 7641/ISYE 6740/CSE 6740 or equivalent for the graduate section.
Algorithms
- Dynamic programming, basic data structures, complexity (NP-hardness)
Calculus and Linear Algebra
- positive semi-definiteness, multivariate derivatives (be prepared for lots and lots of gradients!)
Programming
- This is a demanding class in terms of programming skills.
- HWs will involve a mix of languages (Python, C++) and libraries (PyTorch).
- Your language of choice for the project.
Ability to deal with abstract mathematical concepts

FAQs

The class is full. Can I still get in?

Sorry. The course admins in IC control this process. Please talk to them.
Can I audit this class or take it pass/fail?

No. Due to the large demand for this class, we will not be allowing audits or pass/fail. Letter grades only. This is to make sure students who want to take the class for credit can.
Can I simply sit in the class (no credits)?

In general, we welcome members of the Georgia Tech community (students, staff, and/or faculty) to sit in. Out of courtesy, we would appreciate it if you let us know beforehand (via email or in person). If the classroom is full, we would ask that you please allow registered students to attend.
I have a question. What is the best way to reach the course staff?

Registered students – your first point of contact is Piazza (so that other students may benefit from your questions and our answers). If you have a personal matter, email the instructor.

Project Details (20% of course grade)

The class project is meant for students to (1) gain experience implementing deep models and (2) try Deep Learning on problems that interest them. The amount of effort should be at the level of one assignment per group member (1-5 people per group). More will be expected from larger groups.

Note that higher quality projects will be expected from graduate students, and a topic proposal will be reviewed in the middle of the semester. There will also be a poster session for teams containing graduate students.

A webpage describing the project in a self-contained manner will be the sole deliverable. While it may link to external documents and code describing and supplementing the project, such resources may or may not be used to evaluate the project. The webpage should completely address all of the points in the rubric described below.

Feel free to use this webpage template (zip file (coming soon), hosted example) as a starting point. You do not need to follow the template, but be sure you clearly indicate how each of the sections in the rubric below is addressed.

Submit the webpage to Gradescope by uploading a zip file containing an index.html inside a project_webpage/ subdirectory (see the template for an example). Every group member should submit this zip file, and all group member names should be listed as authors on the webpage.
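One way to produce the required layout is with Python's standard `zipfile` module. The sketch below assumes your site lives in a local folder (the folder and zip names are placeholders) and that everything in it belongs in the submission; `package_webpage` is a hypothetical helper, and the template remains the authoritative example of the expected structure.

```python
import zipfile
from pathlib import Path


def package_webpage(site_dir: str, zip_path: str) -> list:
    """Zip every file under site_dir beneath a top-level project_webpage/ folder.

    Hypothetical helper: refuses to build the archive if index.html is missing,
    and returns the archive member names so the layout can be checked.
    """
    site = Path(site_dir)
    if not (site / "index.html").is_file():
        raise FileNotFoundError(f"{site / 'index.html'} is missing")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(site.rglob("*")):
            if path.is_file():
                # Store e.g. site/images/fig.png as project_webpage/images/fig.png
                arcname = Path("project_webpage") / path.relative_to(site)
                zf.write(path, arcname.as_posix())
        return zf.namelist()


# Usage (placeholder paths): package_webpage("my_site", "submission.zip")
```

Returning the member list makes it easy to confirm that `project_webpage/index.html` is present before uploading.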

Rubric (60 points)

We are not looking to see if you succeeded or failed at accomplishing what you set out to do. It’s ok if your results are not “good”. What matters is that you put in a reasonable effort, understand the project and how it relates to Deep Learning in detail, and are able to clearly communicate that understanding.

A former DARPA director named George H. Heilmeier came up with a list of questions for evaluating research projects (https://www.darpa.mil/work-with-us/heilmeier-catechism). We’ve adapted that list for our rubric.

Introduction / Background / Motivation:

(5 points) What did you try to do? What problem did you try to solve? Articulate your objectives using absolutely no jargon.
(5 points) How is it done today, and what are the limits of current practice?
(5 points) Who cares? If you are successful, what difference will it make?

Approach:

(10 points) What did you do exactly? How did you solve the problem? Why did you think it would be successful? Is anything new in your approach?
(5 points) What problems did you anticipate? What problems did you encounter? Did the very first thing you tried work?

Experiments and Results:

(10 points) How did you measure success? What experiments were used? What were the results, both quantitative and qualitative? Did you succeed? Did you fail? Why?

In addition, 20 more points will be distributed based on presentation quality and Deep Learning knowledge:

(5 points) Appropriate use of visual aids. Are the ideas presented with appropriate illustration? Is the problem effectively visualized? Is the approach visualized appropriately? Are the results presented clearly; are the important differences illustrated? Every section and idea does not need a visual aid, but the most interesting and complex parts of the project should be illustrated.
(5 points) Overall clarity. Is the presentation clear? Can a peer who has also taken Deep Learning understand all of the points addressed above? Is sufficient detail provided?
(10 points) Finally, points will be distributed based on your understanding of how your project relates to Deep Learning. Here are some questions to think about:
- What was the structure of your problem? How did the structure of your model reflect the structure of your problem?
- What parts of your model had learned parameters (e.g., convolution layers) and what parts did not (e.g., post-processing classifier probabilities into decisions)?
- What representations of input and output did the neural network expect? How was the data pre/post-processed?
- What was the loss function?
- Did the model overfit? How well did the approach generalize?
- What hyperparameters did the model have? How were they chosen? How did they affect performance? What optimizer was used?
- What Deep Learning framework did you use?
- What existing code or models did you start with and what did those starting points provide?
At least some of these questions and others should be relevant to your project and should be addressed in the webpage. You do not need to address all of them in full detail. Some may be irrelevant to your project and others may be standard and thus require only a brief mention. For example, it is sufficient to simply mention that the cross-entropy loss was used without providing a full description of what that is. Generally, provide enough detail that someone with an appropriate background (in both Deep Learning and your domain of choice) could replicate the main parts of your project somewhat accurately, probably missing a few less important details.

Book

Deep Learning, Ian Goodfellow, Aaron Courville, and Yoshua Bengio, MIT Press

Overviews

Note to people outside Georgia Tech

Feel free to use the slides and materials available online here. If you use our slides, an appropriate attribution is requested. Please email the instructor with any corrections or improvements.