Deep Learning for Perception
Georgia Tech, Spring 2016
Note: Dhruv Batra will be teaching this course in Fall 2017. I will be teaching Machine Learning CS 4641.
Course: CS 8803DL
Location: Instructional Center 111 (NOTE recent change!)
Day/Time: MWF 2–3pm
Office Hours: MWF 12:30–2pm, CCB 260
TAs: Daehyung Park, Ashwin Shenoi
Updates
April 20, 2016: Final project requirements are up here
March 10, 2016: Project midterm progress description is up here
February 2, 2016: Teams are out (see Piazza). Note: project proposals are due the week of the 15th (TBD, probably Friday). See here for information about projects.
Jan 25, 2016: Reminder: write-ups are due Monday (01/25) night! See project information and Additional Readings for project ideas.
Jan 7, 2016: First readings are up. NOTE the change in classroom location to the Instructional Center.
Dec. 2015: The course is now full. If you would like a spot but haven’t gotten one, I recommend 1) adding your name to the waitlist once Phase II registration starts, and 2) coming to the first week of class; if students drop, you can then fill their spots.
Nov 1, 2015: The course is now open to M.S. students as well!
October 29, 2015: Note that the CoC has a new system for registration; currently you can only join the waitlist, and students on that list will be given the opportunity to register on a first-come, first-served basis on Nov. 5th. See here and here for more information and instructions for joining the waitlist.
October 28, 2015: The course has been listed, and will be initially restricted to PhD students. Subsequently it will be opened to M.S. students. Please sign up as soon as possible if you are interested, as it fills up fast. For some idea about the previous version of the course, see here. Note that there will be adjustments/changes from the previous version.
Course Description and Goals
This course will cover deep learning and its applications to perception in many modalities, focusing on those relevant to robotics (images, videos, and audio). Deep learning is a subfield of machine learning that deals with learning hierarchical feature representations in a data-driven manner, representing the input data at increasing levels of abstraction. This removes the need to hand-design features when applying machine learning to different modalities or problems. It has recently gained significant traction and media coverage due to its state-of-the-art performance in tasks such as object detection in computer vision (see ILSVRC 2013 and 2014 as examples), terrain estimation for navigation in robotics, natural language processing, and others.
The course will cover the fundamental theory behind these techniques, with topics including sparse coding/filtering, autoencoders, convolutional neural networks, and deep belief nets. We will cover both supervised and unsupervised variants of these algorithms, and motivate them with real-world examples from perception-related tasks, including computer vision (object recognition/classification, activity recognition, etc.), perception for robotics (obstacle avoidance, grasping), and more. We will discuss some of the previous state-of-the-art methods and how they relate to the deep learning algorithms that have recently replaced them. The principles will also be related to neuroscience and other fields, to facilitate a discussion of what these new advancements mean for understanding intelligence more generally, as well as their limitations and open problems. The course will involve a project in which students take relevant research problems from their particular field, apply the techniques and principles learned in the course to develop an approach, and implement it to investigate how these techniques are applicable. Results of the project will be presented at the end of the semester to fellow students, conveying what was learned, what results were achieved, and future open research areas.
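As a small illustration of the idea above (not course material, and using NumPy rather than the Torch framework the course itself uses), here is a minimal two-layer network trained by backpropagation on XOR, a mapping no linear model on the raw inputs can represent; the hidden layer plays the role of a learned feature representation. All sizes and hyperparameters here are arbitrary illustrative choices:

```python
import numpy as np

# XOR: four inputs, non-linearly separable targets.
rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)   # input  -> hidden
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)   # hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(inputs):
    h = sigmoid(inputs @ W1 + b1)        # hidden layer = learned "features"
    return h, sigmoid(h @ W2 + b2)       # output prediction

_, out0 = forward(X)
initial_loss = np.mean((out0 - y) ** 2)

lr = 0.5
for _ in range(10000):
    h, out = forward(X)
    # Backpropagation of the squared-error gradient through both layers.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

_, out = forward(X)
final_loss = np.mean((out - y) ** 2)
print(initial_loss, final_loss)
```

The point of the sketch is only that the hidden representation is fit to the data by gradient descent rather than designed by hand; the same principle scales up to the convolutional and recurrent architectures covered later in the course.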
Learning Outcomes
This course has several learning outcomes. By the end of the course, students will be able to:

Describe the fundamental advancements made in deep learning in the past 5 years and explain why they have led to a small revolution in the field of machine learning and perception in multiple modalities

Describe how these techniques relate to previous methods that were state of the art, for example typical computer vision pipelines

Categorize, compare, and contrast various deep learning algorithms and explain which are better suited for particular types of real-world data or problems than others

Read and understand research papers in the area and be able to identify the key claims and ideas within them

Discuss how these techniques relate to our understanding of intelligence and other fields such as neuroscience that explore these topics as they occur in nature

Design and carry out a project within their area of interest, apply the learned techniques to new types of data within this area, and analyze the performance of the algorithms within it
Prerequisites
This is a graduate class. The course will cover advanced machine learning topics as they relate to perception, so a prior introductory course in machine learning, artificial intelligence, computer vision, or pattern recognition is recommended. Strong math skills, especially in linear algebra, will be essential to understanding many of these techniques. Since deep learning is new (or rather, has only recently become mainstream), there is no book available yet, so the course readings will be tutorials and conference/journal papers, many of them from machine learning conferences such as NIPS/ICML. Being able to read, analyze, and hopefully critique such research papers is therefore crucial!
Course Material
There is no textbook currently published, but we may refer to the in-progress book by Bengio et al. The rest of the course material will be publicly available publications or, when necessary, distributed material for tutorials that are not publicly accessible.
Write-Ups for Reading Material
In some weeks, there will be a quiz on the readings on Friday. Other weeks will require a write-up of the reading materials; a link to a set of reading questions will appear next to that week’s readings.
Software
Many of the tutorials and assignments will use Torch, so you should set up the corresponding coding environment for Lua and go through the tutorials. There are many public tutorials and resources for this framework, and we will also hold a lab. We will also provide a Docker instance, a lightweight virtual-machine-like environment with everything set up; this is what will be used to grade assignments. If you wish to use this resource, I recommend setting up Docker on your machine.
Schedule
Below is the tentative schedule of topics that will be covered. Note that there are optional additional readings if you would like to explore a particular topic in more depth.
Week | Topic | Readings | Slides | Assignments

Week 1 (01/11)
Topics: Introduction and Background: Machine Learning, Features, and Non-linear Mappings; Previous State of the Art: Features, Sparse Coding, Pooling, and Classification; ML Background, Neural Networks
Readings: [1] (Ch. 1–2, 5), [2], [3]
Slides: [03 Unsupervised Feature Learning]
Assignments: Quiz Friday

Week 2 (01/18)
NOTE: No class Monday due to holiday! Friday: SNOW DAY (no class)
Topics: Neural Networks (cont.)
Readings: [1] (Ch. 6, Feedforward Deep Networks), [4]

Week 3 (01/25)
Topics: Monday: Torch Lab; Backpropagation, Theory and Practice (Non-linear Optimization, Stochastic/Mini-Batch Gradient Descent)
Readings: [1] (Ch. 7, Regularization), [7]
Assignments: Write-up due Monday via T-Square (Week 2 reading); Assignment 1 released

Week 4 (02/01)
Topics: Sparse Autoencoders (SAE); Convolutional Neural Networks (CNNs) and Early Applications
Readings: [1] (Ch. 8.1–8.2, Optimization; Ch. 14, Autoencoders), [8]
Assignments: Quiz Monday on Week 3 readings

Week 5 (02/08)
Topics: Architectures and Tasks (Classification, Detection, Segmentation)
Readings: [1] (Ch. 9, Convolutional Networks), [10]; Optional: [11]
Assignments: Assignment 1 due Feb. 9 (updated: Feb. 10); Quiz moved from Friday to Monday

Week 6 (02/15)
Topics: Visualization/Deconvolutional Networks, Pooling and Invariance; Recurrent Neural Networks (RNNs); Friday: Project Proposal Presentations
Readings: [15] [16] [17]
Slides: [12 Visualization]
Assignments: Write-up due Monday via T-Square (Week 6 reading)

Week 7 (02/22)
Topics: Recurrent Neural Networks (RNNs) and Their Optimization
Readings: [1] (Ch. 10.1–10.5, Sequence Modeling)

Week 8 (02/29)
Topics: Echo State Networks and RNN Applications; Hessian-Free Optimization
Readings: [1] (Ch. 10.6–10.15), [20] [21]; Optional: [22]
Assignments: Quiz Monday

Week 9 (03/07)
Topics: Applications (CNN + RNN), Recent 2014 Successes: Descriptor Matching, Stereo-based Obstacle Avoidance for Robotics, Captioning, Encoder/Decoder; Deep Learning for Language: Word/Sentence Vectors, Parsing, Sentiment Analysis, etc.
Readings: [23] [24] [25]
Assignments: No quiz/write-up

Week 10 (03/14)
Topics: Language (cont.), Attention Mechanisms; Other Architectures: Highway Networks, etc.
Readings: [26] [27] [28]

Week 11 (03/21)
SPRING BREAK

Week 12 (03/28)
Mid-Term Project Status

Week 13 (04/04)
Topics: Background: Probabilistic Graphical Models; Hopfield Networks, Boltzmann Machines, Restricted Boltzmann Machines
Readings: [1] (Ch. 16–17)
Slides: [22 Restricted Boltzmann Machines]

Week 14 (04/11)
Topics: Deep Belief Nets, Stacked RBMs; Applications to NLP, Pose, and Activity Recognition in Videos; Recent Advances
Readings: [1] (Ch. 20)

Week 15 (04/18)
Topics: Student-requested topics, such as Reinforcement Learning/Atari, Large-Scale Learning, Neural Turing Machines; Project Poster Presentations
Readings: [40–42]
Assignments: Poster presentations due (04/22, noon); Assignment 4 due (04/24)

Week 16 (04/25)
During Reading Week. Monday: Project winner presentations, wrap-up
Assignments: Final Project Report due (05/02)

References

[1] “Deep Learning”, Yoshua Bengio, Ian J. Goodfellow, and Aaron Courville. Book in preparation for MIT Press.
[2] Emergence of simple-cell receptive field properties by learning a sparse code for natural images. B.A. Olshausen and D.J. Field. Nature, 1996.
[3] Efficient sparse coding algorithms. H. Lee, A. Battle, R. Raina, and A.Y. Ng. NIPS, 2007.
[4] Linear Classification, from Stanford’s CS231n Convolutional Neural Networks for Visual Recognition course.
[5] Optional: What size neural network gives optimal generalization? Convergence properties of backpropagation. Lawrence, S., Giles, C.L., Tsoi, A.C., 1998.
[6] Optional: Efficient BackProp. Y. LeCun, L. Bottou, G. Orr, and K. Müller. In Orr, G. and Müller, K. (Eds.), Neural Networks: Tricks of the Trade, Springer, 1998.
[7] Stochastic Gradient Tricks. Léon Bottou. In Neural Networks: Tricks of the Trade, Reloaded, 430–445, edited by Grégoire Montavon, Genevieve B. Orr, and Klaus-Robert Müller, Lecture Notes in Computer Science (LNCS 7700), Springer, 2012.
[8] Reducing the dimensionality of data with neural networks. G. Hinton and R. Salakhutdinov. Science, 2006.
[9] Optional: Learning Mid-Level Features for Recognition. Y. Boureau, F. Bach, Y. LeCun, and J. Ponce. CVPR, 2010.
[10] ImageNet Classification with Deep Convolutional Neural Networks. A. Krizhevsky, I. Sutskever, and G. Hinton. NIPS, 2012.
[11] Optional: Convolutional Neural Networks, from Stanford’s CS231n Convolutional Neural Networks for Visual Recognition course.
[12] Optional: A theoretical analysis of feature pooling in visual recognition. Boureau, Y-Lan, Jean Ponce, and Yann LeCun. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010.
[13] Optional: Measuring invariances in deep networks. I.J. Goodfellow, Q.V. Le, A.M. Saxe, H. Lee, and A.Y. Ng. NIPS, 2009.
[14] Optional: Online learning for off-road robots: Using spatial label propagation to learn long-range traversability. Hadsell, R., Sermanet, P., Ben, J., Erkan, A., Han, J., Flepp, B., Muller, U., LeCun, Y. In Proc. of Robotics: Science and Systems (RSS), p. 32, 2007.
[15] Deep inside convolutional networks: Visualising image classification models and saliency maps. Simonyan, K., Vedaldi, A., Zisserman, A., 2013. arXiv preprint arXiv:1312.6034.
[16] Visualizing and Understanding Convolutional Neural Networks. Zeiler, M.D., Fergus, R., 2013. arXiv preprint arXiv:1311.2901.
[17] Striving for Simplicity: The All Convolutional Net. Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M., 2014. arXiv:1412.6806 [cs].
[18] Optional: Adaptive deconvolutional networks for mid and high level feature learning. Zeiler, M.D., Taylor, G.W., Fergus, R. In Computer Vision (ICCV), 2011 IEEE International Conference on, pp. 2018–2025, 2011.
[19] Optional: Adaptive deconvolutional networks for mid and high level feature learning
[20] Deep Learning via Hessian-Free Optimization. Martens, James. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 735–742, 2010.
[21] Generating Text with Recurrent Neural Networks. Sutskever, Ilya, James Martens, and Geoffrey E. Hinton. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), 1017–1024, 2011.
[22] Optional: On the difficulty of training recurrent neural networks. R. Pascanu, T. Mikolov, and Y. Bengio. arXiv preprint arXiv:1211.5063, 2012.
[23] CNN Features off-the-shelf: an Astounding Baseline for Recognition. A.S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson. arXiv:1403.6382 [cs], Mar. 2014.
[24] Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks. Graves, Alex, and Jürgen Schmidhuber. In Advances in Neural Information Processing Systems, 545–552, 2009.
[25] Deep visual-semantic alignments for generating image descriptions. A. Karpathy and L. Fei-Fei. arXiv preprint arXiv:1412.2306, 2014.
[26] Distributed representations of words and phrases and their compositionality. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J. In Advances in Neural Information Processing Systems, pp. 3111–3119, 2013.
[27] Skip-Thought Vectors. Kiros, R., Zhu, Y., Salakhutdinov, R.R., Zemel, R., Urtasun, R., Torralba, A., Fidler, S. In Advances in Neural Information Processing Systems 28, Curran Associates, Inc., pp. 3294–3302, 2015.
[28] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., Zemel, R., Bengio, Y., 2015. arXiv:1502.03044 [cs].
[29] Optional: A neural probabilistic language model. Bengio, Y., Ducharme, R., Vincent, P., Janvin, C. The Journal of Machine Learning Research 3, 1137–1155, 2003.
[30] Optional: Efficient Estimation of Word Representations in Vector Space. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. In Proceedings of Workshop at ICLR, 2013.
[31] Marginalized denoising autoencoders for domain adaptation. Chen, M., Xu, Z., Weinberger, K., Sha, F., 2012. arXiv preprint arXiv:1206.4683.
[32] Tutorial on RBMs
[33] Learning deep architectures for AI. Bengio, Y. Foundations and Trends in Machine Learning, pp. 1–127, 2009.
[34] An introduction to restricted Boltzmann machines. Fischer, A., Igel, C. In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, Springer, pp. 14–36, 2012.
[35] Bengio IFT 6266 Lecture Notes
[36] Greedy Layer-Wise Training of Deep Networks. Y. Bengio, P. Lamblin, P. Popovici, and H. Larochelle. NIPS, 2006.
[37] Multimodal learning with deep Boltzmann machines. Srivastava, N., Salakhutdinov, R.R. In Advances in Neural Information Processing Systems, pp. 2222–2230, 2012.
[38] Graphical Models in a Nutshell. D. Koller, N. Friedman, L. Getoor, and B. Taskar. In L. Getoor and B. Taskar (Eds.), Introduction to Statistical Relational Learning, 2007.
[39] Building high-level features using large scale unsupervised learning. Le, Q., Ranzato, M., Monga, R., Devin, M., Corrado, G., Chen, K., Dean, J., Ng, A. Presented at the 29th International Conference on Machine Learning (ICML 2012), pp. 81–88, 2012.
[40] Neural Turing Machines. Graves, A., Wayne, G., Danihelka, I., 2014. arXiv:1410.5401 [cs].
[41] Playing Atari with Deep Reinforcement Learning. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller, M., 2013. arXiv:1312.5602 [cs].
[42] Mastering the game of Go with deep neural networks and tree search. Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., Hassabis, D. Nature 529, 484–489, 2016. doi:10.1038/nature16961
[43] Human-level control through deep reinforcement learning. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis, D. Nature 518, 529–533, 2015. doi:10.1038/nature14236
Collaboration/Honor Policy
Students are expected to adhere to the Honor Code in this class. All work is to be accomplished independently unless expressly stated otherwise in writing (e.g., as in a team project). Do not plagiarize from any sources, including the internet. Collaboration on other homework and/or take-home exams is not permitted.
Grading
There are three major components to this course. The first is participation and readings: you will be expected to read all of the assigned material for class, and this will be assessed by EITHER a quiz or a short write-up analyzing the material. There will be only one of these per week, and we will alternate between quizzes and write-ups. Quizzes will be short (around 15 minutes), to make sure you have done the readings. The second component is assignments, of which there will be about 4–5 throughout the semester. These will typically involve some combination of written problems, coding, and a small write-up. There will be no exams, since you will be evaluated throughout on these first two components.
The major component of the course will be your semester-long project, in which you will take a problem of your choosing, design an approach or set of experiments, implement it, and analyze the results. Projects can involve teams. You will be able to propose your own ideas around the fifth week of the course, and I will approve or refine them with you. I will also provide a list of potential topics if you have trouble coming up with one. The project proposals will be presented to the class; there will be a mid-term review of progress to make sure the project is moving along, and a final presentation describing the approach and results.

Participation (5%)

Quizzes and Responses to Reading Material (25%)

Assignments (30%) (about 4–5)

Final Project (40%)

Proposals (5%), Mid-Term Progress (5%), Final Project (30%)
Office Hours
Office hours are TBD but will likely be held in the hours before class. Meetings are also possible by appointment; feel free to email me any time you feel it necessary to meet!
Additional Resources