Deep Learning for Perception
Georgia Tech, Spring 2015

 

Note: For the Spring 2016 semester, see here. Scroll down for the 2015 version of the course.

Course: CS 8803DL
Instructor: Dr. Zsolt Kira
Location: Klaus 2456
Day/Time: MWF 2-3pm
TA: Daehyung Park

Updates

April 13, 2015: Project description and grading rubric up.
April 10, 2015: Instructions for installing Theano without root (useful for the cluster). Thanks, Payam!
March 13, 2015: Project information up for mid-term project presentations/write-ups (see What is Required => Midterm Progress section)
March 10, 2015: Questions for the write-up for this week's readings (Week 10) are up: Reading Questions. These are due by 2pm on Friday (Mar 13, 2015), to be submitted via T-Square.
March 2, 2015: CUDA installation instructions (for use with Caffe), courtesy of Chris Chow
Feb 27, 2015: List of the most commonly used deep learning software packages
Jan 30, 2015: Project proposal presentations/write-ups moved to February 9th
Jan 22, 2015: Instructions for installing Caffe, and a simple modified example Python script (tested on the latest Ubuntu 14)
Jan 19, 2015: Project information up.

Course Description and Goals

This course will cover deep learning and its applications to perception in many modalities, focusing on those relevant for robotics (images, videos, and audio). Deep learning is a sub-field of machine learning that deals with learning hierarchical feature representations in a data-driven manner, representing the input data at increasing levels of abstraction. This removes the need to hand-design features when applying machine learning to different modalities or problems. It has recently gained significant traction and media coverage due to its state-of-the-art performance in tasks such as object detection in computer vision (see ILSVRC 2013 and 2014 as examples), terrain estimation for navigation in robotics, natural language processing, and others.

The course will cover the fundamental theory behind these techniques, with topics ranging from sparse coding/filtering and autoencoders to convolutional neural networks and deep belief nets. We will cover both supervised and unsupervised variants of these algorithms, and motivate them with real-world examples from perception-related tasks, including computer vision (object recognition/classification, activity recognition, etc.), perception for robotics (obstacle avoidance, grasping), and more. We will discuss some of the previous state-of-the-art methods and how they relate to the deep learning algorithms that have recently replaced them. The principles will also be related to neuroscience and other fields to facilitate a discussion of what these new advancements mean for understanding intelligence more generally, as well as their limitations and open problems. The course will involve a project in which students take relevant research problems from their particular field, apply the techniques and principles learned in the course to develop an approach, and implement it to investigate how these techniques are applicable. Results of the project will be presented at the end of the semester to fellow students to convey what was learned, what results were achieved, and which research areas remain open.

Learning Outcomes

This course has several learning outcomes. By the end of the course, students will be able to:

Prerequisites

This is a graduate class. The course will cover advanced machine learning topics as related to perception, so a prior introductory course in machine learning, artificial intelligence, computer vision, or pattern recognition is recommended. Strong math skills, especially in linear algebra, will be essential for understanding many of these techniques. Since deep learning is new (or rather, has only recently become mainstream), there is no book available yet, so the course readings will be tutorials and conference/journal papers, many of which are from machine learning conferences such as NIPS/ICML. Being able to read, analyze, and hopefully critique such research papers is therefore crucial!

Course Material

There is no published textbook for this material yet, but we may refer to the in-progress book by Bengio et al. The rest of the course material will be publicly available publications or, when necessary, distributed material for tutorials that are not publicly accessible.

Write-Ups for Reading Material

In some weeks, there will be a quiz on the readings on Friday. Other weeks will require a write-up of the reading materials; in those weeks, a link to a set of reading questions will be posted alongside that week's readings.

Software

Many of the tutorials and assignments will require MATLAB, so you must have access to it for this course (in-class demonstrations will use MATLAB as well). Once we move beyond the basics, some more advanced usage and projects will require Caffe, which is implemented in C++ with optional Python wrappers (Linux systems only).
Here are instructions for installing Caffe, and a simple modified example Python script (tested on the latest Ubuntu 14).
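For a rough sense of what the Python wrapper looks like in practice, below is a minimal sketch (not the linked example script) of loading a trained model and running a forward pass. The file names, input size, and the 'data'/'prob' blob names are placeholder assumptions based on the reference CaffeNet; substitute whatever model you are actually using.

    import numpy as np
    import caffe

    # Run on the CPU; switch to caffe.set_mode_gpu() once CUDA is installed.
    caffe.set_mode_cpu()

    # Placeholder model definition and weights -- substitute your own files.
    net = caffe.Net('deploy.prototxt', 'bvlc_reference_caffenet.caffemodel', caffe.TEST)

    # Push one dummy image (batch of 1, 3 channels, 227x227) through the net.
    image = np.random.rand(1, 3, 227, 227).astype(np.float32)
    net.blobs['data'].reshape(*image.shape)
    net.blobs['data'].data[...] = image
    out = net.forward()

    # 'prob' is the output blob name in the reference CaffeNet deploy file.
    print('Predicted class index:', out['prob'].argmax())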

Schedule

Below is the tentative schedule of topics that will be covered. Note that there are optional additional readings if you would like to explore a particular topic in more depth.
Mind Map for this course
Week 1 (01/05)
Topics: Introduction and Background: Machine Learning, Features, and Non-linear Mappings; Previous State of the Art: Features, Coding, Pooling, and Classification; Beginnings of Feature Learning: Sparse Coding and Filtering, LLC
Readings: [1] (Ch. 1-2), [2], [3]
Slides: [1 Admin]

Week 2 (01/12)
Topics: Neural Networks 1.0 & 2.0; Supervised Training of Networks, Theory and Practice (Non-Linear Optimization, Stochastic/Mini-Batch Gradient Descent)
Readings: [1] (Ch. 5, Sections 6.1, 6.2), [4]
Slides: [5 ML Background]

Week 3 (01/19)
Topics: Sparse Autoencoders (SAE), Convolutional Neural Networks (CNNs)
Readings: [1] (Ch. 6), [5], [6], [7]; Reading Questions
Slides: [8 Autoencoders]

Week 4 (01/26)
Topics: CNNs continued and early applications
Readings: [8], [9], [10]
NOTE: No quiz/write-up, but prepare for an in-class discussion of [10] on Friday
Assignments: Assignment 1 released

Week 5 (02/02)
Topics: Recent 2014 Successes: Descriptor Matching, Stereo-based Obstacle Avoidance for Robotics; Pooling and Invariance
Note: On Monday we will take a small part of class for team introductions; on Wednesday the class will be dedicated to teams discussing project direction
Readings: [11]; Optional: [12], [13]

Week 6 (02/09)
Topics: Project Proposal Presentations; Visualization/Deconvolutional Networks
Readings: [14]; Optional: [15], [16]
Slides: [14 Visualization]
Assignments: Assignment due Feb 8th!

Week 7 (02/16)
Topics: Recurrent Neural Networks (RNNs) and their optimization; Applications to NLP
Readings: [1] (Sections 12.1-12.6)
Slides: [15 RNNs] (updated)

Week 8 (02/23)
Topics: RNNs continued; Hessian-Free Optimization
Readings: [17], [18]; Optional: [19]
Slides: [16 HF Optimization]

Week 9 (03/02)
Topics: Deep learning for language: word/sentence vectors, parsing, sentiment analysis, etc.
Readings: [20], [21]; Optional: [22]
Assignments: Assignment 2 due March 1st, 2015 at 11:55 PM

Week 10 (03/09)
Topics: Guest lecture on language and domain adaptation (Yi Yang from Jacob Eisenstein's group); Background: Probabilistic Graphical Models; Hopfield Nets, Boltzmann Machines, Restricted Boltzmann Machines
Readings: [23]
Slides: [20 Yi Yang Domain Adaptation]

Week 11 (03/16)
SPRING BREAK

Week 12 (03/23)
Mid-Term Project Presentations

Week 13 (03/30)
Topics: Hopfield Networks, (Restricted) Boltzmann Machines
Readings: [1] (Ch. 3, for background on probability theory), [24], [25] (Ch. 6); Optional: [26], [27]
Slides: [21 Restricted Boltzmann Machines]

Week 14 (04/06)
Topics: Deep Belief Nets, Stacked RBMs; Applications to NLP, Pose and Activity Recognition in Videos; Recent Advances
Readings: [28], [29]; Optional background: [30]
Slides: [24 Speech]

Week 15 (04/13)
Topics: Large-Scale Learning; Neural Turing Machines
Readings: [31], [32]

Week 16 (04/20)
Monday: Poster presentations. Wednesday: Highly-rated teams present. Friday: Wrap-up

References

  1. “Deep Learning”, Yoshua Bengio and Ian J. Goodfellow and Aaron Courville, Book in preparation for MIT Press.
  2. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. B.A. Olshausen, and D.J. Field. Nature, 1996.
  3. Efficient sparse coding algorithms. H. Lee, A. Battle, R. Raina, and A. Y. Ng. NIPS, 2007.
  4. Linear Classification, from Stanford’s CS231n Convolutional Neural Networks for Visual Recognition Course.
  5. What size neural network gives optimal generalization? Convergence properties of backpropagation. Lawrence, S., Giles, C.L., Tsoi, A.C., 1998.
  6. Efficient BackProp, Y. LeCun, L. Bottou, G. Orr and K. Muller. In Orr, G. and Muller K. (Eds), Neural Networks: Tricks of the trade, Springer, 1998.
  7. Stochastic Gradient Tricks, Léon Bottou: Neural Networks, Tricks of the Trade, Reloaded, 430–445, Edited by Grégoire Montavon, Genevieve B. Orr and Klaus-Robert Müller, Lecture Notes in Computer Science (LNCS 7700), Springer, 2012.
  8. Reducing the dimensionality of data with neural networks. G. Hinton and R. Salakhutdinov. Science 2006.
  9. Learning Mid-Level Features for Recognition. Y. Boureau, F. Bach, Y. LeCun and J. Ponce. CVPR, 2010.
  10. ImageNet Classification with Deep Convolutional Neural Networks. A. Krizhevsky, I. Sutskever, and G. Hinton. NIPS, 2012.
  11. A theoretical analysis of feature pooling in visual recognition. Boureau, Y-Lan, Jean Ponce, and Yann LeCun. Proceedings of the 27th International Conference on Machine Learning (ICML-10). 2010.
  12. Optional: Measuring invariances in deep networks. I.J. Goodfellow, Q.V. Le, A.M. Saxe, H. Lee and A.Y. Ng. NIPS 2009.
  13. Optional: Online learning for offroad robots: Using spatial label propagation to learn long-range traversability, Hadsell, R., Sermanet, P., Ben, J., Erkan, A., Han, J., Flepp, B., Muller, U., LeCun, Y., 2007. in: Proc. of Robotics: Science and Systems (RSS). p. 32.
  14. Deep inside convolutional networks: Visualising image classification models and saliency maps, Simonyan, K., Vedaldi, A., Zisserman, A., 2013. arXiv preprint arXiv:1312.6034.
  15. Optional: Visualizing and Understanding Convolutional Neural Networks. Zeiler, M.D., Fergus, R., 2013. arXiv preprint arXiv:1311.2901.
  16. Optional: Adaptive deconvolutional networks for mid and high level feature learning, Zeiler, M.D., Taylor, G.W., Fergus, R., 2011. in: Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, pp. 2018–2025.
  17. Deep Learning via Hessian-Free Optimization. Martens, James. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 735–42, 2010.
  18. Generating Text with Recurrent Neural Networks. Sutskever, Ilya, James Martens, and Geoffrey E. Hinton. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), 1017–24, 2011.
  19. Optional: Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks, Graves, Alex, and Jürgen Schmidhuber. In Advances in Neural Information Processing Systems, 545–52, 2009.
  20. A neural probabilistic language model. Bengio, Y., Ducharme, R., Vincent, P., Janvin, C., 2003. The Journal of Machine Learning Research 3, 1137–1155.
  21. Distributed representations of words and phrases and their compositionality, Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J., 2013. in: Advances in Neural Information Processing Systems. pp. 3111–3119.
  22. Optional: Efficient Estimation of Word Representations in Vector Space. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. In Proceedings of Workshop at ICLR, 2013.
  23. Marginalized denoising autoencoders for domain adaptation. Chen, M., Xu, Z., Weinberger, K., Sha, F., 2012. arXiv preprint arXiv:1206.4683.
  24. Tutorial on RBMs
  25. Learning deep architectures for AI. Foundations and Trends in Machine Learning, Bengio, Y. , pp. 1–127, 2009.
  26. An introduction to restricted Boltzmann machines, Fischer, A., Igel, C., 2012. in: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. Springer, pp. 14–36.
  27. Bengio IFT 6266 Lecture Notes
  28. Greedy Layer-Wise Training of Deep Networks. Y. Bengio, P. Lamblin, P. Popovici, and H. Larochelle. NIPS 2006.
  29. Multimodal learning with deep boltzmann machines, Srivastava, N., Salakhutdinov, R.R., in: Advances in Neural Information Processing Systems. pp. 2222–2230, 2012.
  30. Graphical Models in a Nutshell, D. Koller, N. Friedman, L. Getoor, and B. Taskar (2007), in L. Getoor and B. Taskar, editors, Introduction to Statistical Relational Learning.
  31. Building high-level features using large scale unsupervised learning. Le, Q., Ranzato, M., Monga, R., Devin, M., Corrado, G., Chen, K., Dean, J., Ng, A., 2012. Presented at the 29th International Conference on Machine Learning (ICML 2012), pp. 81–88.
  32. Neural Turing Machines, Graves, A., Wayne, G., Danihelka, I., 2014. arXiv:1410.5401 [cs].

Collaboration/Honor Policy

Students are expected to adhere to the Honor Code in this class. All work is to be completed independently unless expressly stated otherwise in writing (e.g., for a team project). Do not plagiarize from any source, including the internet. Collaboration on other homework and/or take-home exams is not permitted.

Grading

There are three major components to this course. The first is participation and readings: you will be expected to read all of the assigned material for class, and this will be assessed by EITHER a quiz or a short write-up analyzing the material. There will be only one of these per week, and we will alternate between quizzes and write-ups. Quizzes will be short (around 15 minutes) and are meant to make sure you have done the readings. The second component is assignments, of which there will be about 4-5 throughout the semester. These will typically involve some combination of written problems, coding, and a small write-up. There will be no exams, since you will be evaluated throughout the semester based on these first two components.
The major component of the course will be your semester-long project, in which you will take a problem of your choosing, design an approach or set of experiments, implement it, and analyze the results. Projects can be done in teams. You will propose your own ideas around the fifth week of the course, and I will approve or refine them with you. I will also provide a list of potential topics if you have trouble coming up with one. The project proposals will be presented to the class, there will be a mid-term review to make sure progress is being made, and a final presentation will describe the approach and results.

Office Hours

Office hours are by appointment; feel free to email me any time you feel a meeting would be helpful!

Additional Resources