Deep Learning for Perception
Georgia Tech, Spring 2015

 

Note: For the Spring 2016 semester, see here. Scroll down for the 2015 version of the course.

Course: CS 8803DL
Instructor: Dr. Zsolt Kira
Location: Klaus 2456
Day/Time: MWF 2-3pm
TA: Daehyung Park

Updates

April 13, 2015: Project description and grading rubric up.
April 10, 2015: Instructions for installing Theano without root (useful for the cluster). Thanks, Payam!
March 13, 2015: Project information up for mid-term project presentations/write-ups (see What is Required => Midterm Progress section)
March 10, 2015: Questions for the write-up for this week's readings (Week 10) are up: Reading Questions. These are due by 2pm on Friday (Mar 13, 2015), to be submitted via T-Square.
March 2, 2015: CUDA installation instructions (for use with Caffe), courtesy of Chris Chow
Feb 27, 2015: List of the most commonly used deep learning software packages
Jan 30, 2015: Project proposal presentations/write-ups moved to February 9th
Jan 22, 2015: Instructions for installing Caffe, and a simple modified example Python script (tested on the latest Ubuntu 14)
Jan 19, 2015: Project information up.

Course Description and Goals

This course will cover deep learning and its applications to perception in many modalities, focusing on those relevant for robotics (images, videos, and audio). Deep learning is a sub-field of machine learning that deals with learning hierarchical feature representations in a data-driven manner, representing the input data at increasing levels of abstraction. This removes the need to hand-design features when applying machine learning to different modalities or problems. It has recently gained significant traction and media coverage due to its state-of-the-art performance in tasks such as object detection in computer vision (see ILSVRC 2013 and 2014 as examples), terrain estimation for navigation in robotics, natural language processing, and others.

The course will cover the fundamental theory behind these techniques, with topics ranging from sparse coding/filtering and autoencoders to convolutional neural networks and deep belief nets. We will cover both supervised and unsupervised variants of these algorithms, and motivate them with real-world examples from perception-related tasks, including computer vision (object recognition/classification, activity recognition, etc.), perception for robotics (obstacle avoidance, grasping), and more. We will discuss some of the previous state-of-the-art methods and how they relate to the deep learning algorithms that have recently replaced them. The principles will also be related to neuroscience and other fields to facilitate a discussion of what these new advancements mean for understanding intelligence more generally, as well as their limitations and open problems. The course will involve a project in which students take relevant research problems from their particular field, apply the techniques and principles learned in the course to develop an approach, and implement it to investigate how these techniques are applicable. Results of the project will be presented at the end of the semester to fellow students to convey what was learned, what results were achieved, and which research areas remain open.

Learning Outcomes

This course has several learning outcomes. By the end of the course, students will be able to:

Prerequisites

This is a graduate class. The course will cover advanced machine learning topics as related to perception, so a prior introductory course in machine learning, artificial intelligence, computer vision, or pattern recognition is recommended. Strong math skills, especially in linear algebra, will be essential for understanding many of these techniques. Since deep learning is new (or rather, has only recently become mainstream), there is no book available yet, so the course readings will be tutorials and conference/journal papers, many of which are from machine learning conferences such as NIPS/ICML. Being able to read, analyze, and hopefully critique such research papers is therefore crucial!

Course Material

There is no published textbook for this material yet, but we may refer to the in-progress book by Bengio et al. The rest of the course material will be publicly available publications or, when necessary, distributed material for tutorials that are not publicly accessible.

Write-Ups for Reading Material

In some weeks, there will be a quiz on the readings on Friday. Other weeks will require a write-up of the reading materials; in those weeks, a link to a set of reading questions will be posted alongside that week's readings.

Software

Many of the tutorials and assignments will require MATLAB, so you must have access to it for this course (in-class demonstrations will use MATLAB as well). Once we move beyond the basics, some more advanced usage and projects will require Caffe, which is implemented in C++ with optional Python wrappers (Linux systems only).
Here are instructions for installing Caffe, and a simple modified example Python script (tested on the latest Ubuntu 14).
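For a rough sense of what the Python wrapper looks like in practice, below is a minimal sketch (not the linked example script) of loading a trained model and running a forward pass. The file names, input size, and the 'data'/'prob' blob names are placeholder assumptions based on the reference CaffeNet; substitute whatever model you are actually using.

    import numpy as np
    import caffe

    # Run on the CPU; switch to caffe.set_mode_gpu() once CUDA is installed.
    caffe.set_mode_cpu()

    # Placeholder model definition and weights -- substitute your own files.
    net = caffe.Net('deploy.prototxt', 'bvlc_reference_caffenet.caffemodel', caffe.TEST)

    # Push one dummy image (batch of 1, 3 channels, 227x227) through the net.
    image = np.random.rand(1, 3, 227, 227).astype(np.float32)
    net.blobs['data'].reshape(*image.shape)
    net.blobs['data'].data[...] = image
    out = net.forward()

    # 'prob' is the output blob name in the reference CaffeNet deploy file.
    print('Predicted class index:', out['prob'].argmax())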

Schedule

Below is the tentative schedule of topics that will be covered. Note that there are optional additional readings if you would like to explore a particular topic in more depth.
Mind Map for this course
Week 1 (01/05)
Topics: Introduction and Background: Machine Learning, Features, and Non-linear Mappings; Previous State of the Art: Features, Coding, Pooling, and Classification; Beginnings of Feature Learning: Sparse Coding and Filtering, LLC
Readings: [1] (Ch. 1-2), [2], [3]
Slides: [1 Admin]

Week 2 (01/12)
Topics: Neural Networks 1.0 & 2.0; Supervised Training of Networks, Theory and Practice (Non-Linear Optimization, Stochastic/Mini-Batch Gradient Descent)
Readings: [1] (Ch. 5, Sections 6.1, 6.2), [4]
Slides: [5 ML Background]

Week 3 (01/19)
Topics: Sparse Autoencoders (SAE), Convolutional Neural Networks (CNNs)
Readings: [1] (Ch. 6), [5], [6], [7]; Reading Questions
Slides: [8 Autoencoders]

Week 4 (01/26)
Topics: CNNs continued and early applications
Readings: [8], [9], [10]
NOTE: No quiz/write-up, but prepare for an in-class discussion of [10] on Friday
Assignments: Assignment 1 released

Week 5 (02/02)
Topics: Recent 2014 Successes: Descriptor Matching, Stereo-based Obstacle Avoidance for Robotics; Pooling and Invariance
Note: On Monday we will take a small part of class for team introductions; on Wednesday the class will be dedicated to teams discussing project direction
Readings: [11]; Optional: [12], [13]

Week 6 (02/09)
Topics: Project Proposal Presentations; Visualization/Deconvolutional Networks
Readings: [14]; Optional: [15], [16]
Slides: [14 Visualization]
Assignments: Assignment due Feb 8th!

Week 7 (02/16)
Topics: Recurrent Neural Networks (RNNs) and their optimization; Applications to NLP
Readings: [1] (Sections 12.1-12.6)
Slides: [15 RNNs] (updated)

Week 8 (02/23)
Topics: RNNs continued; Hessian-Free Optimization
Readings: [17], [18]; Optional: [19]
Slides: [16 HF Optimization]

Week 9 (03/02)
Topics: Deep learning for language: word/sentence vectors, parsing, sentiment analysis, etc.
Readings: [20], [21]; Optional: [22]
Assignments: Assignment 2 due March 1st, 2015 at 11:55 PM

Week 10 (03/09)
Topics: Guest lecture on language and domain adaptation (Yi Yang from Jacob Eisenstein's group); Background: Probabilistic Graphical Models; Hopfield Nets, Boltzmann Machines, Restricted Boltzmann Machines
Readings: [23]
Slides: [20 Yi Yang Domain Adaptation]

Week 11 (03/16)
SPRING BREAK

Week 12 (03/23)
Mid-Term Project Presentations

Week 13 (03/30)
Topics: Hopfield Networks, (Restricted) Boltzmann Machines
Readings: [1] (Ch. 3, for background on probability theory), [24], [25] (Ch. 6); Optional: [26], [27]
Slides: [21 Restricted Boltzmann Machines]

Week 14 (04/06)
Topics: Deep Belief Nets, Stacked RBMs; Applications to NLP, Pose and Activity Recognition in Videos; Recent Advances
Readings: [28], [29]; Optional background: [30]
Slides: [24 Speech]

Week 15 (04/13)
Topics: Large-Scale Learning; Neural Turing Machines
Readings: [31], [32]

Week 16 (04/20)
Monday: Poster presentations. Wednesday: Highly-rated teams present. Friday: Wrap-up

References

  1. “Deep Learning”, Yoshua Bengio and Ian J. Goodfellow and Aaron Courville, Book in preparation for MIT Press.
  2. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. B.A. Olshausen, and D.J. Field. Nature, 1996.
  3. Efficient sparse coding algorithms. H. Lee, A. Battle, R. Raina, and A. Y. Ng. NIPS, 2007.
  4. Linear Classification, from Stanford’s CS231n Convolutional Neural Networks for Visual Recognition Course.
  5. What size neural network gives optimal generalization? Convergence properties of backpropagation. Lawrence, S., Giles, C.L., Tsoi, A.C., 1998.
  6. Efficient BackProp, Y. LeCun, L. Bottou, G. Orr and K. Muller. In Orr, G. and Muller K. (Eds), Neural Networks: Tricks of the trade, Springer, 1998.
  7. Stochastic Gradient Tricks, Léon Bottou: Neural Networks, Tricks of the Trade, Reloaded, 430–445, Edited by Grégoire Montavon, Genevieve B. Orr and Klaus-Robert Müller, Lecture Notes in Computer Science (LNCS 7700), Springer, 2012.
  8. Reducing the dimensionality of data with neural networks. G. Hinton and R. Salakhutdinov. Science 2006.
  9. Learning Mid-Level Features for Recognition. Y. Boureau, F. Bach, Y. LeCun and J. Ponce. CVPR, 2010.
  10. ImageNet Classification with Deep Convolutional Neural Networks. A. Krizhevsky, I. Sutskever, and G. Hinton. NIPS, 2012.
  11. A theoretical analysis of feature pooling in visual recognition. Boureau, Y-Lan, Jean Ponce, and Yann LeCun. Proceedings of the 27th International Conference on Machine Learning (ICML-10). 2010.
  12. Optional: Measuring invariances in deep networks. I.J. Goodfellow, Q.V. Le, A.M. Saxe, H. Lee and A.Y. Ng. NIPS 2009.
  13. Optional: Online learning for offroad robots: Using spatial label propagation to learn long-range traversability, Hadsell, R., Sermanet, P., Ben, J., Erkan, A., Han, J., Flepp, B., Muller, U., LeCun, Y., 2007. in: Proc. of Robotics: Science and Systems (RSS). p. 32.
  14. Deep inside convolutional networks: Visualising image classification models and saliency maps, Simonyan, K., Vedaldi, A., Zisserman, A., 2013. arXiv preprint arXiv:1312.6034.
  15. Optional: Visualizing and Understanding Convolutional Neural Networks. Zeiler, M.D., Fergus, R., 2013. arXiv preprint arXiv:1311.2901.
  16. Optional: Adaptive deconvolutional networks for mid and high level feature learning, Zeiler, M.D., Taylor, G.W., Fergus, R., 2011. in: Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, pp. 2018–2025.
  17. Deep Learning via Hessian-Free Optimization. Martens, James. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 735–42, 2010.
  18. Generating Text with Recurrent Neural Networks. Sutskever, Ilya, James Martens, and Geoffrey E. Hinton. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), 1017–24, 2011.
  19. Optional: Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks, Graves, Alex, and Jürgen Schmidhuber. In Advances in Neural Information Processing Systems, 545–52, 2009.
  20. A neural probabilistic language model. Bengio, Y., Ducharme, R., Vincent, P., Janvin, C., 2003. The Journal of Machine Learning Research 3, 1137–1155.
  21. Distributed representations of words and phrases and their compositionality, Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J., 2013. in: Advances in Neural Information Processing Systems. pp. 3111–3119.
  22. Optional: Efficient Estimation of Word Representations in Vector Space. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. In Proceedings of Workshop at ICLR, 2013.
  23. Marginalized denoising autoencoders for domain adaptation. Chen, M., Xu, Z., Weinberger, K., Sha, F., 2012. arXiv preprint arXiv:1206.4683.
  24. Tutorial on RBMs
  25. Learning deep architectures for AI. Foundations and Trends in Machine Learning, Bengio, Y. , pp. 1–127, 2009.
  26. An introduction to restricted Boltzmann machines, Fischer, A., Igel, C., 2012. in: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. Springer, pp. 14–36.
  27. Bengio IFT 6266 Lecture Notes
  28. Greedy Layer-Wise Training of Deep Networks. Y. Bengio, P. Lamblin, P. Popovici, and H. Larochelle. NIPS 2006.
  29. Multimodal learning with deep boltzmann machines, Srivastava, N., Salakhutdinov, R.R., in: Advances in Neural Information Processing Systems. pp. 2222–2230, 2012.
  30. Graphical Models in a Nutshell, D. Koller, N. Friedman, L. Getoor, and B. Taskar (2007), in L. Getoor and B. Taskar, editors, Introduction to Statistical Relational Learning.
  31. Building high-level features using large scale unsupervised learning. Le, Q., Ranzato, M., Monga, R., Devin, M., Corrado, G., Chen, K., Dean, J., Ng, A., 2012. Presented at the 29th International Conference on Machine Learning (ICML 2012), pp. 81–88.
  32. Neural Turing Machines, Graves, A., Wayne, G., Danihelka, I., 2014. arXiv:1410.5401 [cs].

Collaboration/Honor Policy

Students are expected to adhere to the Honor Code in this class. All work is to be completed independently unless expressly stated otherwise in writing (e.g., for a team project). Do not plagiarize from any source, including the internet. Collaboration on other homework and/or take-home exams is not permitted.

Grading

There are three major components to this course. The first is participation and readings: you will be expected to read all of the assigned material for class, and this will be assessed by EITHER a quiz or a short write-up analyzing the material. There will be only one of these per week, and we will alternate between quizzes and write-ups. Quizzes will be short (around 15 minutes) and are meant to make sure you have done the readings. The second component is assignments, of which there will be about 4-5 throughout the semester. These will typically involve some combination of written problems, coding, and a small write-up. There will be no exams, since you will be evaluated throughout the semester based on these first two components.
The major component of the course will be your semester-long project, in which you will take a problem of your choosing, design an approach or set of experiments, implement it, and analyze the results. Projects can be done in teams. You will propose your own ideas around the fifth week of the course, and I will approve or refine them with you. I will also provide a list of potential topics if you have trouble coming up with one. The project proposals will be presented to the class, there will be a mid-term review to make sure progress is being made, and a final presentation will describe the approach and results.

Office Hours

Office hours are by appointment; feel free to email me any time you feel a meeting would be helpful!

Additional Resources