PhD CS – Intelligent Systems Body of Knowledge

Perception Reading List


  • Szeliski (2010) Computer Vision: Algorithms and Applications
  • Hartley & Zisserman (2004) Multiple View Geometry in Computer Vision, 2nd ed. Chapters: 2-4, 6-8, 9-13.

Surveys of Some Topics

  • Mundy, J. L. (2006). "Object Recognition in the Geometric Era: a Retrospective." Lecture Notes in Computer Science 4170/2006.
  • Maintz & Viergever (1998) "A Survey of Medical Image Registration." Medical Image Analysis.
  • M. Fritz, M. Andriluka, S. Fidler, M. Stark, A. Leonardis and B. Schiele (2010) "Categorical Perception." Chapter in Cognitive Systems, Springer Verlag.

General Algorithms

  • Fischler, M. A. and R. C. Bolles (1981). "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography." Communications of the ACM.
  • Felzenszwalb, P. and D. P. Huttenlocher (2004). "Efficient Belief Propagation for Early Vision." CVPR.
  • Boykov, Y., O. Veksler, et al. (2001). "Fast Approximate Energy Minimization via Graph Cuts." PAMI.

Light field

  • Adelson, E. H. and J. R. Bergen (1991). "The Plenoptic Function and the Elements of Early Vision." Computational Models of Visual Processing.

Object Recognition

  • Torralba, A. (2003). "Contextual Priming for Object Detection." IJCV.
  • Winn, J., A. Criminisi, et al. (2005). "Object Categorization by Learned Universal Visual Dictionary." ICCV.
  • Hoiem, D., A. A. Efros, et al. (2007). "Recovering Surface Layout from an Image." IJCV.
  • Fergus, R., P. Perona, et al. (2007). "Weakly Supervised Scale-Invariant Learning of Models for Visual Recognition." IJCV.
  • Viola, P. and M. J. Jones (2004). "Robust Real-Time Face Detection." IJCV.
  • Murase, H. and S. K. Nayar (1995). "Visual Learning and Recognition of 3-D Objects from Appearance." IJCV.


  • Kass, M., A. Witkin, et al. (1988). "Snakes: Active contour models." IJCV.
  • Xu, C. and J. L. Prince (1998). "Snakes, Shapes, and Gradient Vector Flow." IEEE Transactions on Image Processing.
  • Belongie, S., J. Malik, et al. (2002). "Shape Matching and Object Recognition Using Shape Contexts." PAMI.

Edges / Filters

  • Canny, J. (1986). "A Computational Approach to Edge Detection." PAMI.
  • Perona, P. and J. Malik (1990). "Scale-space and edge detection using anisotropic diffusion." PAMI.
  • Barash, D. (2002). "A Fundamental Relationship between Bilateral Filtering, Adaptive Smoothing and the Nonlinear Diffusion Equation." PAMI. 

Features / Descriptors / Matching

  • Lowe, D. G. (2004). "Distinctive Image Features from Scale-Invariant Keypoints." IJCV.
  • Mikolajczyk, K., T. Tuytelaars, et al. (2005). "A Comparison of Affine Region Detectors." IJCV.
  • Nister, D. and H. Stewenius (2006). "Scalable Recognition with a Vocabulary Tree." CVPR. 

Tracking and Active Models

  • Isard, M. and A. Blake (1998). "Condensation - conditional density propagation for visual tracking." IJCV.
  • Comaniciu, D., V. Ramesh, et al. (2000). "Real-time tracking of non-rigid objects using mean shift." CVPR.
  • Cootes, T. F., G. J. Edwards, et al. (1998). "Active appearance models." ECCV. 

Action Recognition

  • Bobick, A. and J. W. Davis (2001). "The Recognition of Human Movement Using Temporal Templates." PAMI.
  • Efros, A. A., A. C. Berg, et al. (2003). "Recognizing Actions at a Distance." ICCV.
  • Gorelick, L., M. Blank, et al. (2007). "Actions as Space-Time Shapes." PAMI.
  • Laptev, I., M. Marszalek, et al. (2008). "Learning realistic human actions from movies." CVPR.

Structure from Motion

  • Triggs, B., P. F. McLauchlan, et al. (1999). "Bundle Adjustment - A Modern Synthesis." Vision Algorithms.
  • Pollefeys, M., R. Koch, et al. (1999). "Self-Calibration and Metric Reconstruction in spite of Varying and Unknown Internal Camera Parameters." IJCV.
  • Snavely, N., S. M. Seitz, et al. (2007). "Modeling the world from Internet photo collections." IJCV. 

Segmentation / Layer extraction

  • Torr, P. H. S., R. Szeliski, et al. (2001). "An Integrated Bayesian Approach to Layer Extraction from Image Sequences." PAMI.
  • Comaniciu, D. and P. Meer (2002). "Mean Shift: A Robust Approach toward Feature Space Analysis." PAMI.
  • Felzenszwalb, P. and D. P. Huttenlocher (2004). "Efficient Graph-Based Image Segmentation." IJCV.
  • Shi, J. and J. Malik (2000). "Normalized Cuts and Image Segmentation." PAMI.

Stereo / Optical Flow

  • Bergen, J. R., P. Anandan, et al. (1992). "Hierarchical model-based motion estimation." ECCV.
  • Black, M. J. and P. Anandan (1996). "The Robust Estimation of Multiple Motions: Parametric and Piecewise-Smooth Flow Fields." Computer Vision and Image Understanding.
  • Baker, S., D. Scharstein, et al. (2009). "A Database and Evaluation Methodology for Optical Flow." MSR TechReport.
  • Seitz, S. M. and J. Kim (2001). "The Space of All Stereo Images." IJCV.
  • Kolmogorov, V. and R. Zabih (2002). "Multi-camera Scene Reconstruction via Graph Cuts." ECCV.
  • Scharstein, D. and R. Szeliski (2002). "A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms." IJCV.

Texture / Image synthesis

  • Brown, M. and D. G. Lowe (2003). "Recognising Panoramas." ICCV.
  • Burt, P. J. and E. H. Adelson (1983). "A multiresolution spline with application to image mosaics." ACM Transactions on Graphics.
  • Efros, A. A. and T. K. Leung (1999). "Texture Synthesis by Non-parametric Sampling." ICCV.
  • Kwatra, V., A. Schodl, et al. (2003). "Graphcut Textures: Image and Video Synthesis Using Graph Cuts." SIGGRAPH.
  • Jojic, N., B. J. Frey, et al. (2003). "Epitomic analysis of appearance and shape." ICCV.
  • Hays, J. and A. Efros (2007). "Scene Completion Using Millions of Photographs." SIGGRAPH. 

Color recognition / HDR

  • Debevec, P. E. and J. Malik (1997). "Recovering High Dynamic Range Radiance Maps from Photographs." SIGGRAPH.
  • Swain, M. and D. Ballard (1990). "Indexing via Color Histograms." ICCV. 


  • Schodl, A., R. Szeliski, et al. (2000). "Video Textures." SIGGRAPH.
  • Wexler, Y., E. Shechtman, et al. (2007). "Space-Time Video Completion." PAMI. 

Vision for Robotics

  • Davison, A. J., I. D. Reid, N. D. Molton and O. Stasse (2007). "MonoSLAM: Real-Time Single Camera SLAM." PAMI 29(6), pp. 1052-1067.