In this assignment you will design a dynamic programming algorithm to match two images, one scan-line at a time. You will exploit the ordering constraint that we discussed in class. This is the second part of a three-part stereo assignment. Data structures and pseudocode from this assignment will be the basis for the final stereo matching program in the next problem set. You may find the supplemental readings on stereo listed on the course web site to be helpful, but they are not required. In this problem set it may be helpful to use some of the image processing functions in the OpenCV library.
Be sure to do the following when you turn in the assignment:
Grading: This is the undergraduate level version. I expect most students in 4495 will do this one. The answers from the students in 4495 will determine the mean for this assignment track. Any students from 7495 who do this version of the assignment will be graded on a more demanding curve.
1. [3%] Given left and right scanlines from a stereo pair, we can solve the correspondence problem with ordering constraints by means of dynamic programming on an M by M lattice, where M is the length of each scanline. Each path through the lattice represents a valid set of correspondences, and DP selects the optimal path. In this question we will explore some properties of this representation and test a simple implementation.
(a) We write the stereo equations as xL = f X / Z and xR = f (X - B) / Z and define the disparity d = xL - xR. Show that d > 0. What implication does this have for the search lattice in the dynamic programming approach to matching? In the following DP lattices, which paths correspond to valid stereo pairs?
![]() |
![]() |
![]() |
|
(a) |
(b) |
(c) |
(b) For each of the above cases that represent a valid stereo pair, sketch the depth profile of a surface in the world that would produce these disparities. By depth profile we mean the depths in the scene for a single scanline, corresponding to an X-Z slice through some 3-D surface. (Numerical accuracy is not required here. A piecewise planar scene with the right qualitative structure as far as occlusions and disocclusions are concerned is all that is necessary.) Number the key points in the depth profile and their corresponding locations on the DP lattice.
(c) The DP algorithm for the lattice is quite simple, and reflects the general structure of DP methods. Three main functions are required: initBoundary to initialize the lattice values, forwardPass to traverse the lattice from upper left to lower right, updating the costs and parent links, and backChain to start in the lower right corner and follow the parent links to recover the optimal path. C++ code for these functions can be found in the files DP.h and DP.cpp. In a few places in DP.cpp, key lines of code are missing, marked with the comment "YOUR CODE HERE !!". Supply the missing lines so that the algorithm works. Try it out on the pair of example scan-lines that are provided and print out your results. Explain why the path you have found is optimal, by referring to the input scan-lines. Explain why the cost is optimal. For this part you must turn in your printout of the results, the required explanation above, and your source code changes.
Not required: You may want to experiment with this code to better understand how DP is working. Try changing the pixel values in the scanlines and the occlusion costs to see the effect on the result. It may be helpful to print out the cost and parent values for the lattice as they are filled in by forwardPass, to get more insight into what is happening.
2. [5%] The DP algorithm from problem 1(c) has several limitations. First, the matching score is based on the absolute value of the difference of two corresponding pixel values, rather than on an SSD score over a window of pixels. Second, in practice, we would prefer not to have to consider all possible matches. For example, most real scenes have a restricted range of depths. By restricting the number of disparity values which have to be considered, we can save both computation and storage. Third, the lattice representation encodes the disparity at each pixel implicitly, as the difference between column and row coordinates, but it would be more convenient if it were stored explicitly.
The Disparity Space Image (DSI) is a standard data structure for stereo matching algorithms. It encodes the same information as the lattice, but in a more compact and useful format. The DSI image is D by W, where D is the number of discrete disparities and W is the width of the scanline. The columns in the DSI correspond to pixels in the left scanline. The rows correspond to disparity values. In the lattice, a match between two pixels was encoded by a vertex with coordinates (xL, xR). In the DSI, this match is encoded by the disparity d at column xL (with xR = xL - d). Below we have reproduced figure (c) above along with the corresponding left-to-right path through a DSI image.
![]() |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
(d) Path through DP Lattice |
(e) Corresponding Path Through DSI Image for Left Scanline |
Each cell in the DSI image can have one of three labels: M stands for 'match' and identifies a match with the right scanline at the given disparity value. L is for 'left' and denotes a pixel which is visible in the left image only (the occluded case). Likewise R identifies pixels which are visible in the right image only (the disoccluded case). Given a valid path through the DSI image, each column will contain either an M or an L (since each pixel on the left scanline is either matched or occluded in the right). The disparity value associated with an 'L' label identifies the next available pixel to match in the right scanline. The disparity associated with an 'R' label identifies a disoccluded pixel at that disparity in the right scanline. Intuitively, 'M' doesn't change the disparity, 'L' increases the disparity (skipping over occluded pixels), and 'R' decreases the disparity (skipping over disoccluded pixels). The DSI is initialized by assigning the label M to cell (x = 0, d = 0). The terminal state is cell (x = W - 1, d = 0).
(a) Below is another example path through a DSI image. Draw the corresponding path through a DP lattice. (This example, as well as the description of the data structure above, is due to Scharstein and Szeliski).
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
(f) Sample Path Through DSI Image |
Dynamic programming using the DSI image proceeds in a directly analogous fashion to DP on the lattice. There are seven possible transitions from one cell to another. They are illustrated below, with the parent state in lower-case and the current state in upper case. For example, to generate the first 5 states in the path from example (f) above, you would apply the rules (i), (i), (ii), and (iii), with cell (x = 0, d = 0) initialized to state M. Note that at the boundaries of the DSI, some of the transitions will be invalid. For example, only transitions (v) and (vi) can be applied to a parent state in the last column, while (v) and (vi) cannot be applied to states in the first row (d = 0) and (iii) and (iv) cannot be applied to states in the last row (maximum d value). No further transitions can be applied once you reach the terminal cell (x = W - 1, d = 0).
|
||||||||||||||||||||||||||||||||
|
(g) The seven possible transitions between cells in a DSI image |
Since there are three possible states for each cell, we need to maintain three separate costs at each cell. The cost values correspond to the cumulative cost along the lowest cost path to the cell which ends in each of the possible states. In addition to the cumulative cost at each cell, we have to store a back-pointer to the predecessor cell. The transition from that cell to the current cell results in the updated cost. Two three-band images of D by W dimensions, DSI_cost and DSI_ptr, hold the costs and pointers respectively. Thus DSI_cost(d, x, s) holds the cost of being in state s at location (d, x) in the DSI image, while DSI_ptr(d, x, s) gives the parent cell for the path ending in (d, x, s). Let s = 0, 1, 2 denote the states M, L, R, respectively.
Each transition to a parent cell can be described by an offset vector (od, ox, os) which functions as a backpointer. If transition i ends in cell (d, x, s) then the parent cell is (d - odi, x - oxi, s - osi). Let bpi(d, x, s) denote the function which maps the current cell into its parent under transition i. There are seven possible values for i, corresponding to the seven possible transitions listed in figure (g). DSI_ptr will contain i, and a look-up table can be used to map this index to the offset vector (odi, oxi, osi). This is how you will implement bpi(d, x, s) and perform back-chaining in your final program.
(b) Write the offset vectors (odi, oxi, osi) for the seven possible DSI transitions listed in figure (g). For example, the offset vector for transition (iv) would be (1,1,-1), since s=1 for L and s=0 for M.
(c) Write pseudocode for generating DSI_cost and DSI_ptr while traversing the DSI image from left to right. This function is analogous to the forwardPass function in problem 1(c). The cost of a match (i.e. s = 0) can be obtained directly from the SSD images from problem set 2. The cost of assigning the left or right state to a cell is the occlusion_cost, a constant which you will have to set by hand. These costs can be preloaded into DSI_cost to speed the computation. Specify the initial conditions at x = 0. They should be designed so that the cell (0,0,0) is always the first cell in the path. This pseudocode will be the basis for your implementation of a function DSI_forward in problem set 4. Please print out your pseudocode for submission.
(d) Design a set of match costs and an occlusion_cost that will result in the path from figure (f) above. These costs should have the property that the path in figure (f) is optimal.
Instructions for submitting: Turn in hardcopy containing your answers to the questions above. Hand-written solutions are fine as long as they are legible. Make sure you include the the missing lines in the program from part 1(c) with your solution. We don't need your executable or the complete source for this assignment.