A CS7495 Final Project by Sooraj Bhat

**Quick Links:**

- presentation slide (PPT) (PNG) (HTML)
- everything but the movies: source code, samples & results (ZIP)
**Matlab 7 users:**Matlab 7 does some stronger typechecking, so you will need`inpaint7.m`in place of`inpaint.m`that is bundled with the zip archive.- Fill region animations (AVI). Synthetic image (black and white halfspaces): compressed, 52KB, uncompressed, ~14MB; Natural image (bungee jumper): compressed, ~1MB, uncompressed, ~60MB.

A new algorithm is proposed for removing large objects from digital images. The challenge is to fill in the hole that is left behind in a visually plausible way.

In the past, this problem has been addressed by two classes of
algorithms: (i) "texture synthesis" algorithms for generating
large image regions from sample textures, and (ii) "inpainting"
techniques for filling in *small* image gaps. The former
work well for "textures" -- repeating two-dimensional patterns
with some stochasticity; the latter focus on linear "structures"
which can be thought of as one-dimensional patterns, such as lines
and object contours.

This paper presents a novel and efficient algorithm that combines the advantages of these two approaches. We first note that exemplar-based texture synthesis contains the essential process required to replicate both texture and structure; the success of structure propagation, however, is highly dependent on the order in which the filling proceeds. We propose a best-first algorithm in which the confidence in the synthesized pixel values is propagated in a manner similar to the propagation of information in inpainting. The actual colour values are computed using exemplar-based synthesis. Computational efficiency is achieved by a block-based sampling process.

A number of examples on real and synthetic images demonstrate the effectiveness of our algorithm in removing large occluding objects as well as thin scratches. Robustness with respect to the shape of the manually selected target region is also demonstrated. Our results compare favorably to those obtained by existing techniques.

I implemented the entire algorithm from the paper. A
pseudo-code description of the algorithm (from the paper) follows.
Let *R* represent the region to be filled, *I* the
entire image, *S=I-R* the source region from which candidate
exemplars are chosen, *P(p)* the priority of a pixel,
*C(p)* the confidence term for pixel *p*, *D(p)*
the data term for pixel *p*, and *t* the iteration:

- Extract the manually selected initial front
*dR*.^{0} - Repeat until done:
- Identify the fill front
*dR*. If^{t}*dR*= {}, exit.^{t} - Compute priorities
*P(p) = C(p)*D(p)*, \forall*p*\in*dR*.^{t} - Find the patch
*Hp*with maximum priority,*i.e.**Hp*|*p*= argmax_{{p \in dRt}}*P(p)*. - Find the exemplar
*Hq*\in*S*that minimizes the sum squared error (SSE). - Copy image data from
*Hq*to*Hp*. - Update
*C(p)*\forall*p*|*p*\in (*Hp*\intersect*R*).

- Identify the fill front

The main contribution of this paper is the
priority/patch-ordering mechanism that allows an exemplar-based
approach to respect the structural features of the input image.
The priority is composed of a *confidence term, C(p),* and a
*data term, D(p),* both defined over pixels:

*C(p) = ( Sum*_{{q \in Hp \intersect ~R}}C(q) ) / ( area of patch )*D(p) = abs( Isophote(p) \dot Normal(p) ) / \alpha*

Intuitively, the confidence term measures how sure a pixel is
of its own value; this is computed from the confidence of
surrounding pixels that have already been filled (or weren't in
the fill region to begin with). Confidence tends to decay as the
center of the fill region is approached. Because of this, if the
priority only consisted of the confidence term, the patches would
be selected in an "onion-peel" manner, which is typical of current
exemplar-based approaches. Confidence ignores structural
information in the image, however. This is why the data term is
necessary; it is a combination of how strongly an *isophote*
at a pixel collides with the contour at that same pixel. An
isophote is basically the gradient at a pixel rotated by 90
degrees---it captures the "strength of flow" of an edge. If only
the data term is used in the priority, however, edges end up
propagating where they shouldn't. It is the harmony between the
two factors that creates good results. In this vein, both
quantities are normalized (to lie between 0 and 1) by appropriate
factors. Figure 3 in the paper diagrams the quantities.

I did not use the CIE Lab colorspace as they did in the paper
because the choice of colorspace is orthogonal to the algorithm.
The CIE Lab colorspace *does* have better perceptual
uniformity than the RGB colorspace, as the authors point out,
which is why they use it in their implementation.

I developed the algorithm in Matlab and coded a critical portion
in C (*i.e.* a Matlab MEX-file). The code is small and easy to
use: it is contained in one M-file and one C file, totaling 188
lines of code and 104 lines containing semicolons ("logical" lines
of code). For usage notes, see `README.txt`.

In short, the results turned out quite well. The only major discrepancy was in the execution speed between my implementation and the authors'. All experiments were run on a 2GHz Pentium 4 laptop with 256MB of RAM. All results presented in this report are mine alone. In the images for the confidence and data terms, blue = 0, red = 1. Values in between are scaled to the appropriate color.

I created a 213x284 synthetic image for testing and sanity checking. My implementation took 25 seconds to inpaint this image. It correctly painted in the missing parts of the synthetic image. The interesting thing to note about this example is the presence of the red dots in the data term; they lie along the hard edge between the black and gray regions. Because of the presence of such a strong edge, the pixels along the edge get a high priority; therefore, those patches get filled first. Here is a movie showing the region-filling for the synthetic image (.avi files): compressed, 52KB, uncompressed, ~14MB.

For a natural image to work on, I used the photo of a bungee
jumper from the paper. This allows me to compare my implementation
with the authors'. My implementation takes 114 seconds to inpaint
this 206x308 image; theirs, 18 seconds. The previous
state-of-the-art (Harrison's resynthesizer), takes 10 minutes, as
they report. Their experiments, however, were run on on a "2.5GHz
Pentium IV with 1GB of RAM", so it's not quite a fair comparison.
Furthermore, 57% of of my execution time is spent in one routine,
`bestexemplar` (even *after* coding its critical portion
in C!), which is time-intesive because it needs to do a SSE
calculation *for each patch (sliding window) in the source
region*. It is possible they have a smarter implementation of
that subroutine, but it isn't mentioned in the paper. Another 20%
is wasted in `getpatch` which merely does some indexing
calculations. Also, the authors don't mention what language they
used, but I suspect it was either C or C++.

Here is a movie showing the region-filling for the natural image (.avi files): compressed, ~1MB, uncompressed, ~60MB.

Again, the data term reveals something about how the linear structure is preserved. At two points in the jumpers's torso, the data term becomes larger, along the two edges created by the roof of the building in the background (it's hard to see; look for the red dots).

Last modified: Thu Dec 2 11:53:06 EST 2004