![]() |
Mini-Project #2: Matching Cartoon Characters using Active Shape Model Techniques

by Delphine Nain

Supervisor: Frank Dellaert
![]() |
Note: All the images on this website may be subject to copyright.
The goal of this project was to explore and implement a technique called "Active Shape Models", described in the paper "Training Models of Shape from Sets of Examples" by T. F. Cootes et al. (1992). We applied this technique to a training set of examples showing different poses of a cartoon character (Tintin, by Hergé). We then built a GUI that lets the user supply a picture of any character and retrieve the closest matching Tintin shape in our database.
The first step in modelling each shape is to represent it by a set of points. It is important to label points consistently across shapes (e.g., the second point always corresponds to the center of the left knee).
Our algorithm aligns all the shapes to the "mean shape" (pages 3-4 and Appendix A of the paper) and then performs PCA to find the major modes of variation (pages 5-6).
PCA yields a matrix of major eigenvectors, P, as well as a vector that describes the "mean shape", xbar. As described in the paper (equation 12), we characterize each shape by its "weight vector" b. These weight vectors are used for the shape comparison in step 2.
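The alignment step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes each shape is an (n, 2) NumPy array of consistently labelled points, it uses an unweighted least-squares fit (the paper's procedure also allows per-point weights), and the names `align_to` and `align_set` are hypothetical.

```python
import numpy as np

def align_to(shape, reference):
    """Least-squares similarity alignment of one (n, 2) shape onto another."""
    shape_c = shape - shape.mean(axis=0)        # remove translation
    ref_c = reference - reference.mean(axis=0)
    norm = (shape_c ** 2).sum()
    # a = s*cos(theta) and b = s*sin(theta) of the best scale and rotation
    a = (shape_c * ref_c).sum() / norm
    b = (shape_c[:, 0] * ref_c[:, 1] - shape_c[:, 1] * ref_c[:, 0]).sum() / norm
    M = np.array([[a, -b], [b, a]])
    return shape_c @ M.T + reference.mean(axis=0)

def align_set(shapes, n_iter=10):
    """Iteratively align every shape to the evolving mean shape."""
    shapes = np.asarray(shapes, dtype=float)
    reference = shapes[0]
    aligned = shapes
    for _ in range(n_iter):
        aligned = np.array([align_to(s, reference) for s in shapes])
        # Re-anchor the mean to the first shape so pose and scale cannot drift.
        reference = align_to(aligned.mean(axis=0), shapes[0])
    return aligned, reference
```

Re-anchoring the mean to the first shape on every pass is one simple way to keep the iteration from shrinking or rotating the whole set; other normalizations would work as well.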
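As a rough sketch of this step, assuming the aligned shapes have been flattened into the rows of an array `X` (one row of length 2n per shape), the model and the weight vectors could be computed as below. The function names and the choice of `np.linalg.eigh` are illustrative assumptions; the relation used is the usual linear shape model x ≈ xbar + P b, so that b = Pᵀ(x − xbar).

```python
import numpy as np

def train_shape_model(X, n_modes):
    """Return the mean shape xbar, the eigenvector matrix P and its eigenvalues."""
    X = np.asarray(X, dtype=float)
    xbar = X.mean(axis=0)
    dX = X - xbar                               # deviations from the mean
    S = dX.T @ dX / len(X)                      # covariance of the deviations
    eigvals, eigvecs = np.linalg.eigh(S)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:n_modes] # keep the largest modes
    return xbar, eigvecs[:, order], eigvals[order]

def shape_weights(x, xbar, P):
    """Project a shape's deviation from the mean onto the eigenvector basis."""
    return P.T @ (np.asarray(x, dtype=float) - xbar)
```

Keeping only the first few modes gives a compact description of each shape; with all modes kept, xbar + P b reproduces the original shape exactly.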
Figure 1 shows the "mean" shape from our database of Tintin images.
Figure 3 shows the 19 points that we chose to describe Tintin across all examples. The red points are initial points clicked by the user and the green points are an alignment of the red points with the "mean shape".
![]() |
![]() |
We obtained very compelling "modes of variation" for our database:
Note that the spectrum of most modes is symmetric. Mode 4 is an exception: it is not symmetric because the database contains no images of Tintin running to the left.
Another important point is the choice of correspondence points. In the examples in the paper, all shapes are almost "alike", so it is easy to choose correspondence points that can be uniquely found in all shapes. For a human figure, it is harder to choose exactly the same correspondence points in all images, since Tintin is wearing clothes and parts of the body are sometimes occluded. In our experience, it is better in this case to choose the most concise set of correspondence points, so that there is less room for mistakes when clicking on the points in all shapes. We chose points that lie on the "skeleton". This also makes matching shapes easier, since correspondence points can more easily be found between a skeleton and stick figures.
The input has to be labelled in the same way as the shapes in the database. In our GUI, we present the input image side-by-side with the image of the mean shape labelled with the correspondence points, to help the user know which points to click on the input and in what order (see Figure 4).
Once the points are collected from the input, the input shape is aligned with the "mean shape" so that it can be compared to all the aligned shapes in the database. We then calculate the deviation of the input from the mean and project the result onto the eigenvector basis P found in step 1 to extract the "weight vector" b (described in step 1). We can then compare the weight vector b to all the other weight vectors in the database and find the closest match (using Euclidean distance, for example). Finally, we present to the user the image that corresponds to the closest shape found.
The following series of input/output images illustrate interesting properties of the algorithm.
In general, we realized that the training set is very important. Since there are many examples of Tintin with different arm poses, the program is fairly robust at recognizing bent arms, and whether it is the right one or the left one (Figures 3, 4 and 5). The program also recognizes well whether the character is facing front (most Figures) or back (Figure 7), since this is the major mode of variation (as long as the user keeps the convention of labelling points in the right order, from the left part of the body to the right part). It also recognizes fairly well the difference between a walking and a running character.
We found, however, that there were not enough "sitting" Tintins in our dataset for the program to match sitting poses robustly.
One idea for improvement is to skip the rotation alignment in order to recognize horizontal or rotated characters (right now a sleeping stick character will be rotated to vertical during the normalization-to-the-mean phase and will therefore be matched to a standing character).
The main inconvenience of the method is the manual picking of correspondence points. The process could be automated, for example with a skeleton matching algorithm. Apart from this inconvenience, the method produces compelling modes of variation and fairly robust results for shape matching, provided that the correspondence points and the training set are well chosen.
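The spectrum of a mode can be generated by varying a single weight while holding the others at zero. The sketch below assumes a mean shape `xbar`, an eigenvector matrix `P` and eigenvalues `eigvals` from the PCA step; the range of plus or minus three standard deviations is the one commonly used in the Active Shape Model literature and is an assumption here, not a value taken from this project.

```python
import numpy as np

def mode_sweep(xbar, P, eigvals, k, n_steps=5, n_std=3.0):
    """Shapes obtained by varying only the k-th weight around the mean."""
    limit = n_std * np.sqrt(eigvals[k])         # n_std standard deviations
    return [xbar + P[:, k] * b_k
            for b_k in np.linspace(-limit, limit, n_steps)]
```

With an odd number of steps, the middle shape of the sweep is the mean shape itself, and the shapes on either side show the two directions of the mode. An asymmetric spectrum, as for mode 4, then reflects training data that covers only one of those directions.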
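The matching step described above can be sketched as a nearest-neighbour search in weight space. This is an illustration under assumptions: `x` is the input shape already aligned to the mean and flattened, `B` holds one database weight vector per row, and the function name `closest_shape` is hypothetical.

```python
import numpy as np

def closest_shape(x, xbar, P, B):
    """Index of, and distance to, the database shape nearest to the input."""
    b = P.T @ (np.asarray(x, dtype=float) - xbar)   # weight vector of the input
    distances = np.linalg.norm(np.asarray(B, dtype=float) - b, axis=1)
    best = int(np.argmin(distances))
    return best, float(distances[best])
```

The returned index can then be used to look up the image associated with the matching database shape. Euclidean distance treats all modes equally; scaling each weight by the standard deviation of its mode would be a possible alternative.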
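A rotation-free alignment of this kind might look like the sketch below, which normalizes only translation and scale. It is an untested idea rather than part of the project: shapes are assumed to be (n, 2) arrays, and the function name is hypothetical.

```python
import numpy as np

def align_without_rotation(shape, reference):
    """Match centroid and overall size of the reference, keeping orientation."""
    shape = np.asarray(shape, dtype=float)
    reference = np.asarray(reference, dtype=float)
    shape_c = shape - shape.mean(axis=0)        # remove translation
    ref_c = reference - reference.mean(axis=0)
    scale = np.linalg.norm(ref_c) / np.linalg.norm(shape_c)
    return scale * shape_c + reference.mean(axis=0)
```

With this variant, a lying figure stays horizontal after normalization, so it could only match a lying pose in the training set. The trade-off is that the training set would then need examples of every orientation to be recognized.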
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |