"EigenFaces"
Introduction: To combat the complexity that comes with extremely
high-dimensional data, methods have evolved to reduce the dimensionality
of some "space" of interest. A particularly interesting method is Principal
Component Analysis (PCA), a method that takes N vectors in an
M-dimensional space and constructs up to N orthonormal basis vectors that
span the same subspace as the original N vectors, essentially
decomposing the N vectors into their orthogonal components. A vector
in that subspace can then be described as a linear combination of the basis
vectors, and in practice it can often be approximated closely using only the
first n < N of those basis vectors. This experiment explores the efficacy of
describing a set of face images in terms of a smaller set of its principal components.
Part I
Part I of this experiment uses a database of 100 segmented face images (Figure A illustrates examples from this database). The objective is to decompose this set of images into its principal components and use the resulting basis vectors to describe a sub-space, or "face-space." The idea is that this set of vectors should not only describe the faces used to generate the sub-space, but also satisfactorily approximate other images that are part of "face-space", i.e., segmented, oriented and scaled images of a face. The problems of figure-ground segmentation, pose estimation, scale-space and the other tasks involved in forming a canonical face image are not examined here. They are assumed to be solved already, and their output is represented by a separate set of 12 test images, distinct from the 100 training images. The function readImages() reads in the face images, downsamples them appropriately and returns a matrix whose columns are the N concatenated (vectorized) images.
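As a rough stand-in for readImages(), here is a minimal Python/NumPy sketch, not the original MATLAB. It assumes the images are already loaded as equally sized 2-D grayscale arrays and that "downsampling" means averaging non-overlapping factor x factor blocks; the name read_images and the factor parameter are hypothetical.

```python
import numpy as np


def read_images(images, factor=2):
    """Hypothetical stand-in for readImages(): stack images as columns of an M x N matrix.

    images: sequence of equally sized 2-D grayscale arrays (assumed already loaded
            and segmented; file reading is left out).
    factor: integer downsampling factor (assumption: average each factor x factor block).
    """
    columns = []
    for img in images:
        img = np.asarray(img, dtype=float)
        # Crop so both dimensions divide evenly by the factor.
        h = img.shape[0] - img.shape[0] % factor
        w = img.shape[1] - img.shape[1] % factor
        img = img[:h, :w]
        # Block-average: view as (h/f, f, w/f, f) and average over each f x f block.
        small = img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
        # Flatten in raster (row-major) order: one column per image.
        columns.append(small.ravel())
    return np.column_stack(columns)  # shape (M, N)


# Usage: two 4x4 images downsampled by 2 give a 4 x 2 matrix.
X = read_images([np.arange(16).reshape(4, 4), np.zeros((4, 4))], factor=2)
```

Flattening order matters only for consistency: whatever order is used here must also be used when an eigenface is reshaped back into an image.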
The principal components are computed with the Singular Value Decomposition (SVD). Before running SVD, the first moment of the image set (the mean face) is subtracted from each image; otherwise the mean would simply manifest itself again in the first eigenvector, i.e. the most significant basis vector. The function getBasis() runs SVD and returns the orthonormal basis vectors and the mean face.
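The getBasis() step can be sketched in NumPy as follows. This is a hypothetical Python version, not the original MATLAB. It assumes one vectorized image per column and returns the singular values as a vector rather than as a diagonal matrix.

```python
import numpy as np


def get_basis(X):
    """Sketch of the getBasis() step: mean-subtract, then SVD.

    X: M x N matrix with one vectorized image per column.
    Returns (U, s, mean): eigenfaces as the columns of U, singular values s
    (decreasing, as a vector rather than a diagonal matrix) and the mean face.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1)                # first moment of the image set: the mean face
    Xc = X - mean[:, None]               # subtract it from every image (column)
    # Thin SVD: Xc = U @ np.diag(s) @ Vt, with U of shape (M, min(M, N)).
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U, s, mean


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))            # 10 random stand-in "images" of 50 pixels each
U, s, mean = get_basis(X)
```

The centred columns sum to zero, so at most N - 1 singular values are non-zero. The last entry of s is therefore numerically zero.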
Sample Training Images
Figure A
The matrix returned from getBasis() contains
the orthonormal basis vectors constructed from the training images.
The function also returns a diagonal matrix of singular values; their
squares are proportional to the eigenvalues of the covariance matrix of
the mean-subtracted training images. The singular values indicate how
important the corresponding basis vector is in accounting for the
variation in the training set. In other words, they describe the "impact"
each basis vector will have on the reconstruction of an arbitrary face
image. To build an intuition it is helpful to plot these singular values.
Figure 1 is a plot of the N (100) diagonal elements. It is interesting to
note that the drop-off looks roughly exponential. This suggests that
perhaps only the first bunch of basis vectors are significant, or at the
least that you get a whole lot of bang out of the first handful.
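The relation between singular values and eigenvalues can be written out explicitly. Write X_c for the M x N matrix of mean-subtracted training images, one per column. The sample covariance is assumed to be normalised by N - 1; some texts use N instead.

```latex
X_c = U \Sigma V^{\top}
\quad\Longrightarrow\quad
C = \frac{1}{N-1}\, X_c X_c^{\top} = U \left(\frac{\Sigma \Sigma^{\top}}{N-1}\right) U^{\top},
\qquad
\lambda_i = \frac{\sigma_i^{2}}{N-1}.
```

The columns u_i of U are therefore the eigenvectors of C. The variance of the training set along u_i grows with the square of the singular value sigma_i.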
Figure 1
The plot in Figure 2 is a different look at this same information:
the cumulative variance. Notice how the curve climbs steeply over the
first handful of values. This indicates that a large portion of the
variance in the training images is accounted for by the most significant
basis vectors. Figure 3 is simply the same plot, normalized to show the
percentage of the variance accounted for by the first n basis vectors.
Naturally, it has the same shape as Figure 2. From this plot it is simple
to see that we need the first 70 basis vectors to account for 95% of the
variance in the training set. These 70 orthonormal vectors are our
reduced-space representation.
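The cut-off can be computed directly from the singular values. This minimal NumPy sketch assumes that the variance carried by each basis vector is its squared singular value, which is the usual convention. A curve built from the raw singular values would give a different cut-off. The function name is hypothetical.

```python
import numpy as np


def components_for_variance(s, fraction=0.95):
    """Smallest n whose first n components explain at least `fraction` of the variance.

    s: singular values in decreasing order. Per-component variance is taken as s**2.
    Returns (n, cumulative), where cumulative[k] is the fraction explained by the first k+1.
    """
    var = np.asarray(s, dtype=float) ** 2
    cumulative = np.cumsum(var) / var.sum()   # normalized cumulative variance
    # First index where the cumulative fraction reaches the target; +1 turns it into a count.
    # Clamp in case rounding leaves cumulative[-1] a hair below the target.
    n = min(int(np.searchsorted(cumulative, fraction)) + 1, len(var))
    return n, cumulative


n, cumulative = components_for_variance([4.0, 3.0, 2.0, 1.0], fraction=0.95)
```

Run on the face singular values with fraction=0.95, this may or may not return 70. The result depends on which convention was used to build Figure 3.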
Figure 2
Figure 3
The set of orthonormal vectors spans a sub-space named "face-space", and the vectors themselves are called eigenfaces. Each eigenface can be displayed as an image. Figure 4 shows the first 70 basis vectors, or eigenfaces, viewed as images.
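Eigenfaces contain both positive and negative values, so displaying one means mapping its values to gray levels. This minimal sketch assumes a linear min-max stretch to 0..255, since the mapping used for Figure 4 is not stated. It also assumes row-major flattening. eigenface_image is a hypothetical helper.

```python
import numpy as np


def eigenface_image(u, shape):
    """Reshape a basis vector into an image and stretch it linearly to 0..255 (assumed mapping)."""
    img = np.asarray(u, dtype=float).reshape(shape)  # undo the raster-order flattening
    lo, hi = img.min(), img.max()
    if hi == lo:                                     # constant vector: avoid division by zero
        return np.zeros(shape, dtype=np.uint8)
    return np.round(255 * (img - lo) / (hi - lo)).astype(np.uint8)


img = eigenface_image([-1.0, 0.0, 1.0, 0.0], (2, 2))
```

The sign of an eigenvector is arbitrary, so an eigenface can just as validly be displayed with its gray levels inverted.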
Figure 4
This set of eigenfaces is the intended low-dimensional representation of face-space. The eigenfaces will be used to reconstruct arbitrary face images that were not used in constructing them but that hopefully still cluster near face-space. The reconstruction begins by projecting a test image into face-space. The inner product of the mean-subtracted test image vector with each basis vector gives one weight. The image is then re-formed as a linear combination of the eigenfaces, with those weights as coefficients, plus the mean face.
This set of basis vectors was then used to reconstruct a set of test images that were not used in calculating the basis vectors. Each test image is projected into "eigenspace", i.e. onto the basis vectors, to derive a series of weights. Multiplying each basis vector by its weight and summing over the set approximates the test image. The original code has one routine that projects an image into face space and another that reconstructs it.
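The two steps can be sketched in NumPy as follows. The sketch assumes U holds the eigenfaces as orthonormal columns and mean is the mean face, both produced by an SVD of the mean-subtracted training matrix. The names project and reconstruct are hypothetical and are not the original MATLAB routines.

```python
import numpy as np


def project(x, U, mean, n=None):
    """Weights of image vector x on the first n eigenfaces (all of them if n is None)."""
    Un = U if n is None else U[:, :n]
    # One inner product per basis vector, taken with the mean-subtracted image.
    return Un.T @ (np.asarray(x, dtype=float) - mean)


def reconstruct(w, U, mean):
    """Mean face plus the linear combination of the first len(w) eigenfaces weighted by w."""
    return mean + U[:, :len(w)] @ w


# Build a basis from random stand-in data (not faces) to exercise the functions.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 8))
mean = X.mean(axis=1)
U, s, _ = np.linalg.svd(X - mean[:, None], full_matrices=False)

x = X[:, 3]                                            # a training image
x_full = reconstruct(project(x, U, mean), U, mean)     # all basis vectors
x_3 = reconstruct(project(x, U, mean, n=3), U, mean)   # only the first 3
```

A training image is reproduced exactly when all basis vectors are used. With all weights set to zero, reconstruct() returns the mean face itself.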
Figures 5 and 6 show reconstructions and difference images for an image from the training set, using all 100 basis vectors and the first 70, respectively. Figures 7 and 8 illustrate the same idea, this time using a test image that did not appear in the training set. Figure 9 then shows reconstructions of all the test images using the first 70 basis vectors. Finally, Figure 10 shows reconstructions using only 25 basis vectors, which account for approximately 75% of the variance.
One interesting thing to note is the huge perceptual difference between the reconstructions in Figure 5 and Figure 6. This suggests that the last 30 basis vectors, while statistically minor, play a very significant perceptual role. The difference images are also of interest: they look like the original faces! This indicates that the big errors are not arbitrary but are made exactly at places that are perceptually salient (high frequency? curvature? edges?). I return to this later, as I am not sure whether it means anything.
Original, Reconstruction and Difference of a Training Image using all 100 basis vectors
Figure 5
Original, Reconstruction and Difference of a Training Image using first 70 basis vectors
Figure 6
Original, Reconstruction and Difference of Test Image using all 100 basis vectors
Figure 7
Original, Reconstruction and Difference of Test Image using the first 70 basis vectors
Figure 8
Reconstructions of Test Images using the first 70 basis vectors
Figure 9
Reconstructions of Test Images using the first 25 basis vectors
Figure 10
Figure 11 is a plot of the sum of squared errors between the 12 test images and their reconstructions.
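The error measure behind Figure 11 can be sketched as follows. The number of basis vectors used for that figure is not stated, so n is a parameter here, and reconstruction_sse is a hypothetical name. The function processes all test images (the columns of T) at once.

```python
import numpy as np


def reconstruction_sse(T, U, mean, n):
    """Sum of squared errors between each test image (column of T) and its n-vector reconstruction."""
    T = np.asarray(T, dtype=float)
    Tc = T - mean[:, None]                   # mean-subtract every test image
    Un = U[:, :n]
    R = mean[:, None] + Un @ (Un.T @ Tc)     # project and reconstruct all columns at once
    return ((T - R) ** 2).sum(axis=0)        # one SSE value per test image


# Toy check: with the identity as basis and n = 2, only the third pixel is lost.
T = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
sse = reconstruction_sse(T, np.eye(3), np.zeros(3), n=2)
```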
Figure 11
PART II
Part II examines two other properties of the eigenspace representation: interpolation and occlusion.
The first property is interpolation. The aim is to test empirically whether simple interpolation has any meaning in face-space, i.e., whether interpolation in face-space gives rise to a visually meaningful interpolation in image space. Two sets of two images were taken, and simple linear interpolation over 5 steps was used to transition from the first image in each set to the second; Figures 12 and 13 illustrate this step. The same two sets of images are then used for interpolation in face space. The images are projected into face-space to retrieve the coefficients of the linear combination for each projection, and these weights become the interpolation vehicle. Once again, simple linear interpolation is used to transition from the projection weights of the first image of the set to the weights of the second. At each step, the interpolated weight vector is reconstructed into an interpolated image, so the interpolation itself happens in face space. The procedure was run twice, once using all 100 basis vectors and once using the first 70 vectors for reconstruction. Figures 14, 15, 16 and 17 illustrate interpolation in face space. The results indicate that interpolating weights in face space does indeed correspond to a visually meaningful transition in image space.
The following code does the interpolation:
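A minimal Python/NumPy sketch of both interpolation procedures, standing in for the original MATLAB. It assumes "5 steps" means five frames including the two endpoints, and that U and mean come from the SVD of the mean-subtracted training images. The function names are hypothetical.

```python
import numpy as np


def interpolate_image_space(a, b, steps=5):
    """Pixel-wise linear blend from image vector a to b; one column per step (endpoints included)."""
    t = np.linspace(0.0, 1.0, steps)
    return np.outer(a, 1 - t) + np.outer(b, t)


def interpolate_face_space(a, b, U, mean, n, steps=5):
    """Linearly interpolate the face-space weights of a and b, reconstructing each step."""
    Un = U[:, :n]
    wa = Un.T @ (a - mean)                       # projection weights of the first image
    wb = Un.T @ (b - mean)                       # projection weights of the second image
    t = np.linspace(0.0, 1.0, steps)
    W = np.outer(wa, 1 - t) + np.outer(wb, t)    # interpolated weight vectors, one per column
    return mean[:, None] + Un @ W                # reconstruct every interpolated weight vector


rng = np.random.default_rng(2)
X = rng.normal(size=(12, 6))                     # random stand-in "images", not faces
mean = X.mean(axis=1)
U, s, _ = np.linalg.svd(X - mean[:, None], full_matrices=False)
a, b = X[:, 0], X[:, 1]
frames_image = interpolate_image_space(a, b)
frames_face = interpolate_face_space(a, b, U, mean, n=6)
```

Reconstruction is affine in the weights. Each face-space frame therefore equals the same blend of the two endpoint reconstructions. With the full basis and training images, as in this toy example, the two procedures coincide. With fewer vectors, they differ only through the approximation error of the endpoint reconstructions.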
Interpolation Results In Image Space
Figure 12: Set 1
Figure 13: Set 2
Interpolation Results in Face Space
Figure 14: Set 1 (First 70 basis vectors)
Figure 15: Set 1 (All 100 basis vectors)
Figure 16: Set 2 (First 70 basis vectors)
Figure 17: Set 2 (All 100 basis vectors)
The second property under consideration is occlusion: how well does the eigenspace representation hold up under differing levels of occlusion? The procedure here is to synthetically add an occluder of varying size and position to an image, then project the occluded image into face-space and reconstruct it. The reconstructions should give some insight into how PCA handles occlusion. A discussion of the results follows the occlusion figures.
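The occlusion procedure can be sketched as follows. This sketch assumes the occluder is a solid rectangle of constant gray level (0 here), painted onto the 2-D image before it is flattened in raster order. The occluder's actual appearance is not described, and occlude and reconstruct_vector are hypothetical names.

```python
import numpy as np


def occlude(img, top, left, height, width, value=0.0):
    """Return a copy of a 2-D image with a rectangular occluder of constant value painted on."""
    out = np.array(img, dtype=float)             # copy, so the original image is untouched
    out[top:top + height, left:left + width] = value
    return out


def reconstruct_vector(x, U, mean, n):
    """Project image vector x onto the first n eigenfaces and reconstruct it."""
    Un = U[:, :n]
    return mean + Un @ (Un.T @ (x - mean))


face = np.ones((4, 4))
blocked = occlude(face, top=1, left=1, height=2, width=2)
```

Typical use would be reconstruct_vector(occlude(face_img, r, c, h, w).ravel(), U, mean, n), where face_img, r, c, h and w are placeholders. A zero-area occluder (height or width 0) leaves the image unchanged, which matches the first series of figures.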
Occlusion Results
Figure 18
Figure 19
Figure 20
Figure 21
Figure 22
Figure 23
Figure 24
These results actually say quite a few things. Intuition suggests
that the larger the occluder, the worse the reconstruction, which appears
to be true, but which "features" are occluded is another critical influence
on the quality of the reconstruction. First, an explanation of the
figures. The first image in each series is the original image with
the occluder (the first series uses an occluder with zero area). The
second image is the reconstruction. The third image is the difference
between the reconstruction and the original image; since the original
image contains the occluder, this image isn't as meaningful as the fourth
image. The fourth image is the difference between the reconstruction of
the occluded face and the reconstruction of the same face without the
occluder (thus, the fourth image in the first set is zero). The plots below
(Figures 25 and 26) show the sum of squared errors of the difference images
(the 3rd and 4th images in each set above, respectively). To avoid massive
amounts of scrolling, I have added a table below Figure 26 that matches
each index to the type of occluder its error value represents.
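Both error measures can be sketched for a single image. The sketch assumes x_clean is the same face without the occluder, flattened the same way. The function name and the toy basis below are hypothetical.

```python
import numpy as np


def occlusion_errors(x_occluded, x_clean, U, mean, n):
    """SSE values behind Figures 25 and 26 for one occluded image vector (sketch).

    Returns (sse_vs_input, sse_vs_clean):
      sse_vs_input: reconstruction of the occluded image vs. the occluded image (3rd image).
      sse_vs_clean: reconstruction of the occluded image vs. the reconstruction of the
                    same face without the occluder (4th image); zero for a zero-area occluder.
    """
    Un = U[:, :n]

    def recon(x):
        return mean + Un @ (Un.T @ (x - mean))

    r_occ = recon(x_occluded)
    r_clean = recon(x_clean)
    sse_vs_input = float(np.sum((x_occluded - r_occ) ** 2))
    sse_vs_clean = float(np.sum((r_occ - r_clean) ** 2))
    return sse_vs_input, sse_vs_clean


U = np.eye(4)[:, :2]                          # toy basis: keeps only the first two pixels
mean = np.zeros(4)
clean = np.array([1.0, 2.0, 3.0, 4.0])
occluded = np.array([0.0, 2.0, 3.0, 4.0])     # first pixel "occluded"
errors = occlusion_errors(occluded, clean, U, mean, n=2)
```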
One does expect, of course, to get lips back even when no lips appear in
the occluded image. Even if all basis vectors with "lip-like" information
happened to get weighted zero (highly unlikely, and perhaps nonsensical,
since it becomes clear that this is not how the information is organized
on the basis vectors), the mean image is still added back on during the
reconstruction stage.
There are two categories of differences in the reconstructions to
consider here. The first is the raw sum of squared errors and the
other is the perceptual difference. The first is illustrated objectively
in Figure 26. The second is a subjective matter and doesn't necessarily
agree with the first. Observation suggests to me that the size of the
occluder is the ONLY significant factor in determining the statistical
difference, but what is occluded comes into play when determining the
perceptual difference. This is best illustrated by the 2nd and 7th images.
The two occluders are roughly the same size, and so are the sums of squared
errors (against the reconstruction without an occluder). Yet the perceptual
difference is much greater in the image whose occluder covers an eye (a
salient perceptual feature). A similar comparison can be made using the
occluder covering both eyes. A big personal surprise here is the result
from the forehead occluder. I would expect a big statistical difference
(true) and a much smaller perceptual difference (false): the reconstruction
is unrecognizable. What seems to be responsible for this is that nearby
features, namely the eyes, are affected by the occlusion of neighboring
pixels. This gives some insight into how changes in image space affect
face space. There is no notion of "features", and the image vectors are
built simply by concatenating pixels in raster order. As a result,
non-occluded regions get warped by virtue of being near occluders.
Occluders have a gravitational pull in face space, so to speak. Also
note that the difference image (now speaking of the difference between
reconstruction and original, the 3rd image in the series) no longer looks
like the original. Errors no longer accumulate along perceptually salient
'events'.
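A small sketch can illustrate the mechanism behind this "pull". The basis vectors are computed from raster-ordered pixels, so each one is generally non-zero over the whole image. Changing a few pixels therefore changes the projection weights, and with them the reconstruction at pixels the occluder never touched. The example uses random stand-in data rather than faces, so it shows only this spreading, not the spatial locality described above. spillover is a hypothetical helper.

```python
import numpy as np


def spillover(x, x_occ, U, mean, n):
    """Change in the reconstruction caused by an occluder, plus the part landing off the occluder."""
    Un = U[:, :n]
    # recon(x_occ) - recon(x) = Un Un^T (x_occ - x): the mean face cancels out.
    dr = Un @ (Un.T @ (x_occ - x))
    untouched = x_occ == x                       # pixels the occluder did not modify
    return dr, float(np.sum(dr[untouched] ** 2))


rng = np.random.default_rng(0)
X = rng.normal(size=(64, 20))                    # 20 random 64-pixel "images"
mean = X.mean(axis=1)
U, s, _ = np.linalg.svd(X - mean[:, None], full_matrices=False)

x = X[:, 0]
x_occ = x.copy()
x_occ[:8] = 0.0                                  # occlude the first 8 pixels (raster order)
dr, off_occluder_energy = spillover(x, x_occ, U, mean, n=10)
```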
Figure 25
Figure 26
1) No occluder
2) One eye and upper nose
3) Mouth
4) Both eyes and upper nose
5) All facial "features"
6) Forehead
7) Left Cheek
PART III
Part III of this experiment examines whether PCA is applicable as a representation of general images, rather than of different images from some a priori known set such as faces. Consequently, the reduction of dimensionality creates not a "face-space" or an "animal-space", but a general sub-space or eigenspace. The images used in Part III are separated into three sets: 16x16, 32x32 and 64x64. The images are by and large snippets of textures, scenes and cropped objects. The procedure detailed in Part I was repeated on each set of images: the orthonormal basis vectors were constructed by running SVD on all the training images. Figures 28, 30 and 32 are image representations of the first 9 basis vectors of each set. The first thing one notes is that the basis vectors are just textures, which is expected from a training set that is highly uncorrelated. Despite the uncorrelated training data, the singular values still show the characteristic exponential decay. However, the decay is much more rapid than for the face training set. This should not immediately be attributed to differences in correlation. The 64x64 set of images shows a slower decay than the 16x16 set, and by observation the two sets are equally uncorrelated. The rapid decay appears to be more a result of image size. The 16x16 images live in a much lower-dimensional space, so their variance is restricted and can be adequately accounted for by fewer basis vectors.
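The bound below gives one concrete way to see the size effect. With N training images of M pixels each, the columns of the mean-subtracted data matrix X_c sum to zero. The number of non-zero singular values is therefore bounded by

```latex
\operatorname{rank}(X_c) \le \min(M,\; N-1),
\qquad M = 16^2 = 256,\quad 32^2 = 1024,\quad 64^2 = 4096 .
```

The number of training images in each Part III set is not stated here, so it is unknown whether M or N - 1 is the binding limit for each set. The bound only shows that the 16x16 set cannot spread its variance over more than 256 directions.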
Taken together, these images don't really define a clustered space, meaning that there is no real significance in being near the sub-space spanned by the basis vectors formed from these training images. In this sense I don't think that PCA is a very effective means of representing this set. In fact, without an application in mind I have a hard time defending any alternative representation of this set of images as a good one. Perhaps I am missing something vital here.
Texture Image Results
Figure 27
First 9 Basis Vectors
Figure 28
Figure 29
Figure 30
Figure 31
Figure 32
Conclusion: PCA certainly has its place in reducing dimensionality, and it is quite a workhorse for certain tasks. Recognition might not be one of them; categorization might be. It appears to be a good method for representing a set of images, or the space occupied by a certain type of image. John Hughes made an interesting point about the difference between how well a method performs a task and how well it replicates the way the human visual system performs that task. PCA clearly belongs in the former category, and its true benefits lie in using it to complete a particular task. Visually examining PCA-based reconstructions, as in the first part, is misleading because human perception is so intimately involved (especially with faces). Nevertheless, this experiment was immensely helpful in elucidating a technique that was previously unfamiliar to me.
Other MATLAB code: