Topic in Machine Vision and Learning
Marc S. Johannes
"Eigenfaces"
September 19, 2000

 
 

"EigenFaces"
 
 

Introduction:  To combat the complexity that comes with extremely high-dimensional data, methods have evolved to reduce the dimensionality of some "space" of interest.  A particularly interesting method is Principal Component Analysis (PCA), which takes N vectors in an M-dimensional space and constructs (at most) N orthonormal basis vectors that span the same subspace as the original N vectors, essentially decomposing the N vectors into their orthogonal components.  Any vector in that subspace can then be described exactly as a linear combination of the basis vectors, and can be approximated using only the n < N most significant of them, with the approximation improving as n grows.  This experiment explores the efficacy of describing a set of face images in terms of a smaller set of its principal components.
 

Part I





Part I of this experiment uses a database of 100 segmented face images (Figure A shows examples from this database).  The objective is to decompose this set of images into its principal components and use the resulting basis vectors to describe a sub-space, or "face-space."  The notion is that these vectors should not only describe the faces used to generate the sub-space, but also satisfactorily approximate other images that belong to "face-space", i.e., segmented, oriented and scaled images of a face.  The problems of figure-ground segmentation, pose estimation, scale-space and the other tasks involved in forming a canonical face image are not examined here.  The assumption is that these tasks have already been taken care of; for this experiment their output is a second set of 12 test images, separate from the 100 training images.  The function readImages() reads in the face images, downsamples them appropriately and returns a matrix whose columns are the N images, each concatenated into a single vector.
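
As a rough illustration of the data layout described above, the sketch below downsamples small grayscale images and stacks them as the columns of one matrix.  It is written in plain Python rather than the MATLAB used for this report, and it is not the readImages() code: the block-averaging downsampler, the function names and the tiny 4x4 inputs are all assumptions made for the example, and file reading is omitted.

```python
# Illustrative sketch only: not the report's readImages() implementation.
# Assumption: images are already loaded as 2-D lists of gray levels.

def downsample(image, factor):
    """Shrink a 2-D image by averaging non-overlapping factor x factor blocks."""
    rows = len(image) // factor
    cols = len(image[0]) // factor
    small = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [image[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def flatten(image):
    """Concatenate the rows of a 2-D image into one long vector (raster order)."""
    return [pixel for row in image for pixel in row]


def images_to_matrix(images, factor):
    """Return an M x N matrix (list of rows) whose N columns are the images."""
    vectors = [flatten(downsample(img, factor)) for img in images]
    # Transpose so that each image becomes a column.
    return [list(column) for column in zip(*vectors)]


# Two made-up 4x4 "images", downsampled by 2 to 2x2, i.e. M = 4 pixels each.
img_a = [[0, 0, 4, 4],
         [0, 0, 4, 4],
         [8, 8, 2, 2],
         [8, 8, 2, 2]]
img_b = [[1, 3, 5, 7],
         [1, 3, 5, 7],
         [1, 3, 5, 7],
         [1, 3, 5, 7]]
X = images_to_matrix([img_a, img_b], factor=2)
```

Here X has 4 rows (pixels) and 2 columns (images), which is the same layout, on a much smaller scale, as the M x 100 training matrix.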

To calculate the principal components, Singular Value Decomposition (SVD) is used.  Before running SVD, the mean image (the first moment of the image set) is subtracted from each image, as it would otherwise simply show up again in the first eigenvector, the most significant basis vector.  The function getBasis() runs SVD and returns the orthonormal basis vectors and the mean face.
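
In symbols, the step described above can be written as follows.  This is the standard formulation, not a transcription of getBasis(); the symbols x_i (the i-th image as an M-dimensional column vector), A, U, Sigma and V are notation introduced here, and the thin form of the SVD is assumed, with M >= N.

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
A = \begin{bmatrix} x_1 - \bar{x} & \cdots & x_N - \bar{x} \end{bmatrix} \in \mathbb{R}^{M \times N}

A = U \Sigma V^{T}, \qquad
U^{T} U = I, \quad
\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_N), \quad
\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_N \ge 0

A A^{T} u_k = \sigma_k^{2}\, u_k
```

The columns u_k of U are the orthonormal basis vectors.  Each squared singular value is the eigenvalue of A A^T belonging to u_k, and is proportional to the variance of the training set along that direction.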

Sample Training Images


Figure A





The matrix returned from getBasis() contains the orthonormal basis vectors constructed from the training images.  The function also returns a diagonal matrix of singular values, which are related to the eigenvalues belonging to the eigenvectors of "face-space".  Each singular value indicates how important the corresponding basis vector is in accounting for the variation in the training set.  In other words, it describes the "impact" that basis vector will have on the reconstruction of an arbitrary face image.  To form an intuition it is helpful to plot these singular values.  Figure 1 is a plot of the N (100) diagonal elements.  It is interesting to note that the drop-off is roughly exponential.  This suggests that perhaps only the first few basis vectors are significant, or at least that you get a whole lot of bang out of the first handful.
 
 

Figure 1



The plot in Figure 2 is a different look at the same information: the cumulative variance.  Notice how the curve rises steeply over the first handful of values, indicating that a large portion of the variance in the training images is accounted for by the most significant basis vectors.  Figure 3 is the same plot normalized to show the percentage of the variance accounted for by the first n basis vectors.  Naturally, it has the same shape as Figure 2.  From this plot it is simple to see that the first 70 basis vectors are needed to account for 95% of the variance in the training set.  These 70 orthonormal vectors are our reduced-space representation.
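
Reading n off the cumulative curve can be sketched in a few lines.  The code below is plain Python for illustration, not the MATLAB used here; it assumes that the variance captured by each basis vector is proportional to its squared singular value, and the singular values in the example are made up, not the ones plotted in Figure 1.

```python
from itertools import accumulate


def variance_fractions(singular_values):
    """Cumulative fraction of total variance after each basis vector.

    Assumption: variance along a basis vector is proportional to the
    square of its singular value.
    """
    energies = [s * s for s in singular_values]
    total = sum(energies)
    return [running / total for running in accumulate(energies)]


def components_needed(singular_values, target):
    """Smallest n whose first n basis vectors reach the target fraction."""
    for n, fraction in enumerate(variance_fractions(singular_values), start=1):
        if fraction >= target:
            return n
    return len(singular_values)


# Made-up, rapidly decaying singular values (not the report's data).
example = [4.0, 2.0, 1.0, 1.0]
print(components_needed(example, 0.90))  # 2
print(components_needed(example, 0.95))  # 3
```

With the real singular values, the same search at a target of 0.95 would be expected to land on the figure of 70 read off the plot, provided the plotted variance was computed the same way.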
 
 

Figure 2

Figure 3






The set of orthonormal vectors spans a sub-space named "face-space", and the vectors themselves are called eigenfaces.  Each eigenface can be displayed as an image.  Figure 4 shows the first 70 basis vectors, or eigenfaces, viewed as images.








Figure 4



This set of eigenfaces is the intended low-dimensional representation of face-space.  It will be used to approximately reconstruct arbitrary face images that were not used in the construction of the eigenfaces but that hopefully still lie near face-space.  To perform the reconstruction, a test image is first projected into face-space by taking the inner product of the (mean-subtracted) test image vector with each basis vector.  This yields a weight vector, and the image is then re-formed as a linear combination of the eigenfaces with those weights as the coefficients.

The basis vectors above were used to reconstruct a set of test images: each test image is projected into "eigenspace", i.e. onto the basis vectors, to derive a series of weights; multiplying each weight by the corresponding basis vector and summing over the set of basis vectors approximates the test image.  The test images were not used in the calculation of the basis vectors.  The following code projects an image into face-space and reconstructs the image, respectively.

projectImage()
reconstruct()
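
A minimal sketch of the two operations, in plain Python rather than the report's MATLAB.  It is not the code of projectImage() or reconstruct(): the function names, argument order and the toy 3-pixel "images" are assumptions, and the basis vectors are assumed to be orthonormal, as they are when taken from the SVD.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_image(image, mean, basis, n):
    """Weights of the mean-subtracted image on the first n basis vectors."""
    centered = [p - m for p, m in zip(image, mean)]
    return [dot(centered, basis[k]) for k in range(n)]


def reconstruct(weights, mean, basis):
    """Mean image plus the weighted sum of the leading basis vectors."""
    image = list(mean)
    for w, vector in zip(weights, basis):
        for i, component in enumerate(vector):
            image[i] += w * component
    return image


# Toy example: 3-pixel "images" and two orthonormal basis vectors.
mean = [1.0, 1.0, 1.0]
basis = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0]]
test_image = [3.0, 4.0, 5.0]

weights = project_image(test_image, mean, basis, n=2)
approximation = reconstruct(weights, mean, basis)
```

In this toy case the third pixel lies outside the span of the basis, so it falls back to the mean value; that residual is what appears in a difference image.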
 
 

Figures 5 and 6 show reconstructions and difference images for an image taken from the training set, using all 100 basis vectors and the first 70 respectively.  Figures 7 and 8 illustrate the same idea for a test image that did not appear in the training set.  Figure 9 then shows reconstructions of all the test images using the first 70 basis vectors.  Finally, Figure 10 shows reconstructions using only the first 25 basis vectors, which account for approximately 75% of the variance.

An interesting thing to note is the huge perceptual difference between the reconstructions in Figure 5 and Figure 6.  This suggests that while the last 30 basis vectors may not play a huge statistical role, they do play a very significant perceptual role.  Another item of interest is the difference images.  They look like the original faces!  This indicates that the big errors are not arbitrary but are made exactly at places that are perceptually salient (high frequency? curvature? edges?).  I will discuss this later, as I am not sure whether this means anything or not.

Original, Reconstruction and Difference of a Training Image using all 100 basis vectors

Figure 5
 
 
 

Original, Reconstruction and Difference of a Training Image using first 70 basis vectors

Figure 6
 
 

Original, Reconstruction and Difference of Test Image using all 100 basis vectors

Figure 7

Original, Reconstruction and Difference of Test Image using first 70 basis vectors

Figure 8
 
 

Reconstructions of Test Images using the first 70 basis vectors






Figure 9
 
 



Figure 10
 
 
 
 
 

Figure 11 is a plot of the sum of squared errors between the 12 test images and their reconstructions.
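
The error measure can be sketched as below (plain Python, illustrative names, made-up numbers; not the code used to produce Figure 11).  The second function relies on a standard identity that holds for any orthonormal basis: the squared error of the reconstruction equals the squared length of the mean-subtracted image minus the sum of the squared weights that were kept.

```python
def sum_squared_error(original, reconstruction):
    """Sum over pixels of the squared difference between two images."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstruction))


def error_from_weights(image, mean, weights):
    """Same error computed from the kept weights alone.

    Valid only when the weights come from projecting (image - mean)
    onto orthonormal basis vectors.
    """
    energy = sum((p - m) ** 2 for p, m in zip(image, mean))
    return energy - sum(w * w for w in weights)


# Made-up 3-pixel example: the basis is the first two coordinate axes,
# so the kept weights are the first two mean-subtracted pixels.
image = [3.0, 4.0, 5.0]
mean = [1.0, 1.0, 1.0]
weights = [2.0, 3.0]
reconstruction = [3.0, 4.0, 1.0]

print(sum_squared_error(image, reconstruction))   # 16.0
print(error_from_weights(image, mean, weights))   # 16.0
```

The identity also explains why the error can only shrink or stay the same as more basis vectors are kept: every additional weight subtracts a non-negative term.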

Figure 11
 

Part II




Part II examines two other properties of the eigenspace representation: interpolation and occlusion.

The first property is tested empirically: does the simple notion of interpolation have any meaning in face-space, i.e., does interpolation in face-space give rise to a visually meaningful interpolation in image space?  Two sets of two images were taken, and simple linear interpolation in 5 steps was used to transition from the first image in each set to the second.  Figures 12 and 13 illustrate this step.  The same two sets of images are then used for interpolation in face-space.  The images are projected into face-space to obtain the coefficients of the linear combination for each projection.  These weights are then used as the interpolation vehicle: once again simple linear interpolation is used to transition from the projection weights of the first image of the set to the weights of the second.  At each step the interpolated weight vector is reconstructed into an interpolated image.  In this way the interpolation is carried out in face-space.  The procedure was done using all 100 basis vectors and using the first 70 basis vectors for reconstruction.  Figures 14, 15, 16 and 17 illustrate interpolation in face-space.  The results indicate that an interpolation of weights in face-space does indeed correspond to a visually meaningful transition in image space.

The following code does the interpolation:

interpImage()
interpWeights()
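
The interpolation itself is the same operation in both cases; only the vectors being blended differ (pixel vectors in image space, weight vectors in face-space).  A sketch in plain Python, not the code of interpImage() or interpWeights(): the function names are illustrative, and it assumes that "5 steps" means five evenly spaced blends including both end points.

```python
def lerp(start, end, t):
    """Blend two equal-length vectors: t = 0 gives start, t = 1 gives end."""
    return [(1.0 - t) * a + t * b for a, b in zip(start, end)]


def interpolate(start, end, steps=5):
    """Evenly spaced blends from start to end, end points included."""
    return [lerp(start, end, k / (steps - 1)) for k in range(steps)]


# Made-up weight vectors for two projected images.
weights_a = [2.0, -1.0, 0.0]
weights_b = [0.0, 3.0, 4.0]
for w in interpolate(weights_a, weights_b):
    print(w)
```

Each interpolated weight vector would then be turned back into an image by the reconstruction step.  Because reconstruction is a linear combination of the eigenfaces plus the mean, interpolating the weights gives the same images as blending the two reconstructed end points pixel by pixel.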

Interpolation Results In Image Space

Figure 12: Set 1

Figure 13: Set 2

Interpolation Results in Face Space

Figure 14: Set 1 (First 70 basis vectors)
 

Figure 15: Set 1 (All 100 basis vectors)

Figure 16: Set 2 (First 70 basis vectors)
 
 

Figure 17: Set 2 (All 100 basis vectors)



The second property under consideration is occlusion, and how well the eigenspace representation holds up under differing levels of it.  The procedure is to synthetically add an occluder of varying size and position to an image, project the occluded image into face-space and reconstruct it.  The reconstructions should give some insight into how PCA handles occlusion.  A discussion of the results follows the occlusion figures.
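
A synthetic occluder can be sketched as a rectangle painted over the image before projection.  This is plain Python for illustration, not the code of occludeRegion(): the signature, the constant fill value and the 4x4 example image are assumptions, since the report does not state how the occluder was drawn.

```python
def occlude_region(image, top, left, height, width, fill=0):
    """Return a copy of a 2-D image with a rectangle set to a constant value."""
    occluded = [list(row) for row in image]  # leave the input untouched
    for r in range(top, min(top + height, len(occluded))):
        for c in range(left, min(left + width, len(occluded[r]))):
            occluded[r][c] = fill
    return occluded


# Made-up 4x4 image; cover the 2x2 block in the top-right corner.
face = [[9, 9, 9, 9],
        [9, 9, 9, 9],
        [9, 9, 9, 9],
        [9, 9, 9, 9]]
covered = occlude_region(face, top=0, left=2, height=2, width=2)
```

A height or width of zero leaves the image unchanged, which corresponds to the "no occluder" case used as the baseline.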

Occlusion Results

Figure 18

Figure 19

Figure 20

Figure 21

Figure 22

Figure 23

Figure 24



These results actually say quite a few things.  Intuition suggests that the larger the occluder the worse the reconstruction, which appears to be true, but which "features" are occluded is another critical influence on the quality of the reconstruction.

First, an explanation of the figures.  The first image in each series is the original image with the occluder (in the first series the occluder has zero area).  The second image is the reconstruction.  The third image is the difference between the reconstruction and the original image; since the original image contains the occluder, this image is less meaningful than the fourth, which is the difference between the reconstruction of the occluded face and the reconstruction of the same face without the occluder (thus the fourth image in the first series is zero).  The plots below (Figures 25 and 26) show the sum of squared errors of the difference images (the 3rd and 4th images in each series above, respectively).  To avoid massive amounts of scrolling, I have added a table below Figure 26 that matches each number to the type of occluder its error value represents.

One does expect, of course, to get lips back even when no lips appear in the occluded image.  Even if all the basis vectors carrying "lip-like" information happened to be weighted zero (highly unlikely, and perhaps nonsensical, since it becomes clear that this is not how the information is organized as it settles onto the basis vectors), the mean image is still added back during the reconstruction stage.
 

There are two categories of differences in the reconstructions to consider here.  The first is the raw sum of squared errors and the second is the perceptual difference.  The first is shown objectively in Figure 26; the second is a subjective matter and does not necessarily agree with it.  Observation suggests to me that the size of the occluder is the ONLY significant factor in determining the statistical difference, but that what is occluded comes into play in determining the perceptual difference.  This is best illustrated by the 2nd and 7th images.  The occluders are roughly the same size and so are the sums of squared errors (against the reconstruction without an occluder), yet the perceptual difference is much greater in the image where the occluder covers the eye (a salient perceptual feature).  A similar comparison can be made using the occluder covering both eyes.  A big personal surprise here is the result from the forehead occluder.  I expected a big statistical difference (true) and a much smaller perceptual difference (false): the reconstruction is unrecognizable.  What seems to be responsible is that nearby features, namely the eyes, are affected by the occlusion of neighboring pixels.  This gives some insight into how changes in image space affect face-space.  Since there is no notion of "features", and the initial vectors are constructed in raster order only, non-occluded regions get warped by virtue of being near occluders.  Occluders have a gravitational pull in face-space, so to speak.  Also note that the difference image (now speaking of the difference between reconstruction and original, the 3rd image in each series) no longer looks like the original.  Errors no longer accumulate along perceptually salient 'events'.
 
 

Figure 25
 

Figure 26

1) No occluder
2) One eye and upper nose
3) Mouth
4) Both eyes and upper nose
5) All facial "features"
6) Forehead
7) Left Cheek
 
 


Part III



Part III of this experiment examines the applicability of PCA as a representation of general images rather than of different images from some a priori known set, such as faces.  Consequently, the reduction of dimensionality creates not a "face-space" or an "animal-space", but a general sub-space or eigenspace.  The images used in Part III are separated into three sets: 16x16, 32x32 and 64x64.  They are by and large snippets of textures, scenes and cropped objects.  The same procedure as detailed in Part I was repeated on each set of images.  The orthonormal basis vectors were constructed by running SVD on all the training images.  Figures 28, 30 and 32 are image representations of the first 9 basis vectors of each set.  The first thing one notes is that the basis vectors are just textures, which is expected from a training set that is highly uncorrelated.  Despite the uncorrelated training data, the singular values still show the characteristic exponential decay.  However, the decay is much more rapid than for the face training set.  This should not immediately be attributed to differences in correlation, as the 64x64 set of images shows a slower decay than the 16x16 set and by observation the two are equally uncorrelated.  It appears that the rapid decay is more a result of image size.  The small size of the 16x16 images makes for a much lower-dimensional space and therefore a restricted variance that can be adequately accounted for by fewer basis vectors.

These images together don't really define a clustered space, meaning that there is no real significance in being near the sub-space spanned by the basis vectors formed from these training images.  In this sense I don't think that PCA is a very effective means of representing this set.  In fact, without an application I have a hard time defending any alternative representation of this set of images as a good one.  Perhaps I am missing something vital here.
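
One way to make the size argument concrete is the standard bound on the number of non-zero singular values of a mean-subtracted data matrix.  The symbols are notation introduced here, not taken from the experiment: A is the M x N matrix of mean-subtracted training images, M the number of pixels per image and N the number of images.

```latex
\operatorname{rank}(A) \le \min(M,\; N - 1), \qquad
M_{16 \times 16} = 256, \quad
M_{32 \times 32} = 1024, \quad
M_{64 \times 64} = 4096
```

A 16x16 set can therefore never need more than 256 basis vectors, however many images it contains, whereas the larger sizes leave far more room for variation.  The number of training images in each set is not stated here, so which of the two limits applies in practice is left open.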

Texture Image Results

Figure 27
 

First 9 Basis Vectors

Figure 28
 


Figure 29
 
 


Figure 30
 
 


Figure 31
 
 


Figure 32






Conclusion:  Certainly PCA has its place in reducing dimensionality, and is quite a workhorse for certain tasks.  Recognition might not be one of them; categorization might be.  It appears to be a good method for representing a set of images, or the space occupied by a certain type of image.  An interesting point that John Hughes made was the difference between how well a method does a task and how well it replicates how the human visual system does that task.  It is clear that PCA falls into the former category, and its true benefits lie in using it to complete a particular task.  Visually examining reconstructions based on PCA, as in the first part, is misleading because human perception is so intimately involved (especially with faces).  However, this experiment was immensely helpful in elucidating a technique that was previously unfamiliar to me.

Other MATLAB code:

modelFaces()
occludeRegion()