EN161 Image Understanding Projects

 

All projects will entail careful reading and understanding of 1-2 main papers, plus reading several supplementary papers, as the foundation for implementing and testing a current method in your chosen topic.  You will be expected to be able to discuss the strengths and weaknesses of the method.

 

Contact TA: MingChing Chang

 

Projects with are highly recommended for this course.

 

If you have a problem downloading the SpringerLink papers, you will probably see this warning message:

You are logged in as a 'Multiple User' of 'Brown University'.
Your institution's MetaPress ID is '554-55-063'.

It's just a warning message. Follow these steps to download it:

1. Make sure you connect to it using Brown's network.

2. Go to: http://springerlink.metapress.com/

3. Search for the paper on this page and you should be able to download it.


1. Region-Based Tracking

 

Spectral Solution of Large-Scale Extrinsic Camera Calibration as a Graph Embedding Problem

Matthew Brand, M. Antone, and S. Teller

ECCV 04 II 262-273

 

Extrinsic calibration of large-scale ad hoc networks of cameras is posed as the following problem: Calculate the locations of N mobile, rotationally aligned cameras distributed over an urban region, subsets of which view some common environmental features. We show that this leads to a novel class of graph embedding problems that admit closed-form solutions in linear time via partial spectral decomposition of a quadratic form. The minimum squared error (MSE) solution determines locations of cameras and/or features in any number of dimensions. The spectrum also indicates insufficiently constrained problems, which can be decomposed into well-constrained rigid subproblems and analyzed to determine useful new views for missing constraints. We demonstrate the method with large networks of mobile cameras distributed over an urban environment, using directional constraints that have been extracted automatically from commonly viewed features. Spectral solutions yield layouts that are consistent in some cases to a fraction of a millimeter, substantially improving the state of the art. Global layout of large camera networks can be computed in a fraction of a second.

 

Abstract

Paper

 


2. Inpainting

 

A Combined PDE and Texture Synthesis Approach to Inpainting

Harald Grossauer

ECCV04 II 214-224

 

While there is a vast amount of literature considering PDE based inpainting and inpainting by texture synthesis, only a few publications are concerned with the combination of both approaches. We present a novel algorithm which combines both approaches and treats each distinct region of the image separately. Thus we are naturally led to include a segmentation pass as a new feature. This way the correct choice of texture samples for the texture synthesis is ensured. We propose a novel concept of local texture synthesis which gives satisfactory results even for large domains in a complex environment.

 

Abstract

Paper

 


3.

 

Weighted Minimal Hypersurfaces and Their Applications in Computer Vision

Bastian Goldlücke and Marcus Magnor ECCV04 II 366-378

 

Many interesting problems in computer vision can be formulated as a minimization problem for an energy functional. If this functional is given as an integral of a scalar-valued weight function over an unknown hypersurface, then the minimal surface we are looking for can be determined as a solution of the functional's Euler-Lagrange equation. This paper deals with a general class of weight functions that may depend on the surface point and normal. By making use of a mathematical tool called the method of the moving frame, we are able to derive the Euler-Lagrange equation in arbitrary-dimensional space and without the need for any surface parameterization. Our work generalizes existing proofs, and we demonstrate that it yields the correct evolution equations for a variety of previous computer vision techniques which can be expressed in terms of our theoretical framework. In addition, problems involving minimal hypersurfaces in dimensions higher than three, which were previously impossible to solve in practice, can now be introduced and handled by generalized versions of existing algorithms. As one example, we sketch a novel idea of how to reconstruct temporally coherent geometry from multiple video streams.

 

Abstract

Paper

 


4.

 

Texture Boundary Detection for Real-Time Tracking

Ali Shahrokni et al ECCV04 II 566-577

 

Most of the tracking techniques used to determine the pose of an object in a sequence rely on the fact that silhouettes can be extracted using relatively simple algorithms such as background subtraction or standard edge- and gradient-based techniques. However, in practice, this rarely is the case and these silhouette extraction methods can be very brittle. They tend to fail in the presence of highly textured objects and clutter, which produce too many irrelevant edges. In such situations, it is advantageous to detect texture boundaries instead. However, because texture segmentation techniques usually require computing statistics over image patches, they are more useful for detection in a single image than for tracking.

Alternatively, we can use all the assumptions that are applicable to our tracking problem to simplify it. More precisely, we can start from the estimated projection of a 3-D object model and perform a line search in the direction perpendicular to the projected edges. This allows us to compute the most probable location of a texture boundary on the search line, which we refer to as the scanline. The main idea behind scanline texture boundary detection is illustrated in Figure 1, where we wish to find the point on the yellow lines for which the probability of a texture crossing is maximum. This is expressed in terms of the product of the conditional probabilities of the pixel sequences on both sides of a given point along the scanline, given an estimate of the texture model on each side. This estimate can be updated as we go through the scanline. This concept is formalized in detail in the subsequent sections and is based on the paper by Shahrokni et al. [1].
 

Abstract

Paper
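As a starting point for this project, the scanline idea can be sketched in a few lines of Python. This is only a minimal illustration under strong assumptions that are not taken from the paper: pixels are quantized to a few integer labels, each side of the boundary is modelled as independent draws from a known histogram (a zero-order model), and the texture models are fixed rather than updated along the scanline. The function and parameter names are hypothetical.

```python
import math

def best_texture_boundary(scanline, model_in, model_out):
    """Return the split index k maximising
    log P(scanline[:k] | model_in) + log P(scanline[k:] | model_out).

    model_in / model_out map a quantized pixel value to its probability
    (assumed known and strictly positive for every value that occurs).
    """
    # Log-likelihood when every pixel is attributed to model_out (k = 0).
    score = sum(math.log(model_out[p]) for p in scanline)
    best_k, best_score = 0, score
    for k, p in enumerate(scanline, start=1):
        # Moving the boundary one pixel further reassigns pixel p
        # from model_out to model_in.
        score += math.log(model_in[p]) - math.log(model_out[p])
        if score > best_score:
            best_k, best_score = k, score
    return best_k

# Toy scanline: mostly 0s (inside texture) followed by mostly 3s (outside).
inside = {0: 0.7, 1: 0.2, 2: 0.05, 3: 0.05}
outside = {0: 0.05, 1: 0.05, 2: 0.2, 3: 0.7}
print(best_texture_boundary([0, 0, 1, 0, 0, 3, 2, 3, 3, 3], inside, outside))
```

Working in log-probabilities turns the product of conditional probabilities into a running sum, so the whole scanline is scored in a single pass. A fuller implementation would likely replace the histograms with the texture models described in the paper.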

 


5.

 

A TV Flow Based Local Scale Measure for Texture Discrimination

Thomas Brox and Joachim Weickert ECCV04 II 578-590

 

We introduce a technique for measuring local scale, based on a special property of the so-called total variational (TV) flow. For TV flow, pixels change their value with a speed that is inversely proportional to the size of the region they belong to. Exploiting this property directly leads to a region based measure for scale that is well-suited for texture discrimination. Together with the image intensity and texture features computed from the second moment matrix, which measures the orientation of a texture, a sparse feature space of dimension 5 is obtained that covers the most important descriptors of a texture: magnitude, orientation, and scale. A demonstration of the performance of these features is given in the scope of texture segmentation.
Our research is partly funded by the project WE 2602/1-1 of the Deutsche Forschungsgemeinschaft (DFG). This is gratefully acknowledged. We also want to thank Mikaël Rousson and Rachid Deriche for many interesting discussions on texture segmentation.

 

Abstract

Paper

PowerPoint presentation

 


6.

 

Interactive Image Segmentation Using an Adaptive GMMRF Model

A. Blake et al ECCV04 I 428-441

 

The problem of interactive foreground/background segmentation in still images is of great practical importance in image editing. The state of the art in interactive segmentation is probably represented by the graph cut algorithm of Boykov and Jolly (ICCV 2001). Its underlying model uses both colour and contrast information, together with a strong prior for region coherence. Estimation is performed by solving a graph cut problem for which very efficient algorithms have recently been developed. However the model depends on parameters which must be set by hand and the aim of this work is for those constants to be learned from image data.
First, a generative, probabilistic formulation of the model is set out in terms of a Gaussian Mixture Markov Random Field (GMMRF). Secondly, a pseudolikelihood algorithm is derived which jointly learns the colour mixture and coherence parameters for foreground and background respectively. Error rates for GMMRF segmentation are calculated throughout using a new image database, available on the web, with ground truth provided by a human segmenter. The graph cut algorithm, using the learned parameters, generates good object-segmentations with little interaction. However, pseudolikelihood learning proves to be frail, which limits the complexity of usable models, and hence also the achievable error rate.

 

Author's Page

Abstract

Paper: PDF

 

Interactive Graph Cuts for Optimal Boundary & Region Segmentation of Objects in N-D Images

Yuri Boykov and Marie-Pierre Jolly

 

In this paper we describe a new technique for general purpose interactive segmentation of N-dimensional images. The user marks certain pixels as "object" or "background" to provide hard constraints for segmentation. Additional soft constraints incorporate both boundary and region information. Graph cuts are used to find the globally optimal segmentation of the N-dimensional image. The obtained solution gives the best balance of boundary and region properties among all segmentations satisfying the constraints. The topology of our segmentation is unrestricted and both "object" and "background" segments may consist of several isolated parts. Some experimental results are presented in the context of photo/video editing and medical image segmentation. We also demonstrate an interesting Gestalt example. A fast implementation of our segmentation method is possible via a new max-flow algorithm in PAMI'04.

 

Abstract

Paper
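To make the graph cut formulation concrete, here is a small sketch on a one-dimensional "image". It is an illustration under assumptions rather than the authors' implementation: only the boundary term and the hard seed constraints are modelled (the regional soft constraints are omitted), the neighbour weight exp(-(Ip - Iq)^2 / (2 sigma^2)) and the value of sigma are illustrative choices, and a plain Edmonds-Karp max-flow stands in for the faster max-flow algorithm mentioned in the abstract. All names are hypothetical.

```python
import math
from collections import deque

def segment_1d(intensities, object_seed, background_seed, sigma=10.0):
    """Label each pixel 1 (object) or 0 (background) with a min cut."""
    n = len(intensities)
    source, sink = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]  # residual capacities
    # Boundary links: cheap to cut where neighbouring intensities differ.
    for i in range(n - 1):
        diff = intensities[i] - intensities[i + 1]
        w = math.exp(-diff * diff / (2.0 * sigma * sigma))
        cap[i][i + 1] = cap[i + 1][i] = w
    # Hard constraints: seed links cost more than all boundary links together.
    big = 1.0 + n
    cap[source][object_seed] = big
    cap[background_seed][sink] = big

    def bfs_parents():
        # Breadth-first search over edges with remaining capacity.
        parent = [-1] * (n + 2)
        parent[source] = source
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 0.0:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = bfs_parents()
        if parent[sink] == -1:
            break  # no augmenting path left: the flow is maximal
        flow, v = float("inf"), sink
        while v != source:  # bottleneck capacity along the path
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while v != source:  # push the flow and update residuals
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u
    # Pixels still reachable from the source lie on the object side of the cut.
    return [1 if parent[i] != -1 else 0 for i in range(n)]

print(segment_1d([10, 11, 12, 80, 82, 81], object_seed=0, background_seed=5))
```

On this toy input the cheapest link to cut is the one across the large intensity jump, so the cut separates the first three pixels from the last three. Extending the sketch to 2-D images mainly changes how neighbouring pixels are enumerated.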

 


7.

 

Region-Based Segmentation on Evolving Surfaces with Application to 3D Reconstruction of Shape and Piecewise Constant Radiance

Hailin Jin, Anthony J. Yezzi, Stefano Soatto

ECCV04 114-125


We consider the problem of estimating the shape and radiance of a scene from a calibrated set of images under the assumption that the scene is Lambertian and its radiance is piecewise constant. We model the radiance segmentation explicitly using smooth curves on the surface that bound regions of constant radiance. We pose the scene reconstruction problem in a variational framework, where the unknowns are the surface, the radiance values and the segmenting curves. We propose an iterative procedure to minimize a global cost functional that combines geometric priors on both the surface and the curves with a data fitness score. We carry out the numerical implementation in the level set framework.


Keywords: variational methods, Mumford-Shah functional, image segmentation, multi-view stereo, level set methods, curve evolution on manifolds.

 

Abstract

Paper

http://www.vision.cs.ucla.edu/projects.html

 

Semi-supervised Statistical Region Refinement for Color Image Segmentation

Richard Nock and Frank Nielsen

Pattern Recognition, Elsevier Science, accepted, 2005

Author's Page

Author's Page II

 

Segmentation Given Partial Grouping Constraints

Stella X. Yu, Jianbo Shi PAMI Feb 2004 173-183

 

We consider data clustering problems where partial grouping is known a priori. We formulate such biased grouping problems as a constrained optimization problem, where structural properties of the data define the goodness of a grouping and partial grouping cues define the feasibility of a grouping. We enforce grouping smoothness and fairness on labeled data points so that sparse partial grouping information can be effectively propagated to the unlabeled data. Considering the normalized cuts criterion in particular, our formulation leads to a constrained eigenvalue problem. By generalizing the Rayleigh-Ritz theorem to projected matrices, we find the global optimum in the relaxed continuous domain by eigendecomposition, from which a near-global optimum to the discrete labeling problem can be obtained effectively. We apply our method to real image segmentation problems, where partial grouping priors can often be derived based on a crude spatial attentional map that binds places with common salient features or focuses on expected object locations. We demonstrate not only that it is possible to integrate both image structures and priors in a single grouping process, but also that objects can be segregated from the background without specific object knowledge.

 

Abstract


Color Texture Segmentation by Region-Boundary Cooperation

Jordi Freixenet, Xavier Muñoz, Joan Martí, Xavier Lladó

ECCV04 II 250-261

 

A colour texture segmentation method which unifies region and boundary information is presented in this paper. The fusion of several approaches which integrate both information sources allows us to exploit the benefits of each one. We propose a segmentation method which uses a coarse detection of the perceptual (colour and texture) edges of the image to adequately place and initialise a set of active regions. Colour texture of regions is modelled by the conjunction of non-parametric techniques of kernel density estimation, which allow us to estimate the colour behaviour, and classical co-occurrence matrix based texture features. When the region information is defined, accurate boundary information can be extracted. Afterwards, regions concurrently compete for the image pixels in order to segment the whole image taking both information sources into account. In contrast with other approaches, our method achieves relevant results on images with regions with the same texture and different colour (as well as with regions with the same colour and different texture), demonstrating the performance of our proposal. Furthermore, the method has been quantitatively evaluated and compared on a set of mosaic images, and results on real images are shown and analysed.

 

Abstract

Abstract II

Paper

 

Geodesic Active Regions and Level Set Methods for Supervised Texture Segmentation

Nikos Paragios, Rachid Deriche IJCV02 223-247

 

This paper presents a novel variational framework to deal with frame partition problems in Computer Vision. This framework exploits boundary and region-based segmentation modules under a curve-based optimization objective function. The task of supervised texture segmentation is considered to demonstrate the potentials of the proposed framework. The textured feature space is generated by filtering the given textured images using isotropic and anisotropic filters, and analyzing their responses as multi-component conditional probability density functions. The texture segmentation is obtained by unifying region and boundary-based information as an improved Geodesic Active Contour Model. The defined objective function is minimized using a gradient-descent method where a level set approach is used to implement the obtained PDE. According to this PDE, the curve propagation towards the final solution is guided by boundary and region-based segmentation forces, and is constrained by a regularity force. The level set implementation is performed using a fast front propagation algorithm where topological changes are naturally handled. The performance of our method is demonstrated on a variety of synthetic and real textured frames.

 

Abstract

Paper

 


8. Recognition

 

Learning Chance Probability Functions for Shape Retrieval or Classification
Boaz J. Super CVPR04

 

Several example-based systems for shape retrieval and shape classification directly match input shapes to stored shapes, without using class membership information to perform the matching. We propose a method for improving the accuracy of this type of system. First, the system learns a set of chance probability functions (CPFs). The CPFs estimate the probabilities of obtaining a query shape with particular distances from each training example by chance. The learned CPFs are used at runtime to rapidly estimate the chance probabilities of the observed distances between the actual query shape and the database shapes. These estimated probabilities are then used as a dissimilarity measure for shape retrieval and/or nearest-neighbor classification. The CPF learning method is parameter-free. Experimental evaluation demonstrates that: (1) chance probabilities yield higher accuracy than Euclidean distances; (2) the learned CPFs support fast matching; and (3) the CPF-based system outperforms prior systems on a standard benchmark test of retrieval accuracy.

 

Abstract

Author's Page

Paper

 

BAS: a perceptual shape descriptor based on the beam angle statistics

Nafiz Arica and Fatos T. Yarman Vural

Pattern Recognition Letters vol 24 Issue 9-10 1627-1639 June 2004

 

The proposed shape descriptor is based on the beams originated from a boundary point, which are defined as lines connecting that point with the rest of the points on the boundary. At each point, the angle between a pair of beams is calculated to extract the topological structure of the boundary. Then, a shape descriptor is defined by using the third-order statistics of all the beam angles in a set of neighborhood systems. It is shown that beam angle statistics (BAS) is invariant to translation, rotation, scale and is insensitive to distortions. Experiments are done on the dataset of MPEG 7 Core Experiments Shape-1. It is observed that BAS outperforms the MPEG 7 shape descriptors.

 

Abstract

Paper (through Brown Library)
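The beam angle construction described in the abstract can be sketched as follows. This is a rough illustration with assumptions that go beyond the abstract: the boundary is a closed list of distinct 2-D points, the k-th neighbourhood uses the points k steps ahead and k steps behind, the angle is taken unsigned in [0, pi], and the descriptor at each point is the first three moments of the angle over k = 1 .. N/2 - 1. The paper's exact definitions may differ, and the function names are hypothetical.

```python
import math

def beam_angle(points, i, k):
    """Unsigned angle at points[i] between the beams to the k-th
    neighbours ahead and behind on a closed boundary."""
    n = len(points)
    px, py = points[i]
    fx, fy = points[(i + k) % n]
    bx, by = points[(i - k) % n]
    v1x, v1y = fx - px, fy - py
    v2x, v2y = bx - px, by - py
    dot = v1x * v2x + v1y * v2y
    cross = v1x * v2y - v1y * v2x
    return math.atan2(abs(cross), dot)

def bas_descriptor(points, orders=(1, 2, 3)):
    """For every boundary point, the moments of its beam angles
    taken over the neighbourhood sizes k = 1 .. n//2 - 1."""
    n = len(points)
    ks = range(1, n // 2)
    descriptor = []
    for i in range(n):
        angles = [beam_angle(points, i, k) for k in ks]
        descriptor.append([sum(a ** m for a in angles) / len(angles)
                           for m in orders])
    return descriptor

# Square sampled at its corners and edge midpoints.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
print(beam_angle(square, 0, 1))  # corner: a right angle
print(beam_angle(square, 1, 1))  # edge midpoint: a straight angle
```

Because only angles between beams are used, translating, rotating or uniformly scaling the boundary leaves the descriptor unchanged, which is the invariance claimed in the abstract. Comparing two shapes still requires aligning their starting points, which is where the suggested comparison to curve matching comes in.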

 

(compare to curve matching)

 


9.

 

Learning the parts of objects with nonnegative matrix factorization

D. D. Lee and H. S. Seung, Nature 401, 788 (1999).

 

Is perception of the whole based on perception of its parts? There is psychological and physiological evidence for parts-based representations in the brain, and certain computational theories of object recognition rely on such representations. But little is known about how brains or computers might learn the parts of objects. Here we demonstrate an algorithm for non-negative matrix factorization that is able to learn parts of faces and semantic features of text. This is in contrast to other methods, such as principal components analysis and vector quantization, that learn holistic, not parts-based, representations. Non-negative matrix factorization is distinguished from the other methods by its use of non-negativity constraints. These constraints lead to a parts-based representation because they allow only additive, not subtractive, combinations. When non-negative matrix factorization is implemented as a neural network, parts-based representations emerge by virtue of two properties: the firing rates of neurons are never negative and synaptic strengths do not change sign.

 

Abstract

Paper
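A compact way to get a feel for non-negative matrix factorization is the sketch below, which factors a small non-negative matrix V into W and H using multiplicative updates. Note the assumptions: it minimizes the squared reconstruction error, which may not be the objective used in the Nature paper, it uses plain Python lists instead of a numerical library, and the small constant eps in the denominators is only a guard against division by zero. The function names are hypothetical.

```python
import random

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def frob_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factor the non-negative matrix v (n x m) as w (n x rank) times
    h (rank x m) with multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
    return w, h

V = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [1.0, 1.0, 3.0], [2.0, 0.0, 4.0]]
W, H = nmf(V, rank=2)
print(frob_error(V, W, H))
```

Each update multiplies a factor by a ratio of non-negative quantities, so W and H can never become negative. This is the "additive, not subtractive" property the abstract credits for the parts-based representation.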

 

Statistics of Nature Image Contours??

 


10. Curvature Estimation on Meshes/Point Clouds

 

Estimating Curvatures and Their Derivatives on Triangle Meshes

Szymon Rusinkiewicz 3DPVT04

 

The computation of curvature and other differential properties of surfaces is essential for many techniques in analysis and rendering. We present a finite-differences approach for estimating curvatures on irregular triangle meshes that may be thought of as an extension of a common method for estimating per-vertex normals. The technique is efficient in space and time, and results in significantly fewer outlier estimates while more broadly offering accuracy comparable to existing methods. It generalizes naturally to computing derivatives of curvature and higher-order surface differentials.

 

Abstract

Paper

 


11. Object Recognition based on Local Invariant Features

 

An Affine Invariant Interest Point Detector
K. Mikolajczyk and C. Schmid ECCV02 128-142

 

This paper presents a novel approach for detecting affine invariant interest points. Our method can deal with significant affine transformations including large scale changes. Such transformations introduce significant changes in the point location as well as in the scale and the shape of the neighbourhood of an interest point. Our approach allows us to solve for these problems simultaneously. It is based on three key ideas: 1) The second moment matrix computed in a point can be used to normalize a region in an affine invariant way (skew and stretch). 2) The scale of the local structure is indicated by local extrema of normalized derivatives over scale. 3) An affine-adapted Harris detector determines the location of interest points. A multi-scale version of this detector is used for initialization. An iterative algorithm then modifies location, scale and neighbourhood of each point and converges to affine invariant points. For matching and recognition, the image is characterized by a set of affine invariant points; the affine transformation associated with each point allows the computation of an affine invariant descriptor which is also invariant to affine illumination changes. A quantitative comparison of our detector with existing ones shows a significant improvement in the presence of large affine deformations. Experimental results for wide baseline matching show an excellent performance in the presence of large perspective transformations including significant scale changes. Results for recognition are very good for a database with more than 5000 images.

Keywords: Image features, matching, recognition.
 

Abstract

Author's Link

Paper Paper (.ps)

 

3D Object Modeling and Recognition Using Affine-Invariant Patches and Multi-View Spatial Constraints.
F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce.
CVPR 2003, II 272-277

 

This paper presents a representation for three-dimensional objects in terms of affine-invariant image patches and their spatial relationships. Multi-view constraints associated with groups of patches are combined with a normalized representation of their appearance to guide matching and reconstruction, allowing the acquisition of true three-dimensional affine and Euclidean models from multiple images and their recognition in a single photograph taken from an arbitrary viewpoint. The proposed approach does not require a separate segmentation stage and is applicable to cluttered scenes. Preliminary modeling and recognition results are presented.

 

Abstract

Author's Page

Paper (ps.gz)


Distinctive image features from scale invariant keypoints

D. Lowe, IJCV 60(2):91-110, 2004

 

This paper presents a method for extracting distinctive invariant features from images, which can be used to perform reliable matching between different images of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, addition of noise, change in 3D viewpoint, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

 

Author's Page

Paper

 

Simultaneous Object Recognition and Segmentation by Image Exploration

Vittorio Ferrari, Tinne Tuytelaars, Luc Van Gool, ECCV04 I 40-54

 

Methods based on local, viewpoint invariant features have proven capable of recognizing objects in spite of viewpoint changes, occlusion and clutter. However, these approaches fail when these factors are too strong, due to the limited repeatability and discriminative power of the features. As additional shortcomings, the objects need to be rigid and only their approximate location is found. We present a novel Object Recognition approach which overcomes these limitations. An initial set of feature correspondences is first generated. The method anchors on it and then gradually explores the surrounding area, trying to construct more and more matching features, increasingly farther from the initial ones. The resulting process covers the object with matches, and simultaneously separates the correct matches from the wrong ones. Hence, recognition and segmentation are achieved at the same time. Only very few correct initial matches suffice for reliable recognition. The experimental results demonstrate the stronger power of the presented method in dealing with extensive clutter, dominant occlusion, large scale and viewpoint changes. Moreover non-rigid deformations are explicitly taken into account, and the approximative contours of the object are produced. The approach can extend any viewpoint invariant feature extractor.

 

Abstract

Author's Page

Paper

 

Wide baseline stereo based on local, affinely invariant regions

T. Tuytelaars, L. Van Gool, British Machine Vision Conf. 2000, pp. 412-422

http://citeseer.ist.psu.edu/context/1766120/0

 

****

An Affine Invariant Salient Region Detector

Timor Kadir, Andrew Zisserman, Michael Brady

ECCV (1) 2004: 228-241

 

In this paper we describe a novel technique for detecting salient regions in an image. The detector is a generalization to affine invariance of the method introduced by Kadir and Brady [10]. The detector deems a region salient if it exhibits unpredictability in both its attributes and its spatial scale.
The detector has significantly different properties to operators based on kernel convolution, and we examine three aspects of its behaviour: invariance to viewpoint change; insensitivity to image perturbations; and repeatability under intra-class variation. Previous work has, on the whole, concentrated on viewpoint invariance. A second contribution of this paper is to propose a performance test for evaluating the two other aspects.
We compare the performance of the saliency detector to other standard detectors including an affine invariance interest point detector. It is demonstrated that the saliency detector has comparable viewpoint invariance performance, but superior insensitivity to perturbations and intra-class variation performance for images of certain object classes.

 

Abstract

Author's Page

Paper

 


12.

 

Evaluation of Interest Point Detectors

Cordelia Schmid and Roger Mohr and Christian Bauckhage IJCV, 2000

 

Many different low-level feature detectors exist and it is widely agreed that the evaluation of detectors is important. In this paper we introduce two evaluation criteria for interest points: repeatability rate and information content. Repeatability rate evaluates the geometric stability under different transformations. Information content measures the distinctiveness of features. Different interest point detectors are compared using these two criteria. We determine which detector gives the best results and show that it satisfies the criteria well.

 

Abstract

Paper
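The repeatability criterion can be prototyped with a short function such as the one below. It is a simplified sketch and not the paper's exact protocol: the geometric relation between the two images is assumed to be known and supplied as a function `mapping`, points are matched greedily one-to-one within a tolerance `eps`, and the restriction to the part of the scene visible in both images is left out. The function name, the parameter names and the default tolerance are illustrative assumptions.

```python
import math

def repeatability_rate(points1, points2, mapping, eps=1.5):
    """Fraction of detected points that reappear (within eps pixels)
    after mapping image-1 points into image 2."""
    if not points1 or not points2:
        return 0.0
    unused = list(points2)
    repeated = 0
    for p in points1:
        qx, qy = mapping(p)
        # Closest detection in image 2 that has not been matched yet.
        best = min(unused,
                   key=lambda q: math.hypot(qx - q[0], qy - q[1]),
                   default=None)
        if best is not None and math.hypot(qx - best[0], qy - best[1]) <= eps:
            repeated += 1
            unused.remove(best)
    return repeated / min(len(points1), len(points2))

# Image 2 is image 1 shifted by (2, 3); one detection is lost.
shift = lambda p: (p[0] + 2, p[1] + 3)
detections1 = [(0, 0), (10, 0), (5, 5)]
detections2 = [(2, 3), (12, 3.5), (50, 50)]
print(repeatability_rate(detections1, detections2, shift, eps=1.0))
```

In the toy example two of the three points are found again within one pixel, giving a rate of 2/3. For a project, `mapping` would typically apply the known homography or rotation between the test images.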

 


13.

 

Evaluation of Salient Point Techniques
N. Sebe, Q. Tian, E. Loupias, M.S. Lew, T.S 2002

http://citeseer.ist.psu.edu/590636.html

Paper: http://citeseer.ist.psu.edu/cache/papers/cs/31580/http:zSzzSzcarol.wins.uva.nlzSz~nicuzSzpublicationszSzIVC2003.pdf/evaluation-of-salient-point.pdf

 


14.

 

Novel Skeletal Representation For Articulated Creatures

Gabriel J. Brostow, Irfan Essa, Drew Steedly and Vivek Kwatra

ECCV 04 6??-878

 

Volumetric structures are frequently used as shape descriptors for 3D data. The capture of such data is being facilitated by developments in multi-view video and range scanning, extending to subjects that are alive and moving. In this paper, we examine vision-based modeling and the related representation of moving articulated creatures using spines. We define a spine as a branching axial structure representing the shape and topology of a 3D object's limbs, and capturing the limbs' correspondence and motion over time.
Our spine concept builds on skeletal representations often used to describe the internal structure of an articulated object and the significant protrusions. The algorithms for determining both 2D and 3D skeletons generally use an objective function tuned to balance stability against the responsiveness to detail. Our representation of a spine provides for enhancements over a 3D skeleton, afforded by temporal robustness and correspondence. We also introduce a probabilistic framework that is needed to compute the spine from a sequence of surface data.
We present a practical implementation that approximates the spine's joint probability function to reconstruct spines for synthetic and real subjects that move.

 

Abstract

Paper

If you are interested in this topic, talk to MingChing Chang in our group at B&H317.

 


15.

 

Three-dimensional metamorphosis: a survey.

Francis Lazarus and Anne Verroust.

The Visual Computer, 14(8-9):373--389, 1998.

 

A metamorphosis or a (3D) morphing is the process of continuously transforming one object into another. 2D and 3D morphing are popular in computer animation, industrial design, and growth simulation. Since there is no intrinsic solution to the morphing problem, user interaction can be a key component of a morphing software. Many morphing techniques have been proposed in recent years for 2D and 3D objects. We present a survey of the various 3D approaches, giving special attention to the user interface. We show how the approaches are intimately related to the object representations. We conclude by sketching some morphing strategies for the future.

 

http://citeseer.ist.psu.edu/context/935062/0

Abstract

Paper

 


16. Edge Detection

 

Are Iterations and Curvature Useful for Tensor Voting?

Sylvain Fischer, Pierre Bayerl, Heiko Neumann, Gabriel Cristobal, Rafael Redondo

ECCV (3) 2004: 158-169

 

Tensor voting is an efficient algorithm for perceptual grouping and feature extraction, particularly for contour extraction. In this paper two studies on tensor voting are presented. First the use of iterations is investigated, and second, a new method for integrating curvature information is evaluated. In contrast to other grouping methods, tensor voting claims the advantage of being non-iterative. Although non-iterative tensor voting methods provide good results in many cases, the algorithm can be iterated to deal with more complex data configurations. The experiments conducted demonstrate that iterations substantially improve the process of feature extraction and help to overcome limitations of the original algorithm. As a further contribution we propose a curvature improvement for tensor voting. Contrary to the curvature-augmented tensor voting proposed by Tang and Medioni, our method takes advantage of the curvature calculation already performed by the classical tensor voting and evaluates the full curvature, sign and amplitude. Some new curvature-modified voting fields are also proposed. Results show a lower degree of artifacts, smoother curves, a high tolerance to scale parameter changes and also more noise-robustness.
 

Abstract

Paper

 

If you are interested in this topic, talk to Amir Tamrakar in our group at B&H317.

 


17.


Shape Matching and Recognition - Using Generative Models and Informative Features.

Zhuowen Tu, Alan L. Yuille

ECCV04 III 195-209

 

We present an algorithm for shape matching and recognition based on a generative model for how one shape can be generated by the other. This generative model allows for a class of transformations, such as affine and non-rigid transformations, and induces a similarity measure between shapes. The matching process is formulated in the EM algorithm. To have a fast algorithm and avoid local minima, we show how the EM algorithm can be approximated by using informative features, which have two key properties: invariant and representative. They are also similar to the proposal probabilities used in DDMCMC [13]. The formulation allows us to know when and why approximations can be made and justifies the use of bottom-up features, which are used in a wide range of vision problems. This integrates generative models and feature-based approaches within the EM framework and helps clarify the relationships between different algorithms for this problem such as shape contexts [3] and softassign [5]. We test the algorithm on a variety of data sets including MPEG7 CE-Shape-1, Kimia silhouettes, and real images of street scenes. We demonstrate very effective performance and compare our results with existing algorithms. Finally, we briefly illustrate how our approach can be generalized to a wider range of problems including object detection.

 

Abstract

Paper

 


18.


Recognizing Objects in Range Data Using Regional Point Descriptors.

Andrea Frome, Daniel Huber, Ravi Kolluri, Thomas Bülow, Jitendra Malik

ECCV04 III 224-237

 

Recognition of three dimensional (3D) objects in noisy and cluttered scenes is a challenging problem in 3D computer vision. One approach that has been successful in past research is the regional shape descriptor. In this paper, we introduce two new regional shape descriptors: 3D shape contexts and harmonic shape contexts. We evaluate the performance of these descriptors on the task of recognizing vehicles in range scans of scenes using a database of 56 cars. We compare the two novel descriptors to an existing descriptor, the spin image, showing that the shape context based descriptors have a higher recognition rate on noisy scenes and that 3D shape contexts outperform the others on cluttered scenes.

 

Abstract

 Paper

If you are interested in this topic, talk to MingChing Chang in our group at B&H317.

 


19.

 

Shape Reconstruction from 3D and 2D Data Using PDE-Based Deformable Surfaces,
Ye Duan, Liu Yang, Hong Qin, Dimitris Samaras

ECCV 2004, pp III:238-251

 

In this paper, we propose a new PDE-based methodology for deformable surfaces that is capable of automatically evolving its shape to capture the geometric boundary of the data and simultaneously discover its underlying topological structure. Our model can handle multiple types of data (such as volumetric data, 3D point clouds and 2D image data), using a common mathematical framework. The deformation behavior of the model is governed by partial differential equations (e.g. the weighted minimal surface flow). Unlike the level-set approach, our model always has an explicit representation of geometry and topology. The regularity of the model and the stability of the numerical integration process are ensured by a powerful Laplacian tangential smoothing operator. By allowing local adaptive refinement of the mesh, the model can accurately represent sharp features. We have applied our model for shape reconstruction from volumetric data, unorganized 3D point clouds and multiple view images. The versatility and robustness of our model allow its application to the challenging problem of multiple view reconstruction. Our approach is unique in its combination of simultaneous use of a high number of arbitrary camera views with an explicit mesh that is intuitive and easy-to-interact-with. Our model-based approach automatically selects the best views for reconstruction, allows for visibility checking and progressive refinement of the model as more images become available. The results of our extensive experiments on synthetic and real data demonstrate robustness, high reconstruction accuracy and visual quality.

 

Abstract

Paper

Paper II

If you are interested in this topic, talk to MingChing Chang in our group at B&H317.

 


20.


Color Constancy Using Local Color Shifts

Marc Ebner

ECCV04 III 276-287

 

The human visual system is able to correctly determine the color of objects in view irrespective of the illuminant. This ability to compute color constant descriptors is known as color constancy. We have developed a parallel algorithm for color constancy. This algorithm is based on the computation of local space average color using a grid of processing elements. We have one processing element per image pixel. Each processing element has access to the data stored in neighboring elements. Local space average color is used to shift the color of the input pixel in the direction of the gray vector. The computations are executed inside the unit color cube. The color of the input pixel as well as local space average color is simply a vector inside this Euclidean space. We compute the component of local space average color which is orthogonal to the gray vector. This component is subtracted from the color of the input pixel to compute a color corrected image. Before performing the color correction step we can also normalize both colors. In this case, the resulting color is rescaled to the original intensity of the input color such that the image brightness remains unchanged.
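The correction step described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the local space average color is estimated here with a plain box window (the paper uses a grid of processing elements that exchange data with their neighbors), the `radius` parameter and all function names are hypothetical, and clipping to the unit cube is an added simplification.

```python
from typing import List, Tuple

Color = Tuple[float, float, float]
Image = List[List[Color]]


def local_average(img: Image, radius: int) -> Image:
    """Box-window estimate of local space average color.

    Assumption: a simple stand-in for the paper's grid of processing elements.
    """
    h, w = len(img), len(img[0])
    out: Image = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = [0.0, 0.0, 0.0]
            n = 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    for k in range(3):
                        acc[k] += img[yy][xx][k]
                    n += 1
            row.append((acc[0] / n, acc[1] / n, acc[2] / n))
        out.append(row)
    return out


def shift_toward_gray(c: Color, a: Color) -> Color:
    """Subtract from c the component of a that is orthogonal to the gray vector."""
    # The projection of a onto the gray axis (1,1,1)/sqrt(3) is (m, m, m), m = mean(a),
    # so the orthogonal component is a - (m, m, m).
    m = sum(a) / 3.0
    # Clipping keeps the result inside the unit color cube (an added simplification).
    return tuple(min(1.0, max(0.0, c[k] - (a[k] - m))) for k in range(3))


def color_correct(img: Image, radius: int = 1) -> Image:
    """Apply the local color shift to every pixel of an RGB image in [0, 1]."""
    avg = local_average(img, radius)
    return [
        [shift_toward_gray(img[y][x], avg[y][x]) for x in range(len(img[0]))]
        for y in range(len(img))
    ]
```

For a uniformly tinted image the local average equals the pixel color, so the shift removes the tint completely and leaves a gray of the same mean intensity.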

 

Abstract

Paper

 


21.

 

A Correlation-Based Approach to Robust Point Set Registration, European Conference on Computer Vision

Yanghai Tsin and Takeo Kanade

ECCV '04 558 - 569

 

Correlation is a very effective way to align intensity images. We extend the correlation technique to point set registration using a method we call kernel correlation. Kernel correlation is an affinity measure, and it is also a function of the point set entropy. We define the point set registration problem as finding the maximum kernel correlation configuration of the two point sets to be registered. The new registration method has an intuitive interpretation, an algorithm that is simple to implement, and a convergence property that is easy to prove. Our method shows favorable performance when compared with the iterative closest point (ICP) and EM-ICP methods.
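As a rough illustration of the idea, the sketch below scores the affinity of two 2D point sets with a Gaussian kernel and picks the translation that maximizes it. The Gaussian kernel, the translation-only motion model, the exhaustive search over a candidate list, and the function names are all assumptions made for brevity; the paper handles more general transformations and uses a proper optimizer.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def kernel_correlation(a: List[Point], b: List[Point], sigma: float = 1.0) -> float:
    """Sum of Gaussian kernel values over all cross pairs of the two point sets."""
    total = 0.0
    for ax, ay in a:
        for bx, by in b:
            d2 = (ax - bx) ** 2 + (ay - by) ** 2
            total += math.exp(-d2 / (2.0 * sigma * sigma))
    return total


def best_translation(
    model: List[Point],
    scene: List[Point],
    candidates: Iterable[Point],
    sigma: float = 1.0,
) -> Point:
    """Return the candidate translation of `model` with maximum kernel correlation."""

    def score(t: Point) -> float:
        moved = [(x + t[0], y + t[1]) for x, y in model]
        return kernel_correlation(moved, scene, sigma)

    return max(candidates, key=score)
```

Because every pair of points contributes to the score, no explicit correspondences are needed, which is the main contrast with ICP.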

 

Abstract

Paper

Paper II

If you are interested in this topic, talk to MingChing Chang in our group at B&H317.

 


22.


Hierarchical Organization of Shapes for Efficient Retrieval

Shantanu Joshi, Anuj Srivastava, Washington Mio, Xiuwen Liu

ECCV04 III 570-581

 

This paper presents a geometric approach to perform: (i) hierarchical clustering of imaged objects according to the shapes of their boundaries, and (ii) testing of observed shapes for classification. An intrinsic metric on nonlinear, infinite-dimensional shape space, obtained using geodesic lengths, is used for clustering. This analysis is landmark free, does not require embedding shapes in ℝ², and uses ordinary differential equations for flows (as opposed to partial differential equations). Intrinsic analysis also leads to well defined shape statistics such as means and covariances, and is computationally efficient. Clustering is performed in a hierarchical fashion. At any level of hierarchy clusters are generated using a minimum dispersion criterion and an MCMC-type search algorithm. Cluster means become elements to be clustered at the next level. Gaussian models on tangent spaces are used to pose binary or multiple hypothesis tests for classifying observed shapes. Hierarchical clustering and shape testing combine to form an efficient tool for shape retrieval from a large database of shapes. For databases with n shapes, the searches are performed using log(n) tests on average. Examples are presented for demonstrating these tools using shapes from Kimia shape database and the Surrey fish database.

 

Abstract

Paper

 


23.

 

Intrinsic Images by Entropy Minimization
Graham D. Finlayson, Mark S. Drew and Cheng Lu

ECCV04 III 582 - 595

 

A method was recently devised for the recovery of an invariant image from a 3-band colour image. The invariant image, originally 1D greyscale but here derived as a 2D chromaticity, is independent of lighting, and also has shading removed: it forms an intrinsic image that may be used as a guide in recovering colour images that are independent of illumination conditions. Invariance to illuminant colour and intensity means that such images are free of shadows, as well, to a good degree. The method devised finds an intrinsic reflectivity image based on assumptions of Lambertian reflectance, approximately Planckian lighting, and fairly narrowband camera sensors. Nevertheless, the method works well when these assumptions do not hold. A crucial piece of information is the angle for an “invariant direction” in a log-chromaticity space. To date, we have gleaned this information via a preliminary calibration routine, using the camera involved to capture images of a colour target under different lights. In this paper, we show that we can in fact dispense with the calibration step, by recognizing a simple but important fact: the correct projection is that which minimizes entropy in the resulting invariant image. To show that this must be the case we first consider synthetic images, and then apply the method to real images. We show that not only does a correct shadow-free image emerge, but also that the angle found agrees with that recovered from a calibration. As a result, we can find shadow-free images for images with unknown camera, and the method is applied successfully to remove shadows from unsourced imagery.
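The search for the invariant direction can be sketched as follows: project 2D log-chromaticities onto a line at each candidate angle, histogram the projected values, and keep the angle with the lowest entropy. The band-ratio chromaticity (relative to green), the fixed `bin_width`, the one-degree angular step, and the function names are assumptions for illustration and not necessarily the paper's choices.

```python
import math
from typing import Dict, List, Tuple


def log_chromaticity(rgb: Tuple[float, float, float]) -> Tuple[float, float]:
    """Band-ratio log-chromaticity (log R/G, log B/G); assumes positive channels."""
    r, g, b = rgb
    return (math.log(r / g), math.log(b / g))


def entropy(values: List[float], bin_width: float) -> float:
    """Shannon entropy (bits) of a fixed-width histogram of the values."""
    counts: Dict[int, int] = {}
    for v in values:
        b = math.floor(v / bin_width)
        counts[b] = counts.get(b, 0) + 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def min_entropy_angle(chis: List[Tuple[float, float]], bin_width: float = 0.02) -> int:
    """Return the projection angle in degrees (0..179) with minimum entropy."""
    best_angle, best_h = 0, float("inf")
    for deg in range(180):
        t = math.radians(deg)
        c, s = math.cos(t), math.sin(t)
        h = entropy([x * c + y * s for x, y in chis], bin_width)
        if h < best_h:
            best_angle, best_h = deg, h
    return best_angle
```

When each surface traces a straight line in log-chromaticity space as the light changes, projecting perpendicular to those lines collapses every surface to a single value, which is why the histogram is most peaked at the correct angle.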

 

Abstract

Paper

 

Cast Shadow Segmentation Using Invariant Color Features

E. Salvador, A. Cavallaro, and T. Ebrahimi, CVIU04

Paper

 

Shadow Removal from a Real Image Based on Shadow Density

M. Baba, M. Mukunoki, and N. Asada

Paper

 


24. Object Recognition


Learning and Bayesian Shape Extraction for Object Recognition

Washington Mio, Anuj Srivastava, Xiuwen Liu

ECCV04 IV 62-73

 

We present a novel algorithm for extracting shapes of contours of (possibly partially occluded) objects from noisy or low-contrast images. The approach taken is Bayesian: we adopt a region-based model that incorporates prior knowledge of specific shapes of interest. To quantify this prior knowledge, we address the problem of learning probability models for collections of observed shapes. Our method is based on the geometric representation and algorithmic analysis of planar shapes introduced and developed in [15]. In contrast with the commonly used approach to active contours using partial differential equation methods [12,20,1], we model the dynamics of contours on vector fields on shape manifolds.

 

Abstract

Paper

If you are interested in this topic, talk to Nhon Trinh in our group at B&H317.

 


25.

 

Multiphase Dynamic Labeling for Variational Recognition-Driven Image Segmentation 

Daniel Cremers, Nir Sochen, Christoph Schnörr

ECCV04 IV pp. 74 - 86
 

We propose a variational framework for the integration of multiple competing shape priors into level set based segmentation schemes. By optimizing an appropriate cost functional with respect to both a level set function and a (vector-valued) labeling function, we jointly generate a segmentation (by the level set function) and a recognition-driven partition of the image domain (by the labeling function) which indicates where to enforce certain shape priors. Our framework fundamentally extends previous work on shape priors in level set segmentation by directly addressing the central question of where to apply which prior. It allows for the seamless integration of numerous shape priors such that – while segmenting both multiple known and unknown objects – the level set process may selectively use specific shape knowledge for simultaneously enhancing segmentation and recognizing shape.

 

Abstract
Paper

If you are interested in this topic, talk to Nhon Trinh in our group at B&H317.

 


26.

 

Detecting Keypoints with Stable Position, Orientation, and Scale under Illumination Changes 

Bill Triggs

ECCV04 IV pp. 100 - 113
 

Local feature approaches to vision geometry and object recognition are based on selecting and matching sparse sets of visually salient image points, known as ‘keypoints’ or ‘points of interest’. Their performance depends critically on the accuracy and reliability with which corresponding keypoints can be found in subsequent images. Among the many existing keypoint selection criteria, the popular Förstner-Harris approach explicitly targets geometric stability, defining keypoints to be points that have locally maximal self-matching precision under translational least squares template matching. However, many applications require stability in orientation and scale as well as in position. Detecting translational keypoints and verifying orientation/scale behaviour post hoc is suboptimal, and can be misleading when different motion variables interact. We give a more principled formulation, based on extending the Förstner-Harris approach to general motion models and robust template matching. We also incorporate a simple local appearance model to ensure good resistance to the most common illumination variations. We illustrate the resulting methods and quantify their performance on test images.


Abstract

Paper

 


27.

 

Seamless Image Stitching in the Gradient Domain
Anat Levin, Assaf Zomet, Shmuel Peleg, et al.

ECCV04 IV pp. 377 - 389

 

Image stitching is used to combine several individual images having some overlap into a composite image. The quality of image stitching is measured by the similarity of the stitched image to each of the input images, and by the visibility of the seam between the stitched images.
In order to define and get the best possible stitching, we introduce several formal cost functions for the evaluation of the quality of stitching. In these cost functions, the similarity to the input images and the visibility of the seam are defined in the gradient domain, minimizing the disturbing edges along the seam. A good image stitching will optimize these cost functions, overcoming both photometric inconsistencies and geometric misalignments between the stitched images.
This approach is demonstrated in the generation of panoramic images and in object blending. Comparisons with existing methods show the benefits of optimizing the measures in the gradient domain.
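A one-dimensional toy example shows why working in the gradient domain hides the seam. The sketch takes the finite differences of two overlapping signals, switches from one gradient field to the other at the seam, and integrates the result. This is a simplified stand-in, not one of the paper's cost functions: the seam position, the choice to anchor the result at the first sample of `left`, and the function names are assumptions.

```python
from typing import List


def gradients(signal: List[float]) -> List[float]:
    """Forward differences: element i links sample i to sample i + 1."""
    return [signal[i + 1] - signal[i] for i in range(len(signal) - 1)]


def stitch_directly(left: List[float], right: List[float], seam: int) -> List[float]:
    """Copy intensities: samples before `seam` from left, the rest from right."""
    return left[:seam] + right[seam:]


def stitch_in_gradient_domain(
    left: List[float], right: List[float], seam: int
) -> List[float]:
    """Stitch the gradient fields, then integrate starting from left[0].

    Assumes both signals cover the same samples and 1 <= seam < len(left).
    """
    g_left, g_right = gradients(left), gradients(right)
    stitched = [
        g_left[i] if i + 1 < seam else g_right[i] for i in range(len(g_left))
    ]
    out = [left[0]]
    for g in stitched:
        out.append(out[-1] + g)
    return out
```

If the second signal is the same scene with a constant exposure offset, direct stitching produces a jump at the seam, while the gradient-domain result reproduces the scene without a visible step.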

 

Abstract

Paper

 


28.

 

Reliable Fiducial Detection in Natural Scenes

David Claus and Andrew W. Fitzgibbon

ECCV04 IV pp. 469 - 480
 

Reliable detection of fiducial targets in real-world images is addressed in this paper. We show that even the best existing schemes are fragile when exposed to other than laboratory imaging conditions, and introduce an approach which delivers significant improvements in reliability at moderate computational cost. The key to these improvements is in the use of machine learning techniques, which have recently shown impressive results for the general object detection problem, for example in face detection. Although fiducial detection is an apparently simple special case, this paper shows why robustness to lighting, scale and foreshortening can be addressed within the machine learning framework with greater reliability than previous, more ad-hoc, fiducial detection schemes.


Abstract

Paper

 


29.

 

Classification of Image Edges

Hanna Chidiac and Djemel Ziou, Vision Interface 99

 

Edges are relevant information for image representation. In this paper, we propose an algorithm for the classification of step, concave slope, convex slope, roof, valley and staircase edges. The importance of the classification is that it simplifies several problems in artificial vision and image processing, by associating specific processing rules to each type of edge. Our classification is based on the behavioral study of these edges with respect to differentiation operators and scale. The first directional derivative, the gradient and the Laplacian are used as operators. We test our algorithm on synthetic and real grey-level images. In most cases, the classification obtained corresponds to the intensity profile of the image.
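To give a feel for how derivative responses separate edge types, here is a minimal 1D sketch that labels a profile sample as a step, roof, or valley from its first and second finite differences. It is not the paper's rule set: it ignores scale, covers only three of the six edge types, and the threshold `eps` and function name are hypothetical.

```python
from typing import List


def classify_profile(profile: List[float], i: int, eps: float = 1e-6) -> str:
    """Classify the intensity profile at interior index i.

    Step:   strong first derivative, second derivative near zero.
    Roof:   first derivative near zero, negative second derivative.
    Valley: first derivative near zero, positive second derivative.
    """
    d1 = (profile[i + 1] - profile[i - 1]) / 2.0                # central difference
    d2 = profile[i + 1] - 2.0 * profile[i] + profile[i - 1]     # 1D Laplacian
    if abs(d1) > eps and abs(d2) <= eps:
        return "step"
    if abs(d1) <= eps and d2 < -eps:
        return "roof"
    if abs(d1) <= eps and d2 > eps:
        return "valley"
    return "other"
```

A project implementation would evaluate these responses over several scales and in the gradient direction of a 2D image, as the abstract describes.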
 

Abstract

Paper

If you are interested in this topic, talk to Amir Tamrakar in our group at B&H317.

 


THE OLD BUT STILL NICE PROJECTS FROM LAST YEAR


 

 

1. Perceptual Grouping

 

Williams Grouping of Edges IJCV 2000

http://www.cs.unm.edu/~williams/williams-ijcv99.pdf

 

We propose a new measure of perceptual saliency and quantitatively compare its ability to detect natural shapes in cluttered backgrounds to five previously proposed measures. As defined in the new measure, the saliency of an edge is the fraction of closed random walks which contain that edge. The transition-probability matrix defining the random walk between edges is based on a distribution of natural shapes modeled by a stochastic motion. Each of the saliency measures in our comparison is a function of a set of affinity values assigned to pairs of edges. Although the authors of each measure define the affinity between a pair of edges somewhat differently, all incorporate the Gestalt principles of good-continuation and proximity in some form. In order to make the comparison meaningful, we use a single definition of affinity and focus instead on the performance of the different functions for combining affinity values. The primary performance criterion is accuracy. We compute false-positive rates in classifying edges as signal or noise for a large set of test figures. In almost every case, the new measure significantly outperforms previous measures.

 


2. Deformable Shapes

 

Papers:

http://www.ai.mit.edu/people/pff/papers/shapes.pdf
http://www.ai.mit.edu/people/pff/papers/pff.pdf

We present a new method for detecting deformable shapes in images. The main difficulty with deformable template models is the very large (or infinite) number of possible non-rigid transformations of the templates. This makes the problem of finding an optimal match of a deformable template to an image incredibly hard. Using a new representation for deformable shapes we show how to efficiently find a global optimal solution to the non-rigid matching problem. Our matching algorithm can minimize a large class of energy functions, making it applicable to a wide range of problems. We present experimental results of detecting shapes in medical and natural images. Because we don’t rely on local search techniques, our method is very robust, yielding good matches even in images with high clutter.

 

Code: /vision/projects/kimia/segmentation/Felzeszwalb

 

Apply it to spline applications on Cleary images (/vision/images/medical/Spline-Images/Cleary-Images).

 


3. Image Reconstruction


Elder's Image Reconstruction (Diffusion Method)

 

See http://www.lems.brown.edu/~tcl/en298_summary.html
which came after the Johannes project.

http://www.lems.brown.edu/vision/courses/computer-vision-1999/projects/image-edit-msj/project/recon.html

 


4. Object Class Recognition

 

Object Class Recognition by Unsupervised Scale-Invariant Learning

R. Fergus, P. Perona, A. Zisserman

http://csdl.computer.org/comp/proceedings/cvpr/2003/1900/02/190020264abs.htm

 

We present a method to learn and recognize object class models from unlabeled and unsegmented cluttered scenes in a scale invariant manner. Objects are modeled as flexible constellations of parts. A probabilistic representation is used for all aspects of the object: shape, appearance, occlusion and relative scale. An entropy-based feature detector is used to select regions and their scale within the image. In learning, the parameters of the scale-invariant object model are estimated. This is done using expectation-maximization in a maximum-likelihood setting. In recognition, this model is used in a Bayesian manner to classify images. The flexible nature of the model is demonstrated by excellent results over a range of datasets including geometrically constrained classes (e.g. faces, cars) and flexible objects (such as animals).

 


5. Tracking Through Tree-Search

 

D. Freedman. Effective tracking through tree-search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(5):604-615, 2003.

Paper: pdf

(http://www.cs.rpi.edu/~freedd/publications.html)

 

A new contour tracking algorithm is presented. Tracking is posed as a matching problem between curves constructed out of edges in the image, and some shape space describing the class of objects of interest. The main contributions of the paper are to present an algorithm which solves this problem accurately and efficiently, in a provable manner. In particular, the algorithm’s efficiency derives from a novel tree-search algorithm through the shape space, which allows for much of the shape space to be explored with very little effort. This latter property makes the algorithm effective in highly cluttered scenes, as is demonstrated in an experimental comparison with a condensation tracker.

 


 

6. Shape from Shading Using The Level Set Approach

 

Ronnie Kimmel, Kaleem Siddiqi, Benjamin B. Kimia, and Alfred M. Bruckstein. Shape from shading: Level set propagation and viscosity solutions. IJCV, 16(2), October 1995.

 


 

7. Model-based Reconstruction from CT View Data

 

Despite the significant role of geometry in the image formation process and the need for its recovery, the traditional approaches to computerized tomography construct intensity images from X-ray measurement without an explicit notion of geometry. Ray attenuations are represented as a sinogram parameterized in the viewing angle and distance and reconstructed via a variety of methods. The most popular of these is filtered backprojection which, by an application of the central slice theorem, first filters the measured data for each viewing angle and then cumulatively projects it back into the image. Since the ideal filter cannot be realized, finite energy approximations have been developed. However, this leads to a blurring of the image data. While working directly in the measurement space, whenever possible, would avoid this artifact, the ultimate solution is to introduce geometry directly into the estimation procedure.

A second aspect of the traditional approach which needs to be re-examined is the discretization of space into "voxels", for which reconstruction algorithms report an average value. This averaging leads to blurring when multiple structures are sampled by a voxel, e.g., near a boundary, the well-known partial volume effect. Observe that if the voxels were to be reshaped so that voxel boundaries would be coincident with anatomical structure, such a blurring would not occur. However, such a reconfiguration of the voxels requires a priori knowledge of the anatomy, a chicken-and-egg problem! We hypothesize that a simultaneous estimation of underlying geometry and intensity would substantially improve reconstruction results. Specifically, the use of geometric models in the reconstruction process avoids both of the above difficulties. First, the use of models generally reduces the number of parameters to be estimated, thus leveraging the information in the measurement ray. Second, the use of models prevents partial volume effects since "voxel" boundaries are matched with the anatomy.

The use of models, however, has two potential drawbacks. First, since the space of models is to be matched with the underlying normative anatomy, it could be argued that the useful regularization derived in the use of models can potentially also miss estimating pathological anatomies. In our experience with the use of models, deviations from models have typically led to large errors which can highlight regions which a radiologist should closely examine. Second, there is a fundamental combinatorial problem in the use of models: voxels are generically placed in a regular rectangular array for all types of images. However, the placement of more sophisticated geometric models faces combinatorial explosion. We plan to bootstrap the simultaneous estimation of combinations of models and their geometric arrangement by resorting to local estimates from the raw measurement data, which restricts the number of possible arrangements. In this regard, the selection of the lung as an initial study has several advantages. First, lung vessel anatomy is complicated only in the connectivity of various segments, while each vessel segment can be approximated by a rather simple cylindrical geometry. Second, we are able to work directly with raw (not reconstructed) data due to the formulation proposed here. Third, one of the background elements, air, stands in sharp contrast (in Hounsfield units) to the remaining tissue. Fourth, the vessel tree and bronchial trees follow each other closely, thus providing a measure of anatomical validity.


X. Battle, G. Cunningham, and K. Hanson.
Tomographic reconstruction using 3D deformable models. Phys. Med. Biol., 43:983-990, 1998.

J. G. Brankov, Y. Yang, and M. N. Wernick.
Tomographic image reconstruction using content-adaptive mesh modeling. IEEE ICIP, pages 7-10, Oct. 2001.
 


8. Stereo

 

Multi-view Stereo Beyond Lambert

Hailin Jin, Stefano Soatto, Anthony J. Yezzi, CVPR03

 

We consider the problem of estimating the shape and radiance of an object from a calibrated set of views under the assumption that the reflectance of the object is non-Lambertian. Unlike traditional stereo, we do not solve the correspondence problem by comparing image-to-image. Instead, we exploit a rank constraint on the radiance tensor field of the surface in space, and use it to define a discrepancy measure between each image and the underlying model. Our approach automatically returns an estimate of the radiance of the scene, along with its shape, represented by a dense surface. The former can be used to generate novel views that capture the non-Lambertian appearance of the scene.

 


9. Normalized Cuts and Image Segmentation

 

http://citeseer.nj.nec.com/shi97normalized.html

 

We propose a novel approach for solving the perceptual grouping problem in vision. Rather than focusing on local features and their consistencies in the image data, our approach aims at extracting the global impression of an image. We treat image segmentation as a graph partitioning problem and propose a novel global criterion, the normalized cut, for segmenting the graph. The normalized cut criterion measures both the total dissimilarity between the different groups as well as the total...

Paper: http://www.cs.berkeley.edu/~malik/papers/SM-ncut.pdf
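The normalized cut criterion itself is easy to evaluate on a small graph. For a partition of the nodes into A and B, Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V), where cut sums the weights crossing the partition and assoc(A, V) sums all weights leaving nodes in A. The sketch below computes this value and, purely for illustration, finds the best bipartition of a tiny graph by exhaustive search. The exhaustive search is an assumption made to keep the example short; the paper instead relaxes the problem to a generalized eigenvalue system. The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def ncut_value(W: Sequence[Sequence[float]], A: Sequence[int]) -> float:
    """Normalized cut of the partition (A, complement of A).

    W is a symmetric non-negative affinity matrix; assumes every node has
    positive degree so that both associations are non-zero.
    """
    n = len(W)
    in_a = set(A)
    B = [i for i in range(n) if i not in in_a]
    cut = sum(W[i][j] for i in A for j in B)
    assoc_a = sum(W[i][j] for i in A for j in range(n))
    assoc_b = sum(W[i][j] for i in B for j in range(n))
    return cut / assoc_a + cut / assoc_b


def best_bipartition(W: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Exhaustively search bipartitions (tiny graphs only); node 0 is always in A."""
    n = len(W)
    best_set: List[int] = []
    best_value = float("inf")
    for k in range(0, n - 1):  # number of extra nodes grouped with node 0
        for extra in combinations(range(1, n), k):
            A = [0, *extra]
            value = ncut_value(W, A)
            if value < best_value:
                best_set, best_value = A, value
    return best_set, best_value
```

On two tightly connected triangles joined by one weak edge, cutting off a single node has a small raw cut only if that node is weakly attached, but its normalized cost is high, so the criterion prefers splitting between the triangles.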

 


10. Image Height Ridge Detection

 

D. Eberly, "Ridges in image and data analysis," in Computational Imaging and Vision. Dordrecht, The Netherlands: Kluwer Academic, 1996, vol. 7.

 

Combinatorial Classification of Pixels for Ridge Extraction in a Gray-scale Fingerprint Image

Paper: http://citeseer.nj.nec.com/553141.html

 

R. Haralick. Ridges and Valleys on Digital Images. Comput. Vis. Graph. Imag. Process., vol. 22, pages 28-38, 1983.

 


11. Shape-Based Compression

 

Progressive Content-Based Shape Compression for Retrieval of Binary Images
Corinne Le Buhan Jordan, Touradj Ebrahimi and Murat Kunt

Computer Vision and Image Understanding
Volume 71, Issue 2

 

This paper deals with content-based compression of binary-shape images. The proposed method is based on a polygonal approximation of the shape contours. A well-known approximation algorithm, from computer vision applications such as shape analysis and boundary pattern matching, is adapted to achieve a progressive representation. The resulting various levels of shape quality are encoded, from a coarse representation for fast browsing up to a lossless representation for final rendering. In order to perform efficient compression of the progressive shape information, discrete geometrical constraints inherent to the image grid quantization are exploited. While the proposed scheme offers a content-based description (shape boundary as opposed to bitmap blocks) together with a quality scalable representation, it remains comparable, in terms of compression efficiency, with state of the art shape coding methods that do not combine such functionalities.

 


12. Texture based segmentation

 

Segmentation of Textured Images

http://www-dbv.informatik.uni-bonn.de/image/segmentation.html

 

The unsupervised segmentation of textured images is a difficult and challenging low level vision problem with important applications in vision-guided autonomous robotics, product quality inspection, medical diagnosis and in the analysis of remotely sensed images. Algorithms for subsequent image processing stages like motion analysis and tracking, stereo vision, object recognition and scene interpretation often rely on a high quality image segmentation.

The segmentation problem can be informally described as the task of partitioning an image into homogeneous regions. For textured images one of the main conceptual difficulties is the definition of a homogeneity measure in mathematical terms. Our approach to unsupervised texture segmentation is based on four cascaded design decisions, concerning the questions of image representation, texture homogeneity, objective functions and optimization procedures.
 

 

Realistic Textures for Virtual Anastylosis
Alexey Zalesny, Dominik Auf der Maur, Rupert Paget, Maarten Vergauwen and Luc Van Gool

http://www.lems.brown.edu/vision/conferences/ACVA03/ACVA03.html

 

See some cool results here:

http://www.vision.ee.ethz.ch/~rpaget/texture.htm

 

 


13. Texture Synthesis

 

See http://www.vision.ee.ethz.ch/~zales/

 

Alexey Zalesny, Vittorio Ferrari, Geert Caenen, and Luc Van Gool, "Parallel Composite Texture Synthesis", Texture 2002 Workshop in conjunction with ECCV 2002, pp. 151-155.


Geert Caenen, Vittorio Ferrari, Alexey Zalesny, and Luc Van Gool, "Analyzing the layout of composite textures", Texture 2002 Workshop in conjunction with ECCV 2002, pp. 15-19.


Alexey Zalesny, Vittorio Ferrari, Geert Caenen, Dominik Auf der Maur, and Luc Van Gool, "Composite Texture Descriptions", ECCV 2002, Vol. 3, pp. 180-194.