EN161 Image Understanding Projects
All projects will entail carefully reading and understanding 1-2 main papers, plus several supplementary papers, as the foundation for implementing and testing a current method in your chosen topic. You will be expected to discuss the strengths and weaknesses of the method.
Contact TA: MingChing Chang
Projects marked with ![]() ![]() are highly recommended for this course.
If you have a problem downloading the SpringerLink papers, you will probably see this warning message:
You are logged in as a 'Multiple User' of
'Brown University'.
Your institution's MetaPress ID is
It is only a warning. Follow these steps to download the paper:
1. Make sure you are connected through Brown's network.
2. Go to: http://springerlink.metapress.com/
3. Search for the paper on that page; you should then be able to download it.
1. Region-Based Tracking
Spectral Solution of Large-Scale Extrinsic Camera Calibration as a Graph Embedding Problem
Matthew Brand, M. Antone, and S. Teller
ECCV 04 II 262-273
Extrinsic calibration of large-scale ad hoc networks of cameras is posed as the following problem: Calculate the locations of N mobile, rotationally aligned cameras distributed over an urban region, subsets of which view some common environmental features. We show that this leads to a novel class of graph embedding problems that admit closed-form solutions in linear time via partial spectral decomposition of a quadratic form. The minimum squared error (MSE) solution determines locations of cameras and/or features in any number of dimensions. The spectrum also indicates insufficiently constrained problems, which can be decomposed into well-constrained rigid subproblems and analyzed to determine useful new views for missing constraints. We demonstrate the method with large networks of mobile cameras distributed over an urban environment, using directional constraints that have been extracted automatically from commonly viewed features. Spectral solutions yield layouts that are consistent in some cases to a fraction of a millimeter, substantially improving the state of the art. Global layout of large camera networks can be computed in a fraction of a second.
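To make "partial spectral decomposition of a quadratic form" more concrete, here is one plausible way to write such an objective. It is a sketch under assumptions, not the paper's own notation: x_i are the unknown node positions, E is the set of node pairs with a measured unit direction d_ij, and H is the matrix obtained by collecting the terms.

```latex
% Penalize the part of each displacement that is orthogonal to its measured direction.
\min_{x_1,\dots,x_N} \;
  \sum_{(i,j)\in E} \bigl\| (I - d_{ij} d_{ij}^{\top}) (x_j - x_i) \bigr\|^{2}
  \;=\; \min_{X} \; X^{\top} H X ,
\qquad X = (x_1^{\top}, \dots, x_N^{\top})^{\top},
\quad H \succeq 0 .
```

Under this formulation, global translations of all nodes cost nothing, so they lie in the null space of H. A layout is then read off from the eigenvectors of H with the smallest eigenvalues outside that translation subspace, and additional near-zero eigenvalues would signal an under-constrained problem.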
2. Inpainting
A Combined PDE and Texture Synthesis Approach to Inpainting
Harald Grossauer
ECCV04 II 214-224
While there is a vast amount of literature considering PDE-based inpainting and inpainting by texture synthesis, only a few publications are concerned with the combination of both approaches. We present a novel algorithm which combines both approaches and treats each distinct region of the image separately. Thus we are naturally led to include a segmentation pass as a new feature. This way the correct choice of texture samples for the texture synthesis is ensured. We propose a novel concept of local texture synthesis which gives satisfactory results even for large domains in a complex environment.
3.
Weighted Minimal Hypersurfaces and Their Applications in Computer Vision
Bastian Goldlücke and Marcus Magnor ECCV04 II 366-378
Many interesting problems in computer vision can be formulated as a minimization problem for an energy functional. If this functional is given as an integral of a scalar-valued weight function over an unknown hypersurface, then the minimal surface we are looking for can be determined as a solution of the functional's Euler-Lagrange equation. This paper deals with a general class of weight functions that may depend on the surface point and normal. By making use of a mathematical tool called the method of the moving frame, we are able to derive the Euler-Lagrange equation in arbitrary-dimensional space and without the need for any surface parameterization. Our work generalizes existing proofs, and we demonstrate that it yields the correct evolution equations for a variety of previous computer vision techniques which can be expressed in terms of our theoretical framework. In addition, problems involving minimal hypersurfaces in dimensions higher than three, which were previously impossible to solve in practice, can now be introduced and handled by generalized versions of existing algorithms. As one example, we sketch a novel idea for how to reconstruct temporally coherent geometry from multiple video streams.
4.
![]()
![]()
![]()
Texture Boundary Detection for Real-Time Tracking
Ali Shahrokni et al ECCV04 II 566-577
Most of the tracking
techniques used to determine the pose of an object in a sequence rely on the
fact that silhouettes can be extracted using relatively simple algorithms such
as background subtraction or standard edge- and gradient-based techniques.
However, in practice, this rarely is the case and these silhouette extraction
methods can be very brittle. They tend to fail in the presence of highly
textured objects and clutter, which produce too many irrelevant edges. In such
situations, it is advantageous to detect texture boundaries instead. However,
because texture segmentation techniques usually require computing statistics
over image patches, they are more useful for detection in a single image than
for tracking.
Alternatively, we can use all the assumptions that are applicable to our
tracking problem to simplify it. More precisely, we can start from
the estimated projection of a 3-D object model and perform a line search in the
direction perpendicular to the projected edges. This allows us to compute the
most probable location of a texture boundary on the search line, which we
refer to as the scanline. The main idea behind scanline texture boundary detection
is illustrated in Figure 1, where we wish to find the point on the yellow lines
for which the probability of a texture crossing is maximal. This is expressed in
terms of the product of the conditional probabilities of the pixel sequences on both
sides of a given point along the scanline, given an estimate of the texture model
on each side. This estimate can be updated as we move along the scanline.
This concept is formalized in detail in the subsequent sections and is based on
the paper by Shahrokni et al. [1].
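The scanline idea can be illustrated with a minimal sketch. It assumes the simplest possible texture model: each pixel value is drawn independently from a fixed per-texture distribution. The function name `texture_boundary` and the dictionaries `model_a` and `model_b` are hypothetical, and the models are held fixed here rather than updated along the scanline as the text describes.

```python
import math

def texture_boundary(scanline, model_a, model_b):
    """Return k maximizing P(scanline[:k] | texture A) * P(scanline[k:] | texture B).

    scanline: sequence of quantized pixel values along the search line.
    model_a, model_b: dicts mapping each pixel value to its probability
    under the texture on either side (assumed known and fixed here).
    """
    log_a = [math.log(model_a[v]) for v in scanline]
    log_b = [math.log(model_b[v]) for v in scanline]
    # k = 0 means every pixel is explained by texture B.
    score = sum(log_b)
    best_k, best_score = 0, score
    for k in range(1, len(scanline) + 1):
        # Moving the boundary one step hands pixel k-1 from texture B to texture A.
        score += log_a[k - 1] - log_b[k - 1]
        if score > best_score:
            best_k, best_score = k, score
    return best_k

# Texture A is mostly 0s, texture B is mostly 1s.
model_a = {0: 0.8, 1: 0.2}
model_b = {0: 0.2, 1: 0.8}
scanline = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
boundary = texture_boundary(scanline, model_a, model_b)
```

Working in log-probabilities turns the product into a running sum, so every candidate boundary on the scanline is scored in a single pass.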
5.
A TV Flow Based Local Scale Measure for Texture Discrimination
Thomas Brox and Joachim Weickert ECCV04 II 578-590
We introduce a technique
for measuring local scale, based on a special property of the so-called total
variational (TV) flow. For TV flow, pixels change their value with a speed that
is inversely proportional to the size of the region they belong to. Exploiting
this property directly leads to a region based measure for scale that is
well-suited for texture discrimination. Together with the image intensity and
texture features computed from the second moment matrix, which measures the
orientation of a texture, a sparse feature space of dimension 5 is obtained that
covers the most important descriptors of a texture: magnitude, orientation, and
scale. A demonstration of the performance of these features is given in the
scope of texture segmentation.
Our research is partly funded by the project WE 2602/1-1 of the Deutsche
Forschungsgemeinschaft (DFG). This is gratefully acknowledged. We also want to
thank Mikaël Rousson and Rachid Deriche for many interesting discussions on
texture segmentation.
6.
![]()
![]()
Interactive Image Segmentation Using an Adaptive GMMRF Model
A. Blake et al ECCV04 I 428-441
The problem of
interactive foreground/background segmentation in still images is of great
practical importance in image editing. The state of the art in interactive
segmentation is probably represented by the graph cut algorithm of Boykov and
Jolly (ICCV 2001). Its underlying model uses both colour and contrast
information, together with a strong prior for region coherence. Estimation is
performed by solving a graph cut problem for which very efficient algorithms
have recently been developed. However the model depends on parameters which must
be set by hand and the aim of this work is for those constants to be learned
from image data.
First, a generative, probabilistic formulation of the model is set out in terms
of a Gaussian Mixture Markov Random Field (GMMRF). Secondly, a pseudolikelihood
algorithm is derived which jointly learns the colour mixture and coherence
parameters for foreground and background respectively. Error rates for GMMRF
segmentation are calculated throughout using a new image database, available on
the web, with ground truth provided by a human segmenter. The graph cut
algorithm, using the learned parameters, generates good object-segmentations
with little interaction. However, pseudolikelihood learning proves to be frail,
which limits the complexity of usable models, and hence also the achievable
error rate.
Interactive Graph Cuts for Optimal Boundary & Region Segmentation of Objects in N-D Images
Yuri Boykov and Marie-Pierre Jolly
In this paper we describe a new technique for general purpose interactive segmentation of N-dimensional images. The user marks certain pixels as "object" or "background" to provide hard constraints for segmentation. Additional soft constraints incorporate both boundary and region information. Graph cuts are used to find the globally optimal segmentation of the N-dimensional image. The obtained solution gives the best balance of boundary and region properties among all segmentations satisfying the constraints. The topology of our segmentation is unrestricted and both "object" and "background" segments may consist of several isolated parts. Some experimental results are presented in the context of photo/video editing and medical image segmentation. We also demonstrate an interesting Gestalt example. A fast implementation of our segmentation method is possible via a new max-flow algorithm in PAMI'04.
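As a rough illustration of seeded graph-cut segmentation, the sketch below segments a 1-D "image" with a minimum cut. Everything in it is an assumption made for brevity: the function name `min_cut_segment`, the Gaussian boundary weight with parameter `sigma`, the omission of the soft region term, and the use of plain Edmonds-Karp augmenting paths instead of the faster max-flow algorithm the abstract refers to. Object and background seeds must be disjoint.

```python
import math
from collections import deque

def min_cut_segment(intensity, object_seeds, background_seeds, sigma=50.0):
    """Label each pixel of a 1-D image True (object) or False (background)."""
    n = len(intensity)
    source, sink = n, n + 1
    cap = [dict() for _ in range(n + 2)]  # cap[u][v] = residual capacity

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)

    # Boundary term: cutting between similar neighbours is expensive.
    for p in range(n - 1):
        diff = intensity[p] - intensity[p + 1]
        w = math.exp(-diff * diff / (2.0 * sigma * sigma))
        add_edge(p, p + 1, w)
        add_edge(p + 1, p, w)
    # Hard constraints: seed links can never be cut.
    for p in object_seeds:
        add_edge(source, p, math.inf)
    for p in background_seeds:
        add_edge(p, sink, math.inf)

    def reachable_from_source():
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable_from_source()
        if sink not in parent:
            break  # no augmenting path left: the flow is maximal
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow
    # Pixels still reachable from the source lie on the object side of the cut.
    return [p in parent for p in range(n)]

labels = min_cut_segment([10, 12, 11, 200, 205, 198], [0], [5])
```

In this toy input the cheapest place to cut is the large intensity jump between the third and fourth pixels, so the cut falls there.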
7.
![]()
![]()
![]()
![]()
Region-Based Segmentation on Evolving Surfaces with Application to 3D Reconstruction of Shape and Piecewise Constant Radiance
Hailin Jin, Anthony J. Yezzi, Stefano Soatto
ECCV04 114-125
We consider the problem of estimating the shape and radiance of a scene from a
calibrated set of images under the assumption that the scene is Lambertian and
its radiance is piecewise constant. We model the radiance segmentation
explicitly using smooth curves on the surface that bound regions of constant
radiance. We pose the scene reconstruction problem in a variational framework,
where the unknowns are the surface, the radiance values and the segmenting
curves. We propose an iterative procedure to minimize a global cost functional
that combines geometric priors on both the surface and the curves with a data
fitness score. We carry out the numerical implementation in the level set
framework.
Keywords: variational methods, Mumford-Shah functional, image segmentation,
multi-view stereo, level set methods, curve evolution on manifolds.
http://www.vision.cs.ucla.edu/projects.html
Semi-supervised Statistical Region Refinement for Color Image Segmentation
Richard Nock and Frank Nielsen
Pattern Recognition, Elsevier Science, accepted, 2005
Segmentation Given Partial Grouping Constraints
Stella X. Yu, Jianbo Shi PAMI Feb 2004 173-183
We consider data clustering problems where partial grouping is known a priori. We formulate such biased grouping problems as a constrained optimization problem, where structural properties of the data define the goodness of a grouping and partial grouping cues define the feasibility of a grouping. We enforce grouping smoothness and fairness on labeled data points so that sparse partial grouping information can be effectively propagated to the unlabeled data. Considering the normalized cuts criterion in particular, our formulation leads to a constrained eigenvalue problem. By generalizing the Rayleigh-Ritz theorem to projected matrices, we find the global optimum in the relaxed continuous domain by eigendecomposition, from which a near-global optimum to the discrete labeling problem can be obtained effectively. We apply our method to real image segmentation problems, where partial grouping priors can often be derived based on a crude spatial attentional map that binds places with common salient features or focuses on expected object locations. We demonstrate not only that it is possible to integrate both image structures and priors in a single grouping process, but also that objects can be segregated from the background without specific object knowledge.
Color Texture Segmentation by Region-Boundary Cooperation
Jordi Freixenet, Xavier Muñoz, Joan Martí, Xavier Lladó
ECCV04 II 250-261
A colour texture segmentation method which unifies region and boundary information is presented in this paper. The fusion of several approaches which integrate both information sources allows us to exploit the benefits of each one. We propose a segmentation method which uses a coarse detection of the perceptual (colour and texture) edges of the image to adequately place and initialise a set of active regions. Colour texture of regions is modelled by the conjunction of non-parametric techniques of kernel density estimation, which allow the colour behaviour to be estimated, and classical co-occurrence matrix based texture features. When the region information is defined, accurate boundary information can be extracted. Afterwards, regions concurrently compete for the image pixels in order to segment the whole image taking both information sources into account. In contrast with other approaches, our method achieves relevant results on images with regions with the same texture and different colour (as well as with regions with the same colour and different texture), demonstrating the performance of our proposal. Furthermore, the method has been quantitatively evaluated and compared on a set of mosaic images, and results on real images are shown and analysed.
Geodesic Active Regions and Level Set Methods for Supervised Texture Segmentation
Nikos Paragios, Rachid Deriche IJCV02 223-247
This paper presents a novel variational framework to deal with frame partition problems in Computer Vision. This framework exploits boundary and region-based segmentation modules under a curve-based optimization objective function. The task of supervised texture segmentation is considered to demonstrate the potentials of the proposed framework. The textured feature space is generated by filtering the given textured images using isotropic and anisotropic filters, and analyzing their responses as multi-component conditional probability density functions. The texture segmentation is obtained by unifying region and boundary-based information as an improved Geodesic Active Contour Model. The defined objective function is minimized using a gradient-descent method where a level set approach is used to implement the obtained PDE. According to this PDE, the curve propagation towards the final solution is guided by boundary and region-based segmentation forces, and is constrained by a regularity force. The level set implementation is performed using a fast front propagation algorithm where topological changes are naturally handled. The performance of our method is demonstrated on a variety of synthetic and real textured frames.
8. Recognition
![]()
![]()
Learning Chance Probability Functions for Shape Retrieval or Classification
Boaz J. Super CVPR04
Several example-based systems for shape retrieval and shape classification directly match input shapes to stored shapes, without using class membership information to perform the matching. We propose a method for improving the accuracy of this type of system. First, the system learns a set of chance probability functions (CPFs). The CPFs estimate the probabilities of obtaining a query shape with particular distances from each training example by chance. The learned CPFs are used at runtime to rapidly estimate the chance probabilities of the observed distances between the actual query shape and the database shapes. These estimated probabilities are then used as a dissimilarity measure for shape retrieval and/or nearest-neighbor classification. The CPF learning method is parameter-free. Experimental evaluation demonstrates that: (1) chance probabilities yield higher accuracy than Euclidean distances; (2) the learned CPFs support fast matching; and (3) the CPF-based system outperforms prior systems on a standard benchmark test of retrieval accuracy.
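The abstract does not say how the CPFs are learned, so the following is only a guess at the flavour of the idea. Each stored example gets an empirical distribution of its distances to the other stored examples, and a query distance is scored by how often a distance that small occurs by chance. The names `learn_cpfs`, `chance_probability` and `rank_by_chance` are hypothetical, and 1-D numbers stand in for shapes.

```python
import bisect

def learn_cpfs(examples, dist):
    """For each example, store the sorted distances to all other examples."""
    cpfs = []
    for i, a in enumerate(examples):
        cpfs.append(sorted(dist(a, b) for j, b in enumerate(examples) if j != i))
    return cpfs

def chance_probability(sorted_distances, d):
    """Empirical probability that a distance <= d arises by chance."""
    return bisect.bisect_right(sorted_distances, d) / len(sorted_distances)

def rank_by_chance(query, examples, cpfs, dist):
    """Indices of examples, least likely chance match first."""
    scores = [chance_probability(cpfs[i], dist(query, ex))
              for i, ex in enumerate(examples)]
    return sorted(range(len(examples)), key=lambda i: scores[i])

def absolute_difference(a, b):
    return abs(a - b)

examples = [0.0, 0.1, 0.2, 5.0, 7.0, 9.0]
cpfs = learn_cpfs(examples, absolute_difference)
ranking = rank_by_chance(1.5, examples, cpfs, absolute_difference)
```

In this toy data the query 1.5 is closest in raw distance to the example 0.2, but a distance that small is common around 0.2 and rare around 5.0, so the chance-probability ranking puts 5.0 first. This shows how the two measures can disagree.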
![]()
![]()
BAS: a perceptual shape descriptor based on the beam angle statistics
Nafiz Arica and Fatos T. Yarman Vural
Pattern Recognition Letters vol 24 Issue 9-10 1627-1639 June 2003
The proposed shape descriptor is based on the beams originated from a boundary point, which are defined as lines connecting that point with the rest of the points on the boundary. At each point, the angle between a pair of beams is calculated to extract the topological structure of the boundary. Then, a shape descriptor is defined by using the third-order statistics of all the beam angles in a set of neighborhood systems. It is shown that beam angle statistics (BAS) is invariant to translation, rotation, scale and is insensitive to distortions. Experiments are done on the dataset of MPEG 7 Core Experiments Shape-1. It is observed that BAS outperforms the MPEG 7 shape descriptors.
Paper (through Brown Library)
(compare to curve matching)
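A small sketch of the beam-angle idea follows, under simplifying assumptions: the boundary is a closed polygon given as a list of points, the angle is taken unsigned (so convex and concave turns are not distinguished), and the per-point statistics are the first three raw moments over the neighbourhood sizes k. The function names are hypothetical and the paper's exact definitions may differ.

```python
import math

def beam_angle(points, i, k):
    """Unsigned angle at points[i] between the beams to points[i-k] and points[i+k]."""
    n = len(points)
    px, py = points[i]
    ax, ay = points[(i - k) % n]
    bx, by = points[(i + k) % n]
    ux, uy = ax - px, ay - py
    vx, vy = bx - px, by - py
    cos_t = (ux * vx + uy * vy) / (math.hypot(ux, uy) * math.hypot(vx, vy))
    return math.acos(max(-1.0, min(1.0, cos_t)))  # clamp against rounding

def bas_descriptor(points, orders=(1, 2, 3)):
    """Per boundary point, the raw moments of its beam angles over k = 1 .. n//2 - 1."""
    n = len(points)
    descriptor = []
    for i in range(n):
        angles = [beam_angle(points, i, k) for k in range(1, n // 2)]
        descriptor.append(tuple(sum(a ** m for a in angles) / len(angles)
                                for m in orders))
    return descriptor

shape = [(0, 0), (2, 0), (3, 1), (2, 3), (0, 2), (-1, 1)]
c, s = math.cos(0.7), math.sin(0.7)
# Rotate by 0.7 rad, scale by 2.5, translate by (4, -3).
moved = [(2.5 * (c * x - s * y) + 4, 2.5 * (s * x + c * y) - 3) for x, y in shape]
original_desc = bas_descriptor(shape)
moved_desc = bas_descriptor(moved)
```

Because angles between beams are unchanged by translation, rotation and uniform scaling, the two descriptors agree up to rounding. The descriptor still depends on which boundary point is listed first.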
9.
Learning the parts of objects with nonnegative matrix factorization
D. D. Lee and H. S. Seung, Nature 401, 788 (1999).
Is perception of the whole based on perception of its parts? There is psychological and physiological evidence for parts-based representations in the brain, and certain computational theories of object recognition rely on such representations. But little is known about how brains or computers might learn the parts of objects. Here we demonstrate an algorithm for non-negative matrix factorization that is able to learn parts of faces and semantic features of text. This is in contrast to other methods, such as principal components analysis and vector quantization, that learn holistic, not parts-based, representations. Non-negative matrix factorization is distinguished from the other methods by its use of non-negativity constraints. These constraints lead to a parts-based representation because they allow only additive, not subtractive, combinations. When non-negative matrix factorization is implemented as a neural network, parts-based representations emerge by virtue of two properties: the firing rates of neurons are never negative and synaptic strengths do not change sign.
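To show how non-negativity constraints can be maintained during learning, here is a minimal pure-Python sketch. It uses the well-known multiplicative update rules for the squared-error objective, which is an assumption: the Nature paper's own objective function differs. The helper names and the small `eps` guard against division by zero are illustrative choices.

```python
import random

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(row) for row in zip(*m)]

def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factor the non-negative matrix v (n x m) as w (n x rank) times h (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # h <- h * (w^T v) / (w^T w h), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        # w <- w * (v h^T) / (w h h^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
    return w, h

def squared_error(v, w, h):
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

v = [[1, 2, 0], [0, 1, 3], [1, 3, 3]]  # a small matrix with an exact rank-2 factorization
w, h = nmf(v, rank=2)
```

Each update multiplies an entry by a ratio of non-negative quantities, so entries never change sign. That is the additive-only property the abstract credits for the parts-based representation.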
Statistics of Nature Image Contours??
10. Curvature Estimation on Meshes/Point Clouds
Estimating Curvatures and Their Derivatives on Triangle Meshes
Szymon Rusinkiewicz 3DPVT04
The computation of curvature and other differential properties of surfaces is essential for many techniques in analysis and rendering. We present a finite-differences approach for estimating curvatures on irregular triangle meshes that may be thought of as an extension of a common method for estimating per-vertex normals. The technique is efficient in space and time, and results in significantly fewer outlier estimates while more broadly offering accuracy comparable to existing methods. It generalizes naturally to computing derivatives of curvature and higher-order surface differentials.
11. Object Recognition based on Local Invariant Features
An Affine Invariant Interest Point Detector
K. Mikolajczyk and C. Schmid ECCV02 128-142
This paper presents a
novel approach for detecting affine invariant interest points. Our method can
deal with significant affine transformation including large scale changes. Such
transformations introduce significant changes in the point location as well as
in the scale and the shape of the neighbourhood of an interest point. Our
approach allows us to solve these problems simultaneously. It is based on three
key ideas: 1) The second moment matrix computed in a point can be used to
normalize a region in an affine invariant way (skew and stretch). 2) The scale
of the local structure is indicated by local extrema of normalized derivative
over scale. 3) An affine-adapted Harris detector determines the location of
interest points. A multi-scale version of this detector is used for
initialization. An iterative algorithm then modifies location, scale and
neighbourhood of each point and converges to affine invariant points. For
matching and recognition, the image is characterized by a set of affine
invariant points; the affine transformation associated with each point allows
the computation of an affine invariant descriptor which is also invariant to
affine illumination changes. A quantitative comparison of our detector with
existing ones shows a significant improvement in the presence of large affine
deformations. Experimental results for wide baseline matching show an excellent
performance in the presence of large perspective transformations including
significant scale changes. Results for recognition are very good for a database
with more than 5000 images.
Keywords: image features, matching, recognition.
3D Object Modeling and Recognition Using Affine-Invariant Patches and Multi-View Spatial Constraints
F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce.
CVPR 2003, II 272-277
This paper presents a representation for three-dimensional objects in terms of affine-invariant image patches and their spatial relationships. Multi-view constraints associated with groups of patches are combined with a normalized representation of their appearance to guide matching and reconstruction, allowing the acquisition of true three-dimensional affine and Euclidean models from multiple images and their recognition in a single photograph taken from an arbitrary viewpoint. The proposed approach does not require a separate segmentation stage and is applicable to cluttered scenes. Preliminary modeling and recognition results are presented.
![]()
![]()
Distinctive image features from scale invariant keypoints
D. Lowe, IJCV 60(2):91-110, 2004
This paper presents a method for extracting distinctive invariant features from images, which can be used to perform reliable matching between different images of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, addition of noise, change in 3D viewpoint, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
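The first recognition step, matching each feature to its nearest neighbour in a database, can be sketched as follows. This is a simplified stand-in: it uses a brute-force search rather than a fast approximate nearest-neighbour algorithm, 2-D tuples in place of real descriptors, and a distance-ratio test with a threshold of 0.8 to discard ambiguous matches. The function name `match_features` is hypothetical.

```python
import math

def match_features(query, database, max_ratio=0.8):
    """Return (query_index, database_index) pairs that pass the ratio test.

    A match is kept only if the nearest database descriptor is clearly
    closer than the second nearest. The database needs at least two entries.
    """
    matches = []
    for qi, q in enumerate(query):
        dists = sorted((math.dist(q, d), di) for di, d in enumerate(database))
        (best, best_i), (second, _) = dists[0], dists[1]
        if best < max_ratio * second:
            matches.append((qi, best_i))
    return matches

database = [(0.0, 0.0), (10.0, 0.0), (10.0, 1.0)]
query = [(0.5, 0.0), (10.0, 0.5)]
matches = match_features(query, database)
```

The second query descriptor lies exactly between two database entries, so its match is ambiguous and is rejected. Only the first query descriptor is matched.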
Simultaneous Object Recognition and Segmentation by Image Exploration
Vittorio Ferrari, Tinne Tuytelaars, Luc Van Gool, ECCV04 I 40-54
Methods based on local, viewpoint invariant features have proven capable of recognizing objects in spite of viewpoint changes, occlusion and clutter. However, these approaches fail when these factors are too strong, due to the limited repeatability and discriminative power of the features. As additional shortcomings, the objects need to be rigid and only their approximate location is found. We present a novel Object Recognition approach which overcomes these limitations. An initial set of feature correspondences is first generated. The method anchors on it and then gradually explores the surrounding area, trying to construct more and more matching features, increasingly farther from the initial ones. The resulting process covers the object with matches, and simultaneously separates the correct matches from the wrong ones. Hence, recognition and segmentation are achieved at the same time. Only very few correct initial matches suffice for reliable recognition. The experimental results demonstrate the stronger power of the presented method in dealing with extensive clutter, dominant occlusion, large scale and viewpoint changes. Moreover non-rigid deformations are explicitly taken into account, and the approximative contours of the object are produced. The approach can extend any viewpoint invariant feature extractor.
Wide baseline stereo based on local, affinely invariant regions
T. Tuytelaars, L. Van Gool, British Machine Vision Conf. 2000, pp. 412-422
http://citeseer.ist.psu.edu/context/1766120/0
****
An Affine Invariant Salient Region Detector
Timor Kadir, Andrew Zisserman, Michael Brady
ECCV (1) 2004: 228-241
In this paper we
describe a novel technique for detecting salient regions in an image. The
detector is a generalization to affine invariance of the method introduced by
Kadir and Brady [10]. The detector deems a region salient if it exhibits
unpredictability in both its attributes and its spatial scale.
The detector has significantly different properties to operators based on kernel
convolution, and we examine three aspects of its behaviour: invariance to
viewpoint change; insensitivity to image perturbations; and repeatability under
intra-class variation. Previous work has, on the whole, concentrated on
viewpoint invariance. A second contribution of this paper is to propose a
performance test for evaluating the two other aspects.
We compare the performance of the saliency detector to other standard detectors
including an affine invariance interest point detector. It is demonstrated that
the saliency detector has comparable viewpoint invariance performance, but
superior insensitivity to perturbations and intra-class variation performance
for images of certain object classes.
12.
Evaluation of Interest Point Detectors
Cordelia Schmid and Roger Mohr and Christian Bauckhage IJCV, 2000
Many different low-level feature detectors exist and it is widely agreed that the evaluation of detectors is important. In this paper we introduce two evaluation criteria for interest points: repeatability rate and information content. Repeatability rate evaluates the geometric stability under different transformations. Information content measures the distinctiveness of features. Different interest point detectors are compared using these two criteria. We determine which detector gives the best results and show that it satisfies the criteria well.
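A repeatability measure of this kind can be sketched in a few lines. The version below is an assumption-laden simplification: the transformation between the two images is supplied as a known function, a point counts as repeated if some detection in the second image lies within `eps` pixels of its mapped position, and the count is normalized by the smaller number of detections. It ignores the restriction to the region visible in both images.

```python
import math

def repeatability(points1, points2, transform, eps=1.5):
    """Fraction of detections in image 1 that reappear in image 2."""
    if not points1 or not points2:
        return 0.0
    repeated = 0
    for p in points1:
        q = transform(p)  # where p should appear in image 2
        if any(math.dist(q, r) <= eps for r in points2):
            repeated += 1
    return repeated / min(len(points1), len(points2))

def shift_right(p):
    # Hypothetical known image transformation: a 5-pixel horizontal shift.
    return (p[0] + 5.0, p[1])

points1 = [(0.0, 0.0), (10.0, 10.0), (20.0, 5.0), (30.0, 30.0)]
points2 = [(5.0, 0.5), (15.0, 10.0), (40.0, 40.0)]
rate = repeatability(points1, points2, shift_right)
```

Here two of the four detections reappear within tolerance and the second image has three detections, so the rate is two thirds.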
13.
Evaluation of Salient Point Techniques
N. Sebe, Q. Tian, E. Loupias, M.S. Lew, T.S 2002
http://citeseer.ist.psu.edu/590636.html
14.
Novel Skeletal Representation For Articulated Creatures
Gabriel J. Brostow, Irfan Essa, Drew Steedly and Vivek Kwatra
ECCV 04 6??-878
Volumetric structures
are frequently used as shape descriptors for 3D data. The capture of such data
is being facilitated by developments in multi-view video and range scanning,
extending to subjects that are alive and moving. In this paper, we examine
vision-based modeling and the related representation of moving articulated
creatures using spines. We define a spine as a branching axial structure
representing the shape and topology of a 3D object's limbs, and capturing the
limbs' correspondence and motion over time.
Our spine concept builds on skeletal representations often used to describe the
internal structure of an articulated object and the significant protrusions. The
algorithms for determining both 2D and 3D skeletons generally use an objective
function tuned to balance stability against the responsiveness to detail. Our
representation of a spine provides for enhancements over a 3D skeleton, afforded
by temporal robustness and correspondence. We also introduce a probabilistic
framework that is needed to compute the spine from a sequence of surface data.
We present a practical implementation that approximates the spine's joint
probability function to reconstruct spines for synthetic and real subjects that
move.
If you are interested in this topic, talk to MingChing Chang in our group at B&H317.
15.
Three-dimensional metamorphosis: a survey.
Francis Lazarus and Anne Verroust.
The Visual Computer, 14(8-9):373-389, 1998.
A metamorphosis or a (3D) morphing is the process of continuously transforming one object into another. 2D and 3D morphing are popular in computer animation, industrial design, and growth simulation. Since there is no intrinsic solution to the morphing problem, user interaction can be a key component of a morphing software. Many morphing techniques have been proposed in recent years for 2D and 3D objects. We present a survey of the various 3D approaches, giving special attention to the user interface. We show how the approaches are intimately related to the object representations. We conclude by sketching some morphing strategies for the future.
http://citeseer.ist.psu.edu/context/935062/0
16. Edge Detection
![]()
![]()
Are Iterations and Curvature Useful for Tensor Voting?
Sylvain Fischer, Pierre Bayerl, Heiko Neumann, Gabriel Cristobal, Rafael Redondo
ECCV (3) 2004: 158-169
Tensor voting is an
efficient algorithm for perceptual grouping and feature extraction, particularly
for contour extraction. In this paper two studies on tensor voting are
presented. First the use of iterations is investigated, and second, a new method
for integrating curvature information is evaluated. In contrast to other
grouping methods, tensor voting claims the advantage of being non-iterative.
Although non-iterative tensor voting methods provide good results in many cases,
the algorithm can be iterated to deal with more complex data configurations. The
experiments conducted demonstrate that iterations substantially improve the
process of feature extraction and help to overcome limitations of the original
algorithm. As a further contribution we propose a curvature improvement for
tensor voting. In contrast to the curvature-augmented tensor voting proposed
by Tang and Medioni, our method takes advantage of the curvature calculation
already performed by the classical tensor voting and evaluates the full
curvature, sign and amplitude. Some new curvature-modified voting fields are
also proposed. Results show a lower degree of artifacts, smoother curves, a high
tolerance to scale parameter changes and also more noise-robustness.
If you are interested in this topic, talk to Amir Tamrakar in our group at B&H317.
17.
Shape Matching and Recognition - Using Generative Models and Informative Features
Zhuowen Tu, Alan L. Yuille
ECCV04 III 195-209
We present an algorithm for shape matching and recognition based on a generative model for how one shape can be generated by the other. This generative model allows for a class of transformations, such as affine and non-rigid transformations, and induces a similarity measure between shapes. The matching process is formulated in the EM algorithm. To have a fast algorithm and avoid local minima, we show how the EM algorithm can be approximated by using informative features, which have two key properties: they are invariant and representative. They are also similar to the proposal probabilities used in DDMCMC [13]. The formulation allows us to know when and why approximations can be made and justifies the use of bottom-up features, which are used in a wide range of vision problems. This integrates generative models and feature-based approaches within the EM framework and helps clarify the relationships between different algorithms for this problem such as shape contexts [3] and softassign [5]. We test the algorithm on a variety of data sets including MPEG7 CE-Shape-1, Kimia silhouettes, and real images of street scenes. We demonstrate very effective performance and compare our results with existing algorithms. Finally, we briefly illustrate how our approach can be generalized to a wider range of problems including object detection.
18.
Recognizing Objects in Range Data Using Regional Point Descriptors.
Andrea Frome, Daniel Huber, Ravi Kolluri, Thomas Bülow, Jitendra Malik
ECCV04 III 224-237
If you are interested in this topic, talk to MingChing Chang in our group at B&H317.
19.
Shape Reconstruction from 3D and 2D Data Using PDE-Based Deformable Surfaces
Ye Duan, Liu Yang, Hong Qin, Dimitris Samaras
ECCV 2004, pp III:238-251
In this paper, we propose a new PDE-based methodology for deformable surfaces that is capable of automatically evolving its shape to capture the geometric boundary of the data and simultaneously discover its underlying topological structure. Our model can handle multiple types of data (such as volumetric data, 3D point clouds and 2D image data), using a common mathematical framework. The deformation behavior of the model is governed by partial differential equations (e.g. the weighted minimal surface flow). Unlike the level-set approach, our model always has an explicit representation of geometry and topology. The regularity of the model and the stability of the numerical integration process are ensured by a powerful Laplacian tangential smoothing operator. By allowing local adaptive refinement of the mesh, the model can accurately represent sharp features. We have applied our model for shape reconstruction from volumetric data, unorganized 3D point clouds and multiple view images. The versatility and robustness of our model allow its application to the challenging problem of multiple view reconstruction. Our approach is unique in its combination of simultaneous use of a high number of arbitrary camera views with an explicit mesh that is intuitive and easy-to-interact-with. Our model-based approach automatically selects the best views for reconstruction, allows for visibility checking and progressive refinement of the model as more images become available. The results of our extensive experiments on synthetic and real data demonstrate robustness, high reconstruction accuracy and visual quality.
If you are interested in this topic, talk to MingChing Chang in our group at B&H317.
20.
Color Constancy Using Local Color Shifts
Marc Ebner
ECCV04 III 276-287
21.
A Correlation-Based Approach to Robust Point Set Registration, European Conference on Computer Vision
Yanghai Tsin and Takeo Kanade
ECCV '04 558 - 569
If you are interested in this topic, talk to MingChing Chang in our group at B&H317.
22.
Hierarchical Organization of Shapes for Efficient Retrieval
Shantanu Joshi, Anuj Srivastava, Washington Mio, Xiuwen Liu
ECCV04 III 570-581
23.
![]()
![]()
![]()
![]()
![]()
Intrinsic Images by Entropy Minimization
Graham D. Finlayson, Mark S. Drew, and Cheng Lu
ECCV04 III 582 - 595
invariant direction in a log-chromaticity space. To date, we have gleaned this information via a
preliminary calibration routine, using the camera involved to capture images
of a colour target under different lights. In this paper, we show that we can
in fact dispense with the calibration step, by recognizing a simple but
important fact: the correct projection is that which minimizes entropy
in the resulting invariant image. To show that this must be the case we first
consider synthetic images, and then apply the method to real images. We show
that not only does a correct shadow-free image emerge, but also that the angle
found agrees with that recovered from a calibration. As a result, we can find
shadow-free images for images with unknown camera, and the method is applied
successfully to remove shadows from unsourced imagery.
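The entropy criterion in the abstract above can be illustrated with a small sketch. It assumes that the log-chromaticity values are already available as 2D points, that entropy is measured on a fixed-width histogram, and that candidate angles are tried in 1-degree steps; the function names, the bin width and the synthetic test data are illustrative choices, not taken from the paper.

```python
import math

def projection_entropy(points, angle_deg, bin_width=0.02):
    """Shannon entropy (in bits) of the 2D points projected onto one direction."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    counts = {}
    for x, y in points:
        b = math.floor((x * c + y * s) / bin_width)  # histogram bin index
        counts[b] = counts.get(b, 0) + 1
    n = len(points)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def min_entropy_angle(points, bin_width=0.02):
    """Projection angle in [0, 180) degrees with the lowest histogram entropy."""
    return min(range(180), key=lambda a: projection_entropy(points, a, bin_width))

def synthetic_points(lighting_deg=60.0):
    """Three hypothetical surfaces; each one traces a line along the lighting direction."""
    d = (math.cos(math.radians(lighting_deg)), math.sin(math.radians(lighting_deg)))
    n = (math.cos(math.radians(lighting_deg + 90.0)),
         math.sin(math.radians(lighting_deg + 90.0)))
    pts = []
    for offset in (-0.99, 0.01, 1.01):      # one offset per surface
        for i in range(100):                # 100 lighting conditions
            t = -2.0 + 4.0 * i / 99
            pts.append((offset * n[0] + t * d[0], offset * n[1] + t * d[1]))
    return pts

best = min_entropy_angle(synthetic_points())
print(best)
```

On this synthetic data each surface collapses to a single histogram bin only when the projection direction is perpendicular to the lighting direction, so the search returns the angle 90 degrees away from `lighting_deg`.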
Cast Shadow Segmentation Using Invariant Color Features
E. Salvador, A. Cavallaro, and T. Ebrahimi, CVIU04
Shadow Removal from a Real Image Based on Shadow Density
M. Baba, M. Mukunoki, and N. Asada
24. Object Recognition
Learning and Bayesian Shape Extraction for Object Recognition
Washington Mio, Anuj Srivastava, Xiuwen Liu
ECCV04 IV 62-73
If you are interested in this topic, talk to Nhon Trinh in our group at B&H317.
25.
![]()
![]()
![]()
Multiphase Dynamic Labeling for Variational Recognition-Driven Image Segmentation
Daniel Cremers, Nir Sochen, Christoph Schnörr
ECCV04 IV pp. 74 - 86
If you are interested in this topic, talk to Nhon Trinh in our group at B&H317.
26.
Detecting Keypoints with Stable Position, Orientation, and Scale under Illumination Changes
Bill Triggs
ECCV04 IV pp. 100 - 113
Local feature approaches to vision geometry and object recognition are based on
selecting and matching sparse sets of visually salient image points, known as
keypoints or points of interest.
Their performance depends critically on the accuracy and reliability with which
corresponding keypoints can be found in subsequent images. Among the many
existing keypoint selection criteria, the popular Förstner-Harris approach
explicitly targets geometric stability, defining keypoints to be points that
have locally maximal self-matching precision under translational least squares
template matching. However, many applications require stability in orientation
and scale as well as in position. Detecting translational keypoints and
verifying orientation/scale behaviour post hoc is suboptimal, and can be
misleading when different motion variables interact. We give a more principled
formulation, based on extending the Förstner-Harris approach to general motion
models and robust template matching. We also incorporate a simple local
appearance model to ensure good resistance to the most common illumination
variations. We illustrate the resulting methods and quantify their performance
on test images.
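As background for the abstract above, here is a minimal sketch of the classical translational criterion only, not of the paper's generalized formulation. It assumes a grey-level image stored as a list of rows, central-difference gradients and an unweighted square window, and it scores a pixel by the smaller eigenvalue of the summed gradient outer-product matrix. The function names and the test image are illustrative.

```python
import math

def structure_tensor_score(img, r, c, half=2):
    """Smaller eigenvalue of the 2x2 gradient matrix summed over a window at (r, c)."""
    sxx = sxy = syy = 0.0
    for i in range(r - half, r + half + 1):
        for j in range(c - half, c + half + 1):
            gx = (img[i][j + 1] - img[i][j - 1]) / 2.0  # horizontal central difference
            gy = (img[i + 1][j] - img[i - 1][j]) / 2.0  # vertical central difference
            sxx += gx * gx
            sxy += gx * gy
            syy += gy * gy
    tr = sxx + syy
    det = sxx * syy - sxy * sxy
    disc = max(tr * tr / 4.0 - det, 0.0)  # guard against tiny negative round-off
    return tr / 2.0 - math.sqrt(disc)

def corner_image(size=16, start=8):
    """Dark image with one bright quadrant, giving a single corner at (start, start)."""
    return [[1.0 if r >= start and c >= start else 0.0 for c in range(size)]
            for r in range(size)]

img = corner_image()
print(structure_tensor_score(img, 8, 8))    # at the corner
print(structure_tensor_score(img, 8, 12))   # on a straight edge
print(structure_tensor_score(img, 3, 3))    # in a flat region
```

Only the corner has gradient energy in two independent directions, so it is the only one of the three test points with a non-zero score. Translation along a straight edge cannot be pinned down, which is why the smaller eigenvalue vanishes there.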
27.
Seamless Image Stitching in the Gradient Domain
Anat Levin, Assaf Zomet, Shmuel Peleg, et al.
ECCV04 IV pp. 377 - 389
28.
Reliable Fiducial Detection in Natural Scenes
David Claus and Andrew W. Fitzgibbon
ECCV04 IV pp. 469 - 480
Reliable detection of fiducial targets in real-world images is addressed in this paper. We show that even the best existing schemes are fragile when exposed to other than laboratory imaging conditions, and introduce an approach which delivers significant improvements in reliability at moderate computational cost. The key to these improvements is in the use of machine learning techniques, which have recently shown impressive results for the general object detection problem, for example in face detection. Although fiducial detection is an apparently simple special case, this paper shows why robustness to lighting, scale and foreshortening can be addressed within the machine learning framework with greater reliability than previous, more ad-hoc, fiducial detection schemes.
29.
Classification of Image Edges
Hanna Chidiac and Djemel Ziou
Vision Interface 99
Edges are relevant information for image representation. In this paper, we propose an algorithm for
the classification of step, concave slope, convex slope, roof, valley and
staircase edges. The importance of the classification is that it simplifies
several problems in artificial vision and image processing, by associating
specific processing rules to each type of edge. Our classification is based on
the behavioral study of these edges with respect to differentiation operators
and scale. The first directional derivative, the gradient and the Laplacian are
used as operators. We test our algorithm on synthetic and real grey-level
images. In most cases, the classification obtained corresponds to the intensity
profile of the image.
If you are interested in this topic, talk to Amir Tamrakar in our group at B&H317.
THE OLD BUT STILL NICE PROJECTS FROM LAST YEAR
1. Perceptual Grouping
Williams Grouping of Edges IJCV 2000
http://www.cs.unm.edu/~williams/williams-ijcv99.pdf
We propose a new measure of perceptual saliency and quantitatively compare its ability to detect natural shapes in cluttered backgrounds to five previously proposed measures. As defined in the new measure, the saliency of an edge is the fraction of closed random walks which contain that edge. The transition-probability matrix defining the random walk between edges is based on a distribution of natural shapes modeled by a stochastic motion. Each of the saliency measures in our comparison is a function of a set of affinity values assigned to pairs of edges. Although the authors of each measure define the affinity between a pair of edges somewhat differently, all incorporate the Gestalt principles of good-continuation and proximity in some form. In order to make the comparison meaningful, we use a single definition of affinity and focus instead on the performance of the different functions for combining affinity values. The primary performance criterion is accuracy. We compute false-positive rates in classifying edges as signal or noise for a large set of test figures. In almost every case, the new measure significantly outperforms previous measures.
2. Deformable Shapes
Papers:
http://www.ai.mit.edu/people/pff/papers/shapes.pdf
http://www.ai.mit.edu/people/pff/papers/pff.pdf
We present a new method for detecting deformable shapes in images. The main difficulty with deformable template models is the very large (or infinite) number of possible non-rigid transformations of the templates. This makes the problem of finding an optimal match of a deformable template to an image incredibly hard. Using a new representation for deformable shapes we show how to efficiently find a global optimal solution to the non-rigid matching problem. Our matching algorithm can minimize a large class of energy functions, making it applicable to a wide range of problems. We present experimental results of detecting shapes in medical and natural images. Because we don’t rely on local search techniques, our method is very robust, yielding good matches even in images with high clutter.
Code: /vision/projects/kimia/segmentation/Felzeszwalb
Apply it to spline applications on Cleary images (/vision/images/medical/Spline-Images/Cleary-Images).
3. Image Reconstruction
Elder's Image Reconstruction (Diffusion Method)
See
http://www.lems.brown.edu/~tcl/en298_summary.html
which came after the Johannes project.
4. Object Class Recognition
Object Class Recognition by Unsupervised Scale-Invariant Learning
R. Fergus, P. Perona, A. Zisserman
http://csdl.computer.org/comp/proceedings/cvpr/2003/1900/02/190020264abs.htm
We present a method to learn and recognize object class models from unlabeled and unsegmented cluttered scenes in a scale invariant manner. Objects are modeled as flexible constellations of parts. A probabilistic representation is used for all aspects of the object: shape, appearance, occlusion and relative scale. An entropy-based feature detector is used to select regions and their scale within the image. In learning, the parameters of the scale-invariant object model are estimated. This is done using expectation-maximization in a maximum-likelihood setting. In recognition, this model is used in a Bayesian manner to classify images. The flexible nature of the model is demonstrated by excellent results over a range of datasets including geometrically constrained classes (e.g. faces, cars) and flexible objects (such as animals).
5. Tracking Through Tree-Search
![]()
![]()
D. Freedman. Effective
tracking through tree-search. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 25(5):604-615, 2003.
Paper: http://www.cs.rpi.edu/~freedd/publications.html
A new contour tracking algorithm is presented. Tracking is posed as a matching problem between curves constructed out of edges in the image, and some shape space describing the class of objects of interest. The main contributions of the paper are to present an algorithm which solves this problem accurately and efficiently, in a provable manner. In particular, the algorithm’s efficiency derives from a novel tree-search algorithm through the shape space, which allows for much of the shape space to be explored with very little effort. This latter property makes the algorithm effective in highly cluttered scenes, as is demonstrated in an experimental comparison with a condensation tracker.
6. Shape from Shading Using The Level Set Approach
Ronnie Kimmel, Kaleem Siddiqi, Benjamin B. Kimia, and Alfred M. Bruckstein. Shape from shading: Level set propagation and viscosity solutions. IJCV, 16(2), October 1995.
7.
![]()
Model-based Reconstruction from CT View Data
Despite the significant role of
geometry in the image formation process and the need for its recovery, the
traditional approaches to computerized
tomography construct intensity images from X-ray measurement without an explicit
notion of geometry. Ray attenuations are represented as a
sinogram parameterized in the viewing angle and distance and reconstructed
via a variety of methods. The most popular of these is filtered
backprojection which, by an application of the central slice theorem, first
filters the measured data for each viewing angle and then cumulatively
projects it back into the image. Since the ideal filter cannot be realized,
finite energy approximations have been developed. However, this leads to a
blurring of the image data. While working directly in the measurement space,
whenever possible, would avoid this artifact, the ultimate solution is to
introduce geometry directly into the estimation procedure.
A second aspect of the traditional approach which needs to be re-examined is the
discretization of space into "voxels", for which reconstruction
algorithms report an average value. This averaging leads to blurring when
multiple structures are sampled by a voxel, e.g., near a boundary,
the well-known partial volume effect. Observe that if the voxels were to be
reshaped so that voxel boundaries would be coincident with anatomical
structure, such a blurring would not occur. However, such a reconfiguration of
the voxels requires a priori knowledge of the anatomy, a chicken-and-egg
problem! We hypothesize that a simultaneous estimation of underlying geometry
and intensity would substantially improve reconstruction results. Specifically,
the use of geometric models in the reconstruction process avoids both of the
above difficulties. First, the use of models generally reduces the number of
parameters to be estimated, thus leveraging the information in the measurement
ray. Second, the use of models prevents partial volume effects since "voxel"
boundaries are matched with the anatomy.
The use of models, however, has two potential drawbacks. First, since the space
of models is to be matched with the underlying normative anatomy, it
could be argued that the useful regularization derived in the use of models can
potentially also miss estimating pathological anatomies. In our
experience with the use of models, deviations from models have typically led to
large errors which can highlight regions which a radiologist should
closely examine. Second, there is a fundamental combinatorial problem in the use
of models: voxels are generically placed in a regular rectangular array for all
types of images. However, the placement of more sophisticated geometric models
faces combinatorial explosion. We plan to bootstrap the simultaneous estimation
of combinations of models and their geometric arrangement by resorting to local
estimates from the raw measurement data, which restricts the number of possible
arrangements. In this regard, the selection of the lung as an initial study has
several advantages. First, lung vessel anatomy is complicated only in the
connectivity of various segments, while each vessel segment can be approximated
by a rather simple cylindrical geometry. Second, we are able to work directly
with raw (not reconstructed) data due to the formulation proposed here. Third,
one of the background elements, air, stands in sharp contrast (in Hounsfield
units) to the remaining tissue. Fourth, the vessel tree and bronchial trees
follow each other closely, thus providing a measure of anatomical validity.
X. Battle, G. Cunningham, and K. Hanson.
Tomographic reconstruction using 3D deformable models. Phys. Med. Biol.,
43:983-990, 1998.
J. G. Brankov, Y. Yang, and M. N. Wernick.
Tomographic image reconstruction using content-adaptive mesh modeling. IEEE
ICIP, pages 7-10, Oct. 2001.
8. Stereo
Multi-view Stereo Beyond Lambert
We consider the problem of estimating the shape and radiance of an object from a calibrated set of views under the assumption that the reflectance of the object is non-Lambertian. Unlike traditional stereo, we do not solve the correspondence problem by comparing image-to-image. Instead, we exploit a rank constraint on the radiance tensor field of the surface in space, and use it to define a discrepancy measure between each image and the underlying model. Our approach automatically returns an estimate of the radiance of the scene, along with its shape, represented by a dense surface. The former can be used to generate novel views that capture the non-Lambertian appearance of the scene.
9. Normalized Cuts and Image Segmentation
http://citeseer.nj.nec.com/shi97normalized.html
We propose a novel approach for solving the perceptual grouping problem in vision. Rather than focusing on local features and their consistencies in the image data, our approach aims at extracting the global impression of an image. We treat image segmentation as a graph partitioning problem and propose a novel global criterion, the normalized cut, for segmenting the graph. The normalized cut criterion measures both the total dissimilarity between the different groups as well as the total...
Paper: http://www.cs.berkeley.edu/~malik/papers/SM-ncut.pdf
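To make the criterion concrete, the sketch below evaluates the standard normalized cut value, Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V), on a toy graph. It finds the best two-way split by exhaustive search, which is only feasible for a handful of nodes; the paper instead relaxes the problem to a generalized eigenvalue problem. The graph, the function names and the brute-force search are illustrative assumptions.

```python
from itertools import combinations

def ncut_value(W, A):
    """Normalized cut of the partition (A, V minus A) for a symmetric weight matrix W."""
    n = len(W)
    A = set(A)
    B = set(range(n)) - A
    cut = sum(W[i][j] for i in A for j in B)              # weight crossing the partition
    assoc_a = sum(W[i][j] for i in A for j in range(n))   # total connection of A to all nodes
    assoc_b = sum(W[i][j] for i in B for j in range(n))   # total connection of B to all nodes
    return cut / assoc_a + cut / assoc_b

def best_bipartition(W):
    """Exhaustive search over two-way splits; toy-sized graphs only."""
    n = len(W)
    best = None
    for k in range(1, n // 2 + 1):
        for A in combinations(range(n), k):
            v = ncut_value(W, A)
            if best is None or v < best[0]:
                best = (v, set(A))
    return best

def two_triangle_graph(bridge=0.1):
    """Two tightly connected triangles joined by a single weak edge (2, 3)."""
    n = 6
    W = [[0.0] * n for _ in range(n)]
    for group in ((0, 1, 2), (3, 4, 5)):
        for i in group:
            for j in group:
                if i != j:
                    W[i][j] = 1.0
    W[2][3] = W[3][2] = bridge
    return W

value, part = best_bipartition(two_triangle_graph())
print(value, part)
```

Cutting off a single node removes little total weight in some graphs, but the normalization by assoc penalizes such unbalanced splits. Here the lowest value is reached by separating the two triangles at the weak edge.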
10.
Image Height Ridge Detection
D. Eberly, "Ridges in image and data analysis," in Computational Imaging and Vision. Dordrecht, The Netherlands: Kluwer Academic, 1996, vol. 7.
Combinatorial Classification of Pixels for Ridge Extraction in a Gray-scale Fingerprint Image
Paper: http://citeseer.nj.nec.com/553141.html
R. Haralick. Ridges and Valleys on Digital Images. Comput. Vis. Graph. Imag. Process., vol. 22, pages 28-38, 1983.
11. Shape-Based Compression
Progressive Content-Based Shape Compression for Retrieval of Binary Images
Corinne Le Buhan Jordan, Touradj Ebrahimi and Murat Kunt
Computer Vision and Image Understanding
Volume 71, Issue 2
This paper deals with content-based compression of binary-shape images. The proposed method is based on a polygonal approximation of the shape contours. A well-known approximation algorithm, from computer vision applications such as shape analysis and boundary pattern matching, is adapted to achieve a progressive representation. The resulting various levels of shape quality are encoded, from a coarse representation for fast browsing up to a lossless representation for final rendering. In order to perform efficient compression of the progressive shape information, discrete geometrical constraints inherent to the image grid quantization are exploited. While the proposed scheme offers a content-based description (shape boundary as opposed to bitmap blocks) together with a quality scalable representation, it remains comparable, in terms of compression efficiency, with state of the art shape coding methods that do not combine such functionalities.
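The abstract does not name the polygonal approximation algorithm it adapts. As a stand-in, the sketch below uses the Ramer-Douglas-Peucker recursion on an open contour and decreases the tolerance to obtain coarse-to-fine levels. The choice of algorithm, the tolerance values and the sample contour are all assumptions made for illustration, and no encoding step is shown.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def simplify(points, tol):
    """Ramer-Douglas-Peucker: keep the farthest point while it deviates by more than tol."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= tol:
        return [a, b]
    left = simplify(points[:idx + 1], tol)
    right = simplify(points[idx:], tol)
    return left[:-1] + right  # the split point appears in both halves; keep it once

contour = [(0, 0), (1, 0.1), (2, 0), (3, 2), (4, 0), (5, 0.1), (6, 0)]
levels = [simplify(contour, tol) for tol in (1.5, 0.5, 0.05)]
for level in levels:
    print(len(level), level)
```

Each finer level keeps every vertex of the coarser one and adds more, and with a small enough tolerance all original vertices are retained. This mirrors the progression in the abstract from a coarse representation for browsing to an exact one for final rendering.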
12.
Texture based segmentation
Segmentation of Textured Images
http://www-dbv.informatik.uni-bonn.de/image/segmentation.html
The unsupervised segmentation of
textured images is a difficult and challenging low level vision problem with
important applications in vision-guided autonomous robotics, product quality
inspection, medical diagnosis and in the analysis of remotely sensed images.
Algorithms for subsequent image processing stages like motion analysis and
tracking, stereo vision, object recognition and scene interpretation often rely
on a high quality image segmentation.
The segmentation problem can be informally described as the task of partitioning
an image into homogeneous regions. For textured images one of the main
conceptual difficulties is the definition of a homogeneity measure in
mathematical terms. Our approach to unsupervised texture segmentation is
based on four cascaded design decisions, concerning the questions of image
representation, texture homogeneity, objective functions and optimization
procedures.
Realistic Textures for Virtual Anastylosis
Alexey Zalesny, Dominik Auf der Maur, Rupert Paget, Maarten Vergauwen and Luc Van Gool
http://www.lems.brown.edu/vision/conferences/ACVA03/ACVA03.html
See some cool results here:
http://www.vision.ee.ethz.ch/~rpaget/texture.htm
13. Texture Synthesis
See http://www.vision.ee.ethz.ch/~zales/
Alexey Zalesny, Vittorio Ferrari, Geert Caenen, and Luc Van Gool, "Parallel Composite Texture Synthesis", Texture 2002 Workshop in conjunction with ECCV 2002, pp. 151-155.
Geert Caenen, Vittorio Ferrari, Alexey Zalesny, and Luc Van Gool,
"Analyzing the layout of composite textures", Texture 2002 Workshop in
conjunction with ECCV 2002, pp. 15-19.
Alexey Zalesny, Vittorio Ferrari, Geert Caenen, Dominik Auf der Maur, and Luc
Van Gool,
"Composite Texture Descriptions", ECCV 2002, Vol. 3, pp. 180-194.