Plausible Physics in Augmented Images
Introduction
Augmented reality systems generally fall into one of two categories: those that require user input or prior information to model the world, and those that automatically model the world from images. The former can provide rich interaction between real and virtual objects at the expense of user effort to model the scene. The latter can render virtual objects into the images but usually do not account for any interaction with the world.
We present a system that automatically builds a 3D model of an environment from an unordered set of images. The model allows us to insert virtual objects into the scene and run real-time, plausible physics simulations between real and virtual objects. The scene model also accounts for occlusion of virtual objects and allows us to cast shadows of virtual objects onto the real objects in the scene.
Our approach is currently limited by the amount of texture in the scene. A well-textured scene yields a dense sampling of detected feature points, which improves both the depth estimation and the plausibility of the physics simulation.
Scene Modeling
Recent work by Iryna Skrypnyk and David Lowe [Skrypnyk2004] shows how a static scene can be modeled automatically given only images taken from several viewpoints. The scene is modeled by identifying salient image points via the Scale Invariant Feature Transform (SIFT). The 3D locations of the feature points and the camera parameters used in image formation are estimated simultaneously using bundle adjustment. Skrypnyk and Lowe use this point cloud model to calibrate additional images in real time.
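Bundle adjustment refines the 3D points and camera parameters together by minimizing the reprojection error, that is, the distance between each detected feature and the projection of its estimated 3D point. The sketch below only evaluates that error for a simplified pinhole camera. The camera parameterization (rotation matrix, translation, a single focal length, principal point at the image origin) and the names `project` and `reprojection_error` are assumptions made for illustration, not the formulation used in [Skrypnyk2004].

```python
def project(point, rotation, translation, focal):
    """Project a 3D world point with a simple pinhole camera (assumed model)."""
    # Camera-frame coordinates: X_cam = R * X_world + t
    xc = [sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
          for r in range(3)]
    # Perspective divide; the principal point is assumed to be at (0, 0)
    return (focal * xc[0] / xc[2], focal * xc[1] / xc[2])


def reprojection_error(points, cameras, observations):
    """Sum of squared pixel residuals.

    observations holds (camera index, point index, u, v) tuples, where
    (u, v) is the detected feature location in that camera's image.
    """
    total = 0.0
    for cam_idx, pt_idx, u, v in observations:
        rotation, translation, focal = cameras[cam_idx]
        pu, pv = project(points[pt_idx], rotation, translation, focal)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total
```

A bundle adjuster would pass this quantity to a nonlinear least-squares solver that updates both `points` and `cameras`; that optimization loop is omitted here.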
We extend this work in a different direction. Our focus is to improve the scene model so that we can create a real-time interactive environment viewed through augmentation of the original image set. After reconstructing a sparse point cloud [Figure 1(b)] we estimate a continuous depth map for each of the images [Figure 1(c)]. To this end, the subset of 3D points originally detected in each image is projected back into that image, which provides a set of 2D points with known depths. For each image we then apply thin-plate spline interpolation to these depth values. The result is a smooth depth function that both interpolates and extrapolates the data.
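The interpolation step can be sketched as follows. This is a minimal pure-Python thin-plate spline fit using the standard kernel U(r) = r^2 log r plus an affine term, not the authors' implementation. The names `tps_kernel`, `solve`, and `fit_tps` are hypothetical, and the dense solver is only practical for small point sets.

```python
import math


def tps_kernel(r2):
    """Thin-plate basis U(r) = r^2 log r, written in terms of r^2."""
    return 0.0 if r2 == 0.0 else 0.5 * r2 * math.log(r2)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_tps(points, depths):
    """Return a function depth_at(x, y) that interpolates the given depths.

    points: list of (x, y) image locations (at least 3, not all collinear)
    depths: depth value at each location
    """
    n = len(points)
    size = n + 3
    a = [[0.0] * size for _ in range(size)]
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            a[i][j] = tps_kernel((xi - xj) ** 2 + (yi - yj) ** 2)
        # Affine part and its side conditions
        a[i][n], a[i][n + 1], a[i][n + 2] = 1.0, xi, yi
        a[n][i], a[n + 1][i], a[n + 2][i] = 1.0, xi, yi
    weights = solve(a, list(depths) + [0.0, 0.0, 0.0])

    def depth_at(x, y):
        z = weights[n] + weights[n + 1] * x + weights[n + 2] * y
        for w, (xi, yi) in zip(weights, points):
            z += w * tps_kernel((x - xi) ** 2 + (y - yi) ** 2)
        return z

    return depth_at
```

Because the spline includes an affine term, depths sampled from a plane are reproduced as that plane, both between and beyond the sample points. This matches the interpolating and extrapolating behavior described above.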
Plausible Physics and Interaction
To perform plausible physics we implemented a real-time simulation based upon the work of Guendelman et al. [Guendelman2003]. To account for interactions with our scene model we treat each reconstructed feature as a point of infinite mass. To simulate the force due to gravity in our augmented world we let the user determine the scale and orientation of the world. This removes the inherent ambiguity in the reconstruction. We also provide an interface for inserting and manipulating virtual objects in the scene. Users can assign initial positions and velocities to objects, view the scene from any of the viewpoints, and run simulations at various speeds. Although the simulation runs in real time, running at slower speeds (with shorter time steps) gives more accurate results.
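To illustrate what treating features as points of infinite mass means in practice, here is a deliberately simplified sketch: a single virtual sphere falls under gravity and bounces off fixed scene points that never move. It is not the rigid-body method of [Guendelman2003]. The function names, the semi-implicit Euler integrator, the restitution value, and the gravity vector are all assumptions chosen for illustration.

```python
import math


def step(pos, vel, radius, points, dt,
         gravity=(0.0, -9.81, 0.0), restitution=0.5):
    """Advance one sphere by one time step against static scene points."""
    # Semi-implicit Euler: update velocity first, then position
    vel = [v + g * dt for v, g in zip(vel, gravity)]
    pos = [p + v * dt for p, v in zip(pos, vel)]
    for pt in points:
        offset = [p - q for p, q in zip(pos, pt)]
        dist = math.sqrt(sum(o * o for o in offset))
        if 0.0 < dist < radius:
            normal = [o / dist for o in offset]
            vn = sum(v * n for v, n in zip(vel, normal))
            if vn < 0.0:
                # The point has infinite mass, so the sphere absorbs the
                # whole impulse and the point itself stays fixed
                vel = [v - (1.0 + restitution) * vn * n
                       for v, n in zip(vel, normal)]
            # Push the sphere back so the point lies on its surface
            pos = [q + radius * n for q, n in zip(pt, normal)]
    return pos, vel


def simulate(pos, vel, radius, points, dt, steps):
    """Return the list of sphere centers visited during the simulation."""
    path = [list(pos)]
    for _ in range(steps):
        pos, vel = step(pos, vel, radius, points, dt)
        path.append(list(pos))
    return path
```

A smaller `dt` reduces how far the sphere penetrates before a contact is detected, which is one reason shorter time steps give more accurate results. The sketch also shows why sparse features are a limitation: a sphere can fall through any gap between points that is wider than its diameter.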
Rendering
To run physics simulations in real time we must be able to render the scene efficiently. Using OpenGL, we render the images as texture-mapped planes. We render the depth maps by sampling our depth function on a regular grid, back-projecting these points into 3D, and rendering a triangular mesh of these points into the depth buffer. Rendering the scene data in this way is much more efficient than calling glDrawPixels. The number of vertices in the depth grid is a free variable that controls the trade-off between rendering speed and adherence to the depth function. Since the depth function is very smooth we are able to down-sample considerably without noticeable loss in quality. The resulting depth maps play a dual role. First, they allow the real objects to occlude the virtual ones. Second, they allow shadows to be cast from virtual objects onto the surfaces in the real scene. For the latter, we use shadow volumes and make the assumption that half of each image's intensity comes from ambient light while the rest comes from direct lighting.
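The depth-mesh construction and the shadow assumption can be sketched without any OpenGL calls. The code below samples a depth function on a regular grid, back-projects each sample with an assumed pinhole camera (single focal length, principal point at the image center), and builds triangle indices; it also darkens a shadowed pixel under the half-ambient assumption. The names `depth_mesh` and `shade` and the camera model are hypothetical, and shadow-volume construction itself is not shown.

```python
def depth_mesh(depth_at, width, height, focal, cols, rows):
    """Sample depth on a cols x rows grid, back-project, and triangulate.

    depth_at(u, v) returns the depth at pixel (u, v). cols and rows must
    both be at least 2; larger values follow the depth function more
    closely at the cost of more vertices to render.
    """
    cx, cy = width / 2.0, height / 2.0  # assumed principal point
    vertices = []
    for j in range(rows):
        for i in range(cols):
            u = i * (width - 1) / (cols - 1)
            v = j * (height - 1) / (rows - 1)
            z = depth_at(u, v)
            # Back-project the pixel to a 3D point in the camera frame
            vertices.append(((u - cx) * z / focal, (v - cy) * z / focal, z))
    triangles = []
    for j in range(rows - 1):
        for i in range(cols - 1):
            a = j * cols + i      # top-left corner of the grid cell
            b = a + 1             # top-right
            c = a + cols          # bottom-left
            d = c + 1             # bottom-right
            triangles.append((a, b, c))
            triangles.append((b, d, c))
    return vertices, triangles


def shade(intensity, in_shadow, ambient_fraction=0.5):
    """Keep only the ambient share of the intensity for shadowed pixels."""
    return intensity * ambient_fraction if in_shadow else intensity
```

With a 5 x 4 grid the mesh has 20 vertices and 24 triangles, far fewer primitives than one depth value per pixel, which is where the rendering savings come from.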
References
- [Guendelman2003] GUENDELMAN, E., BRIDSON, R., AND FEDKIW, R. 2003. Nonconvex rigid bodies with stacking. ACM Trans. Graph. 22, 3, 871-878.
- [Skrypnyk2004] SKRYPNYK, I., AND LOWE, D. G. 2004. Scene modelling, recognition and tracking with invariant image features. In ISMAR, 110-119.
leotta_siggraph2005_poster.pdf
leotta_siggraph2005.pdf
Falling Balls Video
Interaction Video
Desk Interaction Video