

A Virtual Mirror Display


The first application of our integrated, multi-modal visual person tracking framework is an interactive visual experience: a virtual mirror that distorts and exaggerates the facial expressions of people observing the device.

We create a virtual mirror by placing cameras so that they share the same optical axis as a video display, using a half-silvered mirror to merge the two optical paths. Because the tracker relies on stereo processing, more than one camera observes the user: the primary color camera is mounted in the center of the imaging frame, and an additional camera is mounted off-axis. The cameras view the user through a right-angle half-silvered mirror, so that the user can view a video monitor while also looking straight into (but not seeing) the cameras. Video from the primary camera is displayed on the monitor, after the distortion effects described below have been applied, to create the virtual mirror effect. Figure 3 shows the display and viewing geometry of our apparatus. Given the tracking system's estimate of the 3-D position of the user's head, graphics techniques can be applied to distort or morph the shape or apparent material properties of the user's face, which creates a novel and entertaining interactive visual experience.

Interactive facial distortion has been explored before on static imagery (for example, in the software imaging tool ``Kai's Power Goo'' by Metatools, Inc.). Performing the effect on live video is qualitatively different from still image processing in how entertaining the device is. The live image of one's face evokes a sense of being connected and disconnected at the same time; by distorting that face in real time, we create a self-referential experience with an image that is clearly neither oneself nor entirely synthetic or autonomous. Users seem to find the effect entertaining and interesting, and are willing to make quite exaggerated expressions to see how they appear in distorted form.

Graphics Processing


Video texture mapping techniques[5] are used to implement the distortion of the user's face. For this discussion we assume that texture and position coordinates are both normalized to the range [0,1]. We define a vertex to be in ``canonical coordinates'' when its position and texture coordinates are identical. To construct our display, a background rectangle is set to cover the display (from 0,0 to 1,1) in canonical coordinates. This alone creates a display equivalent to a non-distorted, pass-through video window. To perform face distortions, a smaller mesh is defined over the region of the user's head. Within the external contour of the head region, vertices are placed at evenly sampled interior points and, optionally, on the contour boundary. Initially all vertices are placed in canonical coordinates and assigned a neutral base color.
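The mesh setup described above can be sketched in a few lines. This is only an illustration under stated assumptions: the names `Vertex`, `canonical_vertex`, `background_rectangle`, and `face_mesh` are hypothetical, the head region is simplified to an axis-aligned square rather than the tracked head contour, and white is assumed to be the neutral base color.

```python
from dataclasses import dataclass


@dataclass
class Vertex:
    x: float  # position, normalized to [0, 1]
    y: float
    u: float  # texture coordinate, normalized to [0, 1]
    v: float
    color: tuple = (1.0, 1.0, 1.0)  # assumed neutral base color (white)

    def is_canonical(self, tol=1e-9):
        # Canonical means position and texture coordinates coincide.
        return abs(self.x - self.u) <= tol and abs(self.y - self.v) <= tol


def canonical_vertex(u, v):
    """Create a vertex whose position equals its texture coordinate."""
    return Vertex(x=u, y=v, u=u, v=v)


def background_rectangle():
    """Four corners covering the display from (0,0) to (1,1)."""
    corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    return [canonical_vertex(u, v) for (u, v) in corners]


def face_mesh(center, half_size, n=5):
    """Evenly sampled n x n grid over a square head region (an assumption;
    the system described here samples inside the head contour)."""
    cx, cy = center
    verts = []
    for j in range(n):
        for i in range(n):
            u = cx - half_size + 2.0 * half_size * i / (n - 1)
            v = cy - half_size + 2.0 * half_size * j / (n - 1)
            verts.append(canonical_vertex(u, v))
    return verts
```

Rendering both meshes with every vertex left canonical would reproduce the pass-through video window; a distortion then only has to move `x` and `y` while leaving `u` and `v` alone.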

Color distortions may be effected by manipulating the base color of each vertex. Shape distortions are applied in one of two modes: parametric or physically-based. In the parametric mode, distortions are performed by adding a deformation vector to each vertex position, expressed as a weighted sum of fixed basis deformations. In our application these bases are constructed to keep the borders of the distortion region in approximately canonical coordinates, so that the video effect has no apparent seams. In the physically-based mode, forces can be applied to each vertex, and position changes are computed using an approximation to an elastic surface; a vertex can be ``pulled'' in a given direction, and the entire mesh will deform as if it were a rubber sheet.
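The rubber-sheet behavior of the physically-based mode can be illustrated with a very simple stand-in. The sketch below is not the elastic model used in the system, which is not specified here; it assumes a square grid, pins the border at zero displacement, holds one grabbed vertex at a prescribed displacement, and relaxes every other vertex toward the mean of its four neighbors (Jacobi iteration on a membrane). The function name `pull_rubber_sheet` and its parameters are hypothetical.

```python
def pull_rubber_sheet(n, grab, pull, iterations=200):
    """Return an n x n grid of (dx, dy) displacements.

    Border vertices stay at zero displacement, the vertex at `grab`
    (row, col) is held at `pull`, and all other interior vertices relax
    toward the average displacement of their four neighbors.
    """
    gr, gc = grab
    if not (0 < gr < n - 1 and 0 < gc < n - 1):
        raise ValueError("grab must be an interior vertex")
    disp = [[(0.0, 0.0) for _ in range(n)] for _ in range(n)]
    disp[gr][gc] = (float(pull[0]), float(pull[1]))
    for _ in range(iterations):
        new = [row[:] for row in disp]
        for r in range(1, n - 1):
            for c in range(1, n - 1):
                if (r, c) == (gr, gc):
                    continue  # the grabbed vertex is held fixed
                nb = (disp[r - 1][c], disp[r + 1][c],
                      disp[r][c - 1], disp[r][c + 1])
                new[r][c] = (sum(d[0] for d in nb) / 4.0,
                             sum(d[1] for d in nb) / 4.0)
        disp = new
    return disp
```

Adding each displacement to the corresponding canonical vertex position gives the deformed mesh. Because the border never moves, the deformed region still meets the undistorted background without a seam.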

The weight parameters associated with the parametric basis deformations vary over time, and can be expressed as a function of several variables describing the state of the user: the distance of the user from the screen, their position on the floor in front of the display, or their overall body pose. In addition, the weight parameters can vary randomly, or according to a script or external control. Forces for the physically-based model can be supplied through an external interface, generated randomly, or derived directly from the image as the user's face touches other objects or body parts.
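As one hypothetical example of driving a weight from the user's state, the sketch below maps the user's distance from the screen to a weight in [0, 1], so that the effect grows stronger as the user approaches. The linear ramp, the function name `weight_from_distance`, and the near and far limits (in meters) are assumptions chosen for illustration; no particular mapping is specified for the system.

```python
def weight_from_distance(distance_m, near_m=0.5, far_m=3.0):
    """Linear ramp: 0 at or beyond far_m, 1 at or inside near_m."""
    if far_m <= near_m:
        raise ValueError("far_m must be greater than near_m")
    t = (far_m - distance_m) / (far_m - near_m)
    # Clamp so the weight stays in [0, 1] outside the ramp.
    return max(0.0, min(1.0, t))
```

A scripted or random controller could produce weights in the same range and be swapped in without changing the deformation code that consumes them.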


 
Figure: Typical scenes seen in the virtual mirror.

Implementation Details


We implemented our system for SIGGRAPH'97 using three computer systems (one PC and two SGI O2s), a large NTSC video monitor, stereo video cameras, a dedicated stereo computation PC board, and an optical half-mirror. The monitor, mirror, and cameras are arranged such that the camera and monitor share the same optical axis: the user can stare into the camera and display simultaneously, but sees only the monitor output. Depth estimates are computed on the stereo PC board from the stereo camera input and sent over a network from the PC to the first SGI as 128x128 range maps at approximately 20Hz. On this SGI, color video is digitized at 640x480 and used as a texture source for the distortion effect. Skin color lookup and connected components analysis are performed at 20Hz at 128x128 resolution.
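The skin color lookup and connected components step can be sketched as follows. This is a generic illustration, not the classifier used in the system: the lookup table is assumed to be a set of quantized RGB bins, the quantization to 8 levels per channel is arbitrary, and regions are grouped with 4-connectivity. The names `skin_mask` and `connected_components` are hypothetical.

```python
from collections import deque


def skin_mask(image, skin_bins, levels=8):
    """Classify each (r, g, b) pixel (0-255 ints) by table lookup.

    `skin_bins` is an assumed lookup table: a set of quantized
    (r, g, b) bins that were marked as skin during training.
    """
    step = 256 // levels
    return [[(r // step, g // step, b // step) in skin_bins
             for (r, g, b) in row] for row in image]


def connected_components(mask):
    """Group truthy cells into 4-connected regions of (row, col) pairs."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited skin pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions
```

Taking `max(regions, key=len)` would select the largest skin-colored region as a head candidate, which is one plausible way to feed the tracker.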

The color segmentation classifier was trained across various lighting conditions at the demonstration site by taking images of a reference color sample grid, as well as images of people and background scenes.

A second SGI O2 runs the face detection routines: at 128x128 resolution it takes approximately 0.8 seconds to find all faces in a typical scene.

The output image is constructed by applying the acquired video as a texture source for the background rectangle and the face mesh. The full system, including all vision and graphics processing, runs at approximately 12Hz.

For this demonstration, four parametric deformations and one physically-based distortion were implemented. Spherical expansion, spherical shrinking, swirl, and lateral expansion were defined as bases, and a vertical sliding effect was implemented using simulated physics. Figure 4 shows the basic effects generated by our system.
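To make the four parametric bases concrete, the sketch below gives one possible form for each and combines them as a weighted sum added to a vertex position. The formulas are assumptions, since the exact basis definitions are not given here: each basis is scaled by a smooth falloff that reaches zero at the edge of a circular region so that border vertices stay canonical, shrinking is simply taken as the negative of expansion, and the swirl is a small tangential displacement rather than an exact rotation.

```python
def _falloff(dx, dy, radius):
    """Smooth weight: 1 at the center, 0 at and beyond the region border."""
    t = (dx * dx + dy * dy) / (radius * radius)
    return (1.0 - t) ** 2 if t < 1.0 else 0.0


def spherical_expansion(dx, dy, radius):
    f = _falloff(dx, dy, radius)
    return (f * dx, f * dy)      # push radially outward


def spherical_shrinking(dx, dy, radius):
    ex, ey = spherical_expansion(dx, dy, radius)
    return (-ex, -ey)            # pull radially inward


def swirl(dx, dy, radius):
    f = _falloff(dx, dy, radius)
    return (-f * dy, f * dx)     # tangential displacement


def lateral_expansion(dx, dy, radius):
    f = _falloff(dx, dy, radius)
    return (f * dx, 0.0)         # stretch horizontally only


BASES = {
    "expand": spherical_expansion,
    "shrink": spherical_shrinking,
    "swirl": swirl,
    "lateral": lateral_expansion,
}


def deform(position, center, radius, weights):
    """Add the weighted sum of basis deformations to a canonical position."""
    x, y = position
    dx, dy = x - center[0], y - center[1]
    out_x, out_y = x, y
    for name, w in weights.items():
        bx, by = BASES[name](dx, dy, radius)
        out_x += w * bx
        out_y += w * by
    return (out_x, out_y)
```

For example, `deform((0.6, 0.5), (0.5, 0.5), 0.25, {"expand": 1.0})` moves the vertex away from the center along the x axis, while a vertex on the region border is returned unchanged for any weights.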



T. Darrell, G. Gordon, J. Woodfill, M. Harville, "A Virtual Mirror Interface using Real-time Robust Face Tracking", Proceedings of the Third International Conference on Face and Gesture Recognition, IEEE Computer Society Press, April 1998, Nara, Japan.