| Modules Enabled: Color | Range | Pattern | Misses | False Positives | Correct | Error Rate |
| 11 | 321 | 3.3% | ||||
| 12 | 320 | 3.6% | ||||
| 14 | 318 | 4.2% | ||||
| 18 | 314 | 5.4% | ||||
| 3 | 39 | 290 | 12.7% | |||
| 3 | 49 | 280 | 15.7% | |||
| 259 | 2 | 71 | 78.6% | |||
The goal of the visual tracking portion of our system is to identify the 3-D position and size of a user's head in the scene, so that distortion and other effects can be applied. We analyzed the performance of our system both in on-line tests with thousands of users and in off-line tests, where we could quantify performance with different modules enabled or disabled.
Our system was first demonstrated at SIGGRAPH'97, August 3-8, 1997 [1], where an estimated 5000 people experienced it over 6 days (approx. two new users per minute, over 42 hours of operation). Qualitatively, the system was a complete success. Our tracking results were sufficient to localize video distortion effects that were interesting and fun for people to use. Figure 5 shows typical images displayed on the virtual mirror. The system performed well both with single users and in crowded conditions; in an initial report of the system [2] we estimated that correct head placement occurred in 85-95% of the cases we surveyed. In this version of the system, the only module that was allowed to fail was the face detector.
As described above, we can now produce estimates of face location using
any combination of modules. In addition to improving our overall
performance, this allows us to analyze the system under different
failure conditions and to assess the value of our integration strategy.
We stored range and color images at sparse intervals from our system
during the SIGGRAPH show; we analyzed performance off-line on those
images that contained people. (Performance of our system on blank
scenes is in excess of
correct: no effect is generated.)
Table 1 summarizes the results we found. A match was defined as
correct when the corners of the estimated face region were sufficiently
close to manually entered ground truth (within
of the
face size). Overall, when all modules were functioning, we achieved
an error rate of 3.3%. When any one
module was allowed to fail, error was still less than 13%. In terms
of individual performance, the range module was best
(5.4% error), followed by the color module (15.7%) and the face module
(78.6%).
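The scoring behind Table 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the evaluation code used for the paper. The function names, the box layout `(left, top, right, bottom)`, and the use of the larger box side as the "face size" are all hypothetical. The tolerance fraction is not given in the text here, so it is left as a caller-supplied parameter. The error rate is assumed to be misses plus false positives divided by the number of test images, which reproduces the rates in the table rows that list both counts.

```python
def is_correct_match(estimated, truth, tolerance):
    """Return True if each corner coordinate of the estimated box lies
    within tolerance * face_size of the ground-truth coordinate.

    Boxes are (left, top, right, bottom) in pixels (assumed layout).
    `tolerance` is a fraction of the face size; its value is supplied
    by the caller and is not a figure taken from the paper.
    """
    left, top, right, bottom = truth
    # Assumption: "face size" is the larger side of the ground-truth box.
    face_size = max(right - left, bottom - top)
    limit = tolerance * face_size
    return all(abs(e - t) <= limit for e, t in zip(estimated, truth))


def error_rate(misses, false_positives, correct):
    """Fraction of test images on which the tracker was wrong."""
    total = misses + false_positives + correct
    return (misses + false_positives) / total


# Counts taken from the three table rows that list misses,
# false positives, and correct matches.
for counts in [(3, 39, 290), (3, 49, 280), (259, 2, 71)]:
    print(f"{error_rate(*counts):.1%}")
```

Run as written, the loop prints 12.7%, 15.7%, and 78.6%, matching the error rates in those rows.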
We draw two main conclusions from these data. First, range data is a powerful cue for localizing heads in complex scenes, as is flesh color detection. Second, integration is useful: in every case, the addition of modules improved the system performance significantly. The addition of color or face detection to the range system reduced error by approximately 30%.
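The size of the improvement can be checked with a short calculation. The sketch below assumes that the 3.6% and 4.2% rows of Table 1 are the two configurations that pair the range module with one other module; the table as reproduced here does not label its rows, so that pairing is an inference from the text, and the function name is hypothetical.

```python
def relative_reduction(baseline, improved):
    """Relative drop in error rate when moving from baseline to improved."""
    return (baseline - improved) / baseline


range_only = 5.4  # percent error, range module alone (from the text)

# Assumption: these two rows are range plus one additional module.
for paired in (3.6, 4.2):
    print(f"{range_only}% -> {paired}%: "
          f"{relative_reduction(range_only, paired):.0%} lower error")
```

Under that assumption the two reductions are about 33% and 22%, which brackets the approximate 30% figure quoted above.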
We note that the solo performance of the pattern-based face detector presented here shows significantly more errors than in previous analyses [9]. Several issues impacted these results. First, the CMU system was trained primarily with upright, frontal-view faces, without extreme facial expressions. These expectations match a broad range of application scenarios. However, the distortions presented to our users encouraged them to make large head rotations and greatly exaggerated facial expressions. These poses were largely atypical of the training material and were therefore not detected as probable face patterns. A second relevant factor was image resolution. Informal inspection of detection results indicated that many errors occurred when the face was less than 13 pixels across. While we cannot remedy this in the current dataset, we anticipate that performance will improve when higher-resolution imagery is used as input to the face detector. Overall, however, in concert with skin color tracking and depth segmentation, the face detection module provides information essential for robust performance.
T. Darrell, G. Gordon, J. Woodfill, M. Harville, "A Virtual Mirror Interface using Real-time Robust Face Tracking", Proceedings of the Third International Conference on Face and Gesture Recognition, IEEE Computer Society Press, April 1998, Nara, Japan.