Face views are particularly important in person-to-person communication. Differenes between the camera location and the face orientation can result in undesirable facial appearances of the participants during video conferencing. This phenomenon is par-ticularly noticeable when using devices where the front-facing camera is placed in unconventional locations such as below the display or within the keyboard. In this paper, we take a video stream from a single RGB camera as input, and generate a video stream that emulates the view from a virtual camera at a designated location. The most challenging issue in this problem is that the corrected view often needs out-of-plane head rotations. To address this challenge, we reconstruct the 3D face shape and re-render it into synthesized frames according to the virtual camera location. To output the corrected video stream with natural appearance in real time, we propose several novel techniques including accurate eyebrow reconstruction, high-quality blending between the corrected face image and background, and template-based 3D reconstruction of glasses. Our system works well for different lighting conditions and skin tones, and can handle users wearing glasses. Extensive experiments and user studies demonstrate that our method provides high-quality results.
- Article type
- Year
- Co-author
In order to conduct optical neurophysiology experiments on a freely swimming zebrafish, it is essential to quantify the zebrafish head to determine exact lighting positions. To efficiently quantify a zebrafish head's behaviors with limited resources, we propose a real-time multi-stage architecture based on convolutional neural networks for pose estimation of the zebrafish head on CPUs. Each stage is implemented with a small neural network. Specifically, a light-weight object detector named Micro-YOLO is used to detect a coarse region of the zebrafish head in the first stage. In the second stage, a tiny bounding box refinement network is devised to produce a high-quality bounding box around the zebrafish head. Finally, a small pose estimation network named tiny-hourglass is designed to detect keypoints in the zebrafish head. The experimental results show that using Micro-YOLO combined with RegressNet to predict the zebrafish head region is not only more accurate but also much faster than Faster R-CNN which is the representative of two-stage detectors. Compared with DeepLabCut, a state-of-the-art method to estimate poses for user-defined body parts, our multi-stage architecture can achieve a higher accuracy, and runs 19x faster than it on CPUs.