Research Article | Open Access

Robust facial landmark detection and tracking across poses and expressions for in-the-wild monocular video

Bournemouth University, Poole, BH12 5BB, UK.
Harbin Institute of Technology, Harbin, 150001, China.

Abstract

We present a novel approach for automatically detecting and tracking facial landmarks across poses and expressions in in-the-wild monocular video, e.g., YouTube videos and smartphone recordings. Our method requires no calibration or manual adjustment for new input videos or actors. First, we propose a robust 2D facial landmark detection method that handles pose variation by combining shape-face canonical-correlation analysis with a global supervised descent method. Because 2D regression-based methods are sensitive to unstable initialization and ignore the temporal and spatial coherence of video, we refine the 2D landmarks with a coarse-to-dense 3D facial expression reconstruction. On the one hand, we use an in-the-wild method to extract a coarse reconstruction and its corresponding texture from the detected sparse facial landmarks, followed by robust estimation of pose, expression, and identity. On the other hand, to obtain a dense reconstruction, we introduce a face tracking flow method that corrects the coarse result and tracks weakly textured regions; it is used to iteratively update the coarse face model, and the dense reconstruction is obtained once this process converges. Extensive experiments on video sequences recorded by ourselves or downloaded from YouTube show facial landmark detection and tracking results under various lighting conditions, head poses, and facial expressions. The overall performance and a comparison with state-of-the-art methods demonstrate the robustness and effectiveness of our method.
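The detection stage described above builds on cascaded regression in the spirit of the supervised descent method. As a rough illustration of that idea only, the following self-contained Python toy fits a short cascade of linear regressors that map local intensity features to landmark-position updates on synthetic 1D signals. The data, features, and helper functions here (features, make_sample, training_data, learn_stage) are illustrative assumptions and are not the paper's shape-face CCA features or its global SDM implementation.

```python
# Toy sketch of the cascaded-regression / supervised-descent idea.
# Everything here is synthetic and illustrative, not the authors' code.
import numpy as np

rng = np.random.default_rng(0)

def features(signal, pos, half_window=5):
    # Sample intensities in a window around the current landmark estimate.
    idx = np.clip(np.arange(pos - half_window, pos + half_window + 1),
                  0, len(signal) - 1)
    return signal[idx]

def make_sample(length=200):
    # Synthetic 1D "image": a smooth bump whose peak is the landmark.
    true_pos = int(rng.integers(40, length - 40))
    x = np.arange(length)
    signal = np.exp(-0.5 * ((x - true_pos) / 8.0) ** 2)
    return signal, true_pos

def training_data(n=500, noise=15):
    # Pairs of (features at a perturbed start, residual to the true position).
    X, y = [], []
    for _ in range(n):
        signal, true_pos = make_sample()
        start = int(np.clip(true_pos + rng.normal(0, noise), 0, len(signal) - 1))
        X.append(features(signal, start))
        y.append(true_pos - start)
    return np.array(X), np.array(y)

def learn_stage(X, y):
    # One cascade stage: a linear regressor (with bias) from features to update.
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

# Train a short cascade; later stages are trained on smaller perturbations,
# approximating the residuals left by earlier stages.
stages = [learn_stage(*training_data(noise=s)) for s in (15, 8, 4)]

# Apply the cascade to a new synthetic sample from a deliberately bad start.
signal, true_pos = make_sample()
est = float(true_pos + 12)
for w in stages:
    f = np.append(features(signal, int(round(est))), 1.0)
    est += float(f @ w)   # one learned descent step
print(f"true position = {true_pos}, cascade estimate = {est:.1f}")
```

In the actual method, the regressors act on image descriptors sampled around all facial landmarks, and the global supervised descent method learns separate descent maps for different regions of the pose/initialization space; the toy above only conveys the learn-a-descent-direction idea behind that family of methods.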

Electronic Supplementary Material

Video
41095_2016_68_MOESM1_ESM.mp4

Computational Visual Media
Pages 33-47
Cite this article:
Liu S, Zhang Y, Yang X, et al. Robust facial landmark detection and tracking across poses and expressions for in-the-wild monocular video. Computational Visual Media, 2017, 3(1): 33-47. https://doi.org/10.1007/s41095-016-0068-y


Revised: 04 September 2016
Accepted: 20 December 2016
Published: 17 March 2017
© The Author(s) 2016

This article is published with open access at Springerlink.com

The articles published in this journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
