Research Article | Open Access

Facial optical flow estimation via neural non-rigid registration

School of Mathematical Sciences, University of Science and Technology of China, Hefei 230026, China

Abstract

Optical flow estimation in human facial video, which provides 2D correspondences between adjacent frames, is a fundamental pre-processing step for many applications, such as facial expression capture and recognition. The task is challenging because facial images contain large regions of similar texture, rich expressions, and large rotations; these same characteristics also explain the scarcity of large annotated real-world datasets. We propose a robust and accurate method to learn facial optical flow in a self-supervised manner. Specifically, we exploit various shape priors, including face depth, landmarks, and parsing, to guide the self-supervised learning task via a differentiable non-rigid registration framework. Extensive experiments demonstrate that our method achieves remarkable improvements in facial optical flow estimation in the presence of significant expressions and large rotations.
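The self-supervised formulation sketched in the abstract rests on brightness constancy: a predicted flow is good if warping the second frame by it reproduces the first frame, so no ground-truth flow labels are needed. Below is a minimal NumPy sketch of such a photometric loss — an illustration of the general principle only, not the paper's implementation (the function names are invented here, and nearest-neighbour warping stands in for the bilinear sampling a differentiable pipeline would use).

```python
import numpy as np

def warp_backward(img2, flow):
    """Warp the second frame toward the first using per-pixel flow.
    Nearest-neighbour sampling for brevity; a differentiable framework
    would use bilinear interpolation instead."""
    h, w = img2.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xw = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    yw = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return img2[yw, xw]

def photometric_loss(img1, img2, flow):
    """Brightness-constancy objective: the warped second frame should
    match the first frame wherever the flow is correct."""
    return np.abs(img1 - warp_backward(img2, flow)).mean()

# A frame pair related by a 1-pixel horizontal shift: the correct flow
# yields a lower photometric loss than zero flow.
img1 = np.arange(16, dtype=float).reshape(4, 4)
img2 = np.roll(img1, 1, axis=1)              # img2[y, x] = img1[y, x-1]
flow_correct = np.zeros((4, 4, 2))
flow_correct[..., 0] = 1.0                   # undo the shift
flow_zero = np.zeros((4, 4, 2))
assert photometric_loss(img1, img2, flow_correct) < photometric_loss(img1, img2, flow_zero)
```

A photometric term alone is ambiguous on the large similar-texture regions of faces; the paper's contribution is to constrain it with facial shape priors (depth, landmarks, parsing) inside a differentiable non-rigid registration framework.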

Computational Visual Media
Pages 109-122
Cite this article:
Peng Z, Jiang B, Xu H, et al. Facial optical flow estimation via neural non-rigid registration. Computational Visual Media, 2023, 9(1): 109-122. https://doi.org/10.1007/s41095-021-0267-z

Received: 03 August 2021
Accepted: 27 December 2021
Published: 18 October 2022
© The Author(s) 2022.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.