Research Article | Open Access

Facial optical flow estimation via neural non-rigid registration

School of Mathematical Sciences, University of Science and Technology of China, Hefei 230026, China

Abstract

Optical flow estimation in human facial video, which provides 2D correspondences between adjacent frames, is a fundamental pre-processing step for many applications, such as facial expression capture and recognition. The task is challenging because facial images contain large regions of similar texture, rich expressions, and large rotations; these same characteristics also explain the scarcity of large annotated real-world datasets. We propose a robust and accurate method to learn facial optical flow in a self-supervised manner. Specifically, we exploit various shape priors, including face depth, landmarks, and parsing, to guide the self-supervised learning task via a differentiable non-rigid registration framework. Extensive experiments demonstrate that our method achieves remarkable improvements in facial optical flow estimation in the presence of significant expressions and large rotations.
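The self-supervised formulation sketched in the abstract rests on brightness constancy: a predicted flow is good if warping the second frame by it reproduces the first frame, so no ground-truth flow labels are needed. Below is a minimal NumPy sketch of such a photometric loss — an illustration of the general principle only, not the paper's implementation (the function names are invented here, and nearest-neighbour warping stands in for the bilinear sampling a differentiable pipeline would use).

```python
import numpy as np

def warp_backward(img2, flow):
    """Warp the second frame toward the first using per-pixel flow.
    Nearest-neighbour sampling for brevity; a differentiable framework
    would use bilinear interpolation instead."""
    h, w = img2.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xw = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    yw = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return img2[yw, xw]

def photometric_loss(img1, img2, flow):
    """Brightness-constancy objective: the warped second frame should
    match the first frame wherever the flow is correct."""
    return np.abs(img1 - warp_backward(img2, flow)).mean()

# A frame pair related by a 1-pixel horizontal shift: the correct flow
# yields a lower photometric loss than zero flow.
img1 = np.arange(16, dtype=float).reshape(4, 4)
img2 = np.roll(img1, 1, axis=1)              # img2[y, x] = img1[y, x-1]
flow_correct = np.zeros((4, 4, 2))
flow_correct[..., 0] = 1.0                   # undo the shift
flow_zero = np.zeros((4, 4, 2))
assert photometric_loss(img1, img2, flow_correct) < photometric_loss(img1, img2, flow_zero)
```

A photometric term alone is ambiguous on the large similar-texture regions of faces; the paper's contribution is to constrain it with facial shape priors (depth, landmarks, parsing) inside a differentiable non-rigid registration framework.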

Computational Visual Media
Pages 109-122
Cite this article:
Peng Z, Jiang B, Xu H, et al. Facial optical flow estimation via neural non-rigid registration. Computational Visual Media, 2023, 9(1): 109-122. https://doi.org/10.1007/s41095-021-0267-z

Received: 03 August 2021
Accepted: 27 December 2021
Published: 18 October 2022
© The Author(s) 2022.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.