Research Article | Open Access

CF-DAN: Facial-expression recognition based on cross-fusion dual-attention network

Shandong Technology and Business University, Shandong 264005, China
School of Information and Electrical Engineering, Ludong University, Yantai 264025, China
Shandong University, Shandong 250100, China

Abstract

Recently, facial-expression recognition (FER) has focused primarily on images captured in the wild, which involve challenges such as face occlusion and image blurring, rather than on laboratory images. These complex field environments pose new challenges for FER. To address them, this study proposes a cross-fusion dual-attention network (CF-DAN). The network comprises three parts: (1) a cross-fusion grouped dual-attention mechanism that refines local features and captures global information; (2) a construction method for C² activation functions based on piecewise cubic polynomials with three degrees of freedom, which requires less computation, offers greater flexibility and recognition ability, and better addresses slow running speeds and neuron-inactivation problems; and (3) a closed-loop operation between self-attention distillation and residual connections that suppresses redundant information and improves the generalization ability of the model. The recognition accuracies on the RAF-DB, FERPlus, and AffectNet datasets are 92.78%, 92.02%, and 63.58%, respectively. These experiments show that the model offers an effective solution for FER tasks.
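To make point (2) concrete: a piecewise cubic polynomial joined with C² continuity is a cubic spline, so one way to realize such an activation from three free parameters is to interpolate three control values with a natural cubic spline and extend it linearly outside the outermost knots. The sketch below is a minimal illustration under these assumptions; the knot positions, control values, and linear extrapolation are illustrative choices, not the paper's exact construction.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def make_c2_activation(p0, p1, p2, knots=(-1.0, 0.0, 1.0)):
    """Build a C^2 piecewise-cubic activation from three free values.

    CubicSpline joins its cubic pieces with continuous first and
    second derivatives, so the interior is C^2 by construction.
    """
    cs = CubicSpline(knots, [p0, p1, p2], bc_type="natural")
    lo, hi = knots[0], knots[-1]

    def act(x):
        x = np.asarray(x, dtype=float)
        y = cs(np.clip(x, lo, hi))
        # Extend linearly beyond the outermost knots using the boundary
        # slopes; the natural boundary condition makes the curvature
        # zero there, so the extension preserves C^2 continuity and
        # keeps gradients bounded.
        y = np.where(x < lo, cs(lo) + cs(lo, 1) * (x - lo), y)
        y = np.where(x > hi, cs(hi) + cs(hi, 1) * (x - hi), y)
        return y

    return act

# Example: a smooth, ReLU-like curve from three control values.
act = make_c2_activation(0.0, 0.1, 1.0)
print(act(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))
```

For point (1), the following sketch shows one plausible shape of a dual-attention block, loosely in the style of dual-attention designs such as DaViT: one branch attends over spatial tokens for global context, the other attends over channels for local feature refinement, and both outputs are cross-fused through residual addition. The grouping and fusion details of CF-DAN itself are not reproduced here; every layer choice below is an assumption.

```python
import torch
import torch.nn as nn

class DualAttentionBlock(nn.Module):
    """Illustrative dual-attention block (assumed structure, not the
    paper's exact CF-DAN layer): a spatial self-attention branch for
    global context and a channel-attention branch for local feature
    refinement, cross-fused by residual summation."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_s = nn.LayerNorm(dim)
        self.norm_c = nn.LayerNorm(dim)
        self.channel_proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (batch, tokens, dim) -- a flattened feature map.
        # Spatial branch: standard self-attention across tokens.
        s = self.norm_s(x)
        s, _ = self.spatial_attn(s, s, s)

        # Channel branch: attention across channels, computed by
        # transposing the token and channel axes.
        c = self.norm_c(x).transpose(1, 2)                       # (B, dim, tokens)
        w = torch.softmax(c @ c.transpose(1, 2) / c.shape[-1] ** 0.5, dim=-1)
        c = (w @ c).transpose(1, 2)                              # (B, tokens, dim)

        # Cross-fusion: both branch outputs rejoin the input through
        # residual connections.
        return x + s + self.channel_proj(c)

# Smoke test on random 7x7 feature maps flattened to 49 tokens.
block = DualAttentionBlock(dim=64)
print(block(torch.randn(2, 49, 64)).shape)  # torch.Size([2, 49, 64])
```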

Computational Visual Media
Pages 593–608
Cite this article:
Zhang F, Chen G, Wang H, et al. CF-DAN: Facial-expression recognition based on cross-fusion dual-attention network. Computational Visual Media, 2024, 10(3): 593-608. https://doi.org/10.1007/s41095-023-0369-x


Received: 21 February 2023
Accepted: 19 July 2023
Published: 08 February 2024
© The Author(s) 2024.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
