Research Article | Open Access

CF-DAN: Facial-expression recognition based on cross-fusion dual-attention network

Shandong Technology and Business University, Shandong 264005, China
School of Information and Electrical Engineering, Ludong University, Yantai 264025, China
Shandong University, Shandong 250100, China

Abstract

Recently, facial-expression recognition (FER) has focused primarily on images captured in the wild, which involve challenges such as face occlusion and image blurring, rather than on laboratory images. These complex field environments pose new challenges for FER. To address them, this study proposes a cross-fusion dual-attention network (CF-DAN). The network comprises three parts: (1) a cross-fusion grouped dual-attention mechanism that refines local features and captures global information; (2) a construction method for C² activation functions based on piecewise cubic polynomials with three degrees of freedom, which requires less computation, offers greater flexibility and recognition ability, and better addresses slow running speeds and neuron-inactivation problems; and (3) a closed-loop operation between self-attention distillation and residual connections that suppresses redundant information and improves the generalization ability of the model. The recognition accuracies on the RAF-DB, FERPlus, and AffectNet datasets are 92.78%, 92.02%, and 63.58%, respectively. These experiments show that the model offers an effective solution for FER tasks.
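To make point (2) concrete: a piecewise cubic polynomial joined with C² continuity is a cubic spline, so one way to realize such an activation from three free parameters is to interpolate three control values with a natural cubic spline and extend it linearly outside the outermost knots. The sketch below is a minimal illustration under these assumptions; the knot positions, control values, and linear extrapolation are illustrative choices, not the paper's exact construction.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def make_c2_activation(p0, p1, p2, knots=(-1.0, 0.0, 1.0)):
    """Build a C^2 piecewise-cubic activation from three free values.

    CubicSpline joins its cubic pieces with continuous first and
    second derivatives, so the interior is C^2 by construction.
    """
    cs = CubicSpline(knots, [p0, p1, p2], bc_type="natural")
    lo, hi = knots[0], knots[-1]

    def act(x):
        x = np.asarray(x, dtype=float)
        y = cs(np.clip(x, lo, hi))
        # Extend linearly beyond the outermost knots using the boundary
        # slopes; the natural boundary condition makes the curvature
        # zero there, so the extension preserves C^2 continuity and
        # keeps gradients bounded.
        y = np.where(x < lo, cs(lo) + cs(lo, 1) * (x - lo), y)
        y = np.where(x > hi, cs(hi) + cs(hi, 1) * (x - hi), y)
        return y

    return act

# Example: a smooth, ReLU-like curve from three control values.
act = make_c2_activation(0.0, 0.1, 1.0)
print(act(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))
```

For point (1), the following sketch shows one plausible shape of a dual-attention block, loosely in the style of dual-attention designs such as DaViT: one branch attends over spatial tokens for global context, the other attends over channels for local feature refinement, and both outputs are cross-fused through residual addition. The grouping and fusion details of CF-DAN itself are not reproduced here; every layer choice below is an assumption.

```python
import torch
import torch.nn as nn

class DualAttentionBlock(nn.Module):
    """Illustrative dual-attention block (assumed structure, not the
    paper's exact CF-DAN layer): a spatial self-attention branch for
    global context and a channel-attention branch for local feature
    refinement, cross-fused by residual summation."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_s = nn.LayerNorm(dim)
        self.norm_c = nn.LayerNorm(dim)
        self.channel_proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (batch, tokens, dim) -- a flattened feature map.
        # Spatial branch: standard self-attention across tokens.
        s = self.norm_s(x)
        s, _ = self.spatial_attn(s, s, s)

        # Channel branch: attention across channels, computed by
        # transposing the token and channel axes.
        c = self.norm_c(x).transpose(1, 2)                       # (B, dim, tokens)
        w = torch.softmax(c @ c.transpose(1, 2) / c.shape[-1] ** 0.5, dim=-1)
        c = (w @ c).transpose(1, 2)                              # (B, tokens, dim)

        # Cross-fusion: both branch outputs rejoin the input through
        # residual connections.
        return x + s + self.channel_proj(c)

# Smoke test on random 7x7 feature maps flattened to 49 tokens.
block = DualAttentionBlock(dim=64)
print(block(torch.randn(2, 49, 64)).shape)  # torch.Size([2, 49, 64])
```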

Computational Visual Media
Pages 593–608
Cite this article:
Zhang F, Chen G, Wang H, et al. CF-DAN: Facial-expression recognition based on cross-fusion dual-attention network. Computational Visual Media, 2024, 10(3): 593-608. https://doi.org/10.1007/s41095-023-0369-x


Received: 21 February 2023
Accepted: 19 July 2023
Published: 08 February 2024
© The Author(s) 2024.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
