Review Article | Open Access

A Survey on 3D Skeleton-Based Action Recognition Using Learning Method

Bin Ren1,2, Mengyuan Liu3, Runwei Ding4, Hong Liu3
University of Pisa, Pisa, Italy
University of Trento, Trento, Italy
National Key Laboratory of General Artificial Intelligence, Peking University, Shenzhen Graduate School, Shenzhen, China
Peng Cheng Laboratory, Shenzhen, China

Abstract

Three-dimensional skeleton-based action recognition (3D SAR) has gained significant attention within the computer vision community, owing to the inherent advantages of skeleton data. As a result, a large body of impressive work, based on both conventional handcrafted features and learned feature extraction, has been produced over the years. However, prior surveys on action recognition have primarily focused on video or red-green-blue (RGB) data-dominated approaches, with limited coverage of skeleton data. Furthermore, despite the extensive application of deep learning methods in this field, there has been a notable absence of work that provides an introductory or comprehensive review from the perspective of deep learning architectures. To address these limitations, this survey first underscores the importance of action recognition and the significance of 3D skeleton data as a valuable modality. Subsequently, we provide a comprehensive introduction to mainstream action recognition techniques based on four fundamental deep architectures: recurrent neural networks, convolutional neural networks, graph convolutional networks, and Transformers. The methods built on each architecture are then presented in a data-driven manner with detailed discussion. Finally, we offer insights into the current largest 3D skeleton dataset, NTU-RGB+D, and its new edition, NTU-RGB+D 120, along with an overview of several top-performing algorithms on these datasets. To the best of our knowledge, this survey represents the first comprehensive discussion of deep learning-based action recognition using 3D skeleton data.

Cyborg and Bionic Systems
Article number: 0100
Cite this article:
Ren B, Liu M, Ding R, et al. A Survey on 3D Skeleton-Based Action Recognition Using Learning Method. Cyborg and Bionic Systems, 2024, 5: 0100. https://doi.org/10.34133/cbsystems.0100