Regular Paper

Adversarial Graph Convolutional Network for Skeleton-Based Early Action Prediction

Xian-Shan Li¹,³, Neng Zhang¹, Bin-Quan Cai¹, Jing-Wen Kang¹, Feng-Da Zhao¹,²,³

¹ School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China

² School of Information Science and Engineering, Xinjiang University of Science and Technology, Korla 841000, China

³ Key Laboratory for Software Engineering of Hebei Province, Yanshan University, Qinhuangdao 066004, China


Abstract

This paper proposes a novel method for early action prediction based on 3D skeleton data. The method combines graph convolutional networks (GCNs) with adversarial learning to address two problems: insufficient spatio-temporal feature extraction and the difficulty of predicting an action in the early stage of its execution. GCNs, which perform well in action recognition, are used to extract the spatio-temporal features of the skeleton. Through adversarial learning, the model learns to optimize the feature distribution of partial videos using the features of full videos. Experiments on two challenging action prediction datasets show that the method performs well on skeleton-based early action prediction, and it achieves state-of-the-art performance at some observation ratios.
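The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture: the abstract does not specify the GCN variant, the discriminator, or the loss. The sketch assumes a single spatial graph convolution with a symmetrically normalized adjacency matrix, average pooling over joints and frames, and a standard GAN-style binary cross-entropy objective in which a discriminator tries to tell full-sequence features from partial-sequence features. All function names, the three-joint toy skeleton, and the weights are hypothetical.

```python
import math

def normalized_adjacency(num_joints, bones):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected skeleton graph."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in bones:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]

def matmul(p, q):
    """Plain matrix product of two nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def graph_conv(a_hat, x, w):
    """One spatial graph convolution on a single frame: ReLU(A_hat X W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, x), w)]

def truncate(sequence, ratio):
    """Keep the first `ratio` fraction of frames (at least one frame)."""
    keep = max(1, math.ceil(len(sequence) * ratio))
    return sequence[:keep]

def sequence_feature(sequence, a_hat, w):
    """Average the graph-conv features over all joints and frames."""
    per_frame = [graph_conv(a_hat, frame, w) for frame in sequence]
    count = len(per_frame) * len(per_frame[0])
    return [sum(joint[d] for frame in per_frame for joint in frame) / count
            for d in range(len(w[0]))]

def discriminator(feature, v, b):
    """Logistic discriminator: estimated probability of a full sequence."""
    z = sum(f * vi for f, vi in zip(feature, v)) + b
    return 1.0 / (1.0 + math.exp(-z))

def adversarial_losses(full_feat, partial_feat, v, b):
    """GAN-style losses (an assumption, not taken from the paper)."""
    eps = 1e-12
    p_full = discriminator(full_feat, v, b)
    p_part = discriminator(partial_feat, v, b)
    d_loss = -math.log(p_full + eps) - math.log(1.0 - p_part + eps)
    g_loss = -math.log(p_part + eps)  # feature extractor tries to fool D
    return d_loss, g_loss

# Toy data: 3 joints in a chain, 4 frames, 3D coordinates per joint.
a_hat = normalized_adjacency(3, [(0, 1), (1, 2)])
full = [[[0.1 * t, 0.2, 0.3], [0.4, 0.1 * t, 0.6], [0.7, 0.8, 0.1 * t]]
        for t in range(4)]
partial = truncate(full, 0.5)  # observation ratio of 50%
w = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]  # hypothetical 3x2 weights
full_feat = sequence_feature(full, a_hat, w)
partial_feat = sequence_feature(partial, a_hat, w)
d_loss, g_loss = adversarial_losses(full_feat, partial_feat, [0.0, 0.0], 0.0)
print(len(partial), full_feat, partial_feat, d_loss, g_loss)
```

In a trainable version, the discriminator would be updated to reduce `d_loss` while the feature extractor would be updated to reduce `g_loss`, which pushes partial-sequence features toward the distribution of full-sequence features. A classification loss on the action label would normally be added; it is omitted here for brevity.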

Keywords

graph convolutional network, adversarial learning, skeleton-based action prediction

Electronic Supplementary Material


JCST-2207-12638-Highlights.pdf (484.6 KB)

Journal of Computer Science and Technology

Volume 39, Issue 6, November 2024

Pages 1269-1280

DOI: 10.1007/s11390-023-2638-7

Cite this article:

Li X-S, Zhang N, Cai B-Q, et al. Adversarial Graph Convolutional Network for Skeleton-Based Early Action Prediction. Journal of Computer Science and Technology, 2024, 39(6): 1269-1280. https://doi.org/10.1007/s11390-023-2638-7

Copyright © 2025 Tsinghua University Press Ltd.