Regular Paper

Adversarial Graph Convolutional Network for Skeleton-Based Early Action Prediction

School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
School of Information Science and Engineering, Xinjiang University of Science and Technology, Korla 841000, China
Key Laboratory for Software Engineering of Hebei Province, Yanshan University, Qinhuangdao 066004, China

Abstract

This paper proposes a novel method for early action prediction based on 3D skeleton data. Our method combines the advantages of graph convolutional networks (GCNs) and adversarial learning to address two problems: insufficient extraction of spatio-temporal features, and the difficulty of predicting an action in the early stage of its execution. GCNs, which perform strongly in action recognition, are used to extract the spatio-temporal features of the skeleton. Through adversarial learning, the model learns to optimize the feature distribution of partial videos using the features of full videos. Experiments on two challenging action prediction datasets show that our method performs well on skeleton-based early action prediction, achieving state-of-the-art performance at some observation ratios.
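The abstract relies on two ingredients: partial observation of an action and GCN-based feature extraction from the skeleton graph. The toy sketch below makes both concrete under assumptions that the abstract does not state. It assumes that a partial video at observation ratio r keeps the first ceil(r * T) of T frames, and that one spatial GCN layer uses the common normalized-adjacency form X'_t = ReLU(A_hat X_t W) with A_hat = D^-1/2 (A + I) D^-1/2. The paper's actual backbone may differ, and the names `observe`, `normalized_adjacency`, and `spatial_gcn` are hypothetical.

```python
import math

def observe(sequence, ratio):
    """Return the first ceil(ratio * T) frames (at least one) of a T-frame sequence.
    Assumed definition of a partial video at a given observation ratio."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    # round() guards against float noise such as 0.3 * 10 == 3.0000000000000004
    t = max(1, math.ceil(round(ratio * len(sequence), 6)))
    return sequence[:t]

def normalized_adjacency(num_joints, edges):
    """A_hat = D^-1/2 (A + I) D^-1/2 for an undirected skeleton graph."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]

def matmul(x, y):
    """Plain-list matrix product of x (n x k) and y (k x m)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def spatial_gcn(frames, adj, weight):
    """Apply ReLU(A_hat X_t W) to each frame X_t (joints x channels)."""
    out = []
    for x in frames:
        h = matmul(matmul(adj, x), weight)
        out.append([[max(0.0, v) for v in row] for row in h])
    return out

# Toy 3-joint chain (e.g. shoulder-elbow-wrist), 10 frames of 2-D joint coordinates.
edges = [(0, 1), (1, 2)]
adj = normalized_adjacency(3, edges)
video = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]] for _ in range(10)]
partial = observe(video, 0.3)          # first 3 of 10 frames
weight = [[1.0, 0.0], [0.0, 1.0]]      # identity weight keeps the output readable
features = spatial_gcn(partial, adj, weight)
```

Each joint's output mixes its own coordinates with those of its graph neighbours, weighted by A_hat; a real model would stack such layers with temporal convolutions across frames and learn W.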
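The abstract does not give the exact adversarial objective. The sketch below shows one common GAN-style way to push partial-video features toward the full-video feature distribution, and it is an assumption rather than the paper's formulation. A logistic discriminator D learns to label full-video features 1 and partial-video features 0, while the partial branch minimizes the non-saturating loss -log D(p) to fool it. For brevity, the partial feature vector itself is treated as the trainable quantity. All function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(w, b, f):
    """Discriminator output D(f): probability that feature f comes from a full video."""
    return sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)

def discriminator_loss(w, b, full_feats, partial_feats):
    """Binary cross-entropy with full-video features labeled 1 and partial ones 0."""
    real = sum(-math.log(score(w, b, f)) for f in full_feats) / len(full_feats)
    fake = sum(-math.log(1.0 - score(w, b, p)) for p in partial_feats) / len(partial_feats)
    return real + fake

def generator_loss(w, b, partial_feats):
    """Non-saturating adversarial loss: partial features should be scored as full."""
    return sum(-math.log(score(w, b, p)) for p in partial_feats) / len(partial_feats)

def discriminator_step(w, b, full_feats, partial_feats, lr):
    """One gradient-descent step on discriminator_loss with respect to (w, b)."""
    gw, gb = [0.0] * len(w), 0.0
    for f in full_feats:
        g = (score(w, b, f) - 1.0) / len(full_feats)   # d/dz of -log sigmoid(z)
        gw = [gi + g * fi for gi, fi in zip(gw, f)]
        gb += g
    for p in partial_feats:
        g = score(w, b, p) / len(partial_feats)         # d/dz of -log(1 - sigmoid(z))
        gw = [gi + g * pi for gi, pi in zip(gw, p)]
        gb += g
    return [wi - lr * gi for wi, gi in zip(w, gw)], b - lr * gb

def generator_step(w, b, p, lr):
    """One descent step on -log D(p) with respect to the partial feature p itself.
    The gradient is -(1 - D(p)) * w, so p moves along +w, toward the full-video side."""
    d = score(w, b, p)
    return [pi + lr * (1.0 - d) * wi for pi, wi in zip(p, w)]

full = [[1.0, 0.0]]   # toy full-video feature
part = [[0.0, 0.0]]   # toy partial-video feature
w, b = [0.0, 0.0], 0.0
w, b = discriminator_step(w, b, full, part, lr=1.0)   # D learns the full/partial direction
part = [generator_step(w, b, p, lr=1.0) for p in part]  # partial feature moves toward full
```

In a complete model the partial features would come from the GCN, the generator gradient would be backpropagated into its weights, and a classification loss would typically be trained alongside the adversarial term; these are assumptions about a typical setup, not details from the paper.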

Electronic Supplementary Material

JCST-2207-12638-Highlights.pdf (484.6 KB)

Journal of Computer Science and Technology
Pages 1269-1280
Cite this article:
Li X-S, Zhang N, Cai B-Q, et al. Adversarial Graph Convolutional Network for Skeleton-Based Early Action Prediction. Journal of Computer Science and Technology, 2024, 39(6): 1269-1280. https://doi.org/10.1007/s11390-023-2638-7