This paper proposes a novel method for early action prediction from 3D skeleton data. The method combines graph convolutional networks (GCNs) with adversarial learning to address two problems: insufficient spatio-temporal feature extraction, and the difficulty of predicting an action when only its early stage has been observed. GCNs, which perform strongly in action recognition, are used to extract the spatio-temporal features of the skeleton. Through adversarial learning, the model uses the features of full videos to optimize the feature distribution of partial videos. Experiments on two challenging action prediction datasets show that the method performs well on skeleton-based early action prediction and achieves state-of-the-art performance at some observation ratios.
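The adversarial objective described above can be illustrated with a minimal sketch. It assumes a standard GAN-style formulation in which a discriminator tries to tell full-video features from partial-video features, while the partial-video encoder tries to fool it. The abstract does not specify the actual losses or architecture, so every name here (`observe`, `discriminator_score`, `discriminator_loss`, `encoder_adversarial_loss`) and the linear discriminator are hypothetical, and the GCN feature extractor is omitted entirely.

```python
import math


def observe(sequence, ratio):
    """Keep the first `ratio` fraction of frames (at least one frame).

    Simulates a partial video at a given observation ratio (assumed setup).
    """
    n = max(1, int(len(sequence) * ratio))
    return sequence[:n]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def discriminator_score(feature, weights, bias):
    """Hypothetical linear discriminator.

    Returns the estimated probability that `feature` came from a full video.
    """
    logit = sum(w * f for w, f in zip(weights, feature)) + bias
    return sigmoid(logit)


def discriminator_loss(p_full, p_partial, eps=1e-12):
    """Binary cross-entropy: full features are labeled 1, partial features 0."""
    return -(math.log(p_full + eps) + math.log(1.0 - p_partial + eps))


def encoder_adversarial_loss(p_partial, eps=1e-12):
    """Loss for the partial-video encoder.

    It decreases as the discriminator mistakes partial features for full ones,
    which pushes the partial-feature distribution toward the full one.
    """
    return -math.log(p_partial + eps)


if __name__ == "__main__":
    frames = list(range(8))          # stand-in for 8 skeleton frames
    partial = observe(frames, 0.5)   # first half of the sequence
    print(len(partial))

    # Toy features; in practice these would come from a GCN encoder.
    w, b = [1.0, -1.0], 0.0
    p_full = discriminator_score([2.0, 0.0], w, b)
    p_part = discriminator_score([0.0, 2.0], w, b)
    print(discriminator_loss(p_full, p_part))
    print(encoder_adversarial_loss(p_part))
```

In such a setup the two losses would be minimized alternately, typically alongside a classification loss on the action label; that scheduling is likewise an assumption and not something the abstract states.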