Action segmentation has made significant progress, but segmenting and recognizing actions in untrimmed long videos remains a challenging problem. Most state-of-the-art methods are built on temporal convolutions, whose limited ability to model long-term temporal dependencies and inherent inflexibility restrict their potential. Existing action segmentation methods also suffer from over-segmentation, which causes classification errors and degrades segmentation quality. To address these issues, this paper proposes an action segmentation method based on a global spatial-temporal information encoder-decoder. The method uses global temporal information captured by a refinement layer to help the Encoder-Decoder (ED) structure locate action segmentation points more accurately while suppressing the over-segmentation introduced by the ED structure. On a newly constructed real-world Tai Chi action dataset, the proposed method achieves 93% frame-wise accuracy. The experimental results show that the method can segment actions in long videos both accurately and efficiently.
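To make the described pipeline concrete, below is a minimal PyTorch sketch of a 1-D temporal encoder-decoder whose frame-wise logits are refined by a dilated-convolution stage with a video-spanning receptive field, i.e., a stand-in for the "global temporal information" refinement described above. All module names, layer sizes, and the choice of dilated convolutions for global context are illustrative assumptions, not the authors' actual implementation.

```python
# Sketch: encoder-decoder (ED) temporal action segmentation with a
# global-context refinement stage. Hyperparameters are assumptions.
import torch
import torch.nn as nn


class EncoderDecoder(nn.Module):
    """1-D temporal encoder-decoder over per-frame features (B, C, T).
    Assumes T is divisible by 4 so pooling/upsampling round-trips cleanly."""
    def __init__(self, in_dim, hidden, num_classes):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(in_dim, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(2),                     # downsample time by 2
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(2),                     # downsample time by 2 again
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.classify = nn.Conv1d(hidden, num_classes, kernel_size=1)

    def forward(self, x):
        h = self.decoder(self.encoder(x))
        return self.classify(h)                  # (B, num_classes, T)


class Refinement(nn.Module):
    """Dilated-convolution stack whose receptive field spans the whole
    video, refining the ED logits with global temporal context."""
    def __init__(self, num_classes, hidden, num_layers=10):
        super().__init__()
        self.inp = nn.Conv1d(num_classes, hidden, kernel_size=1)
        self.layers = nn.ModuleList(
            nn.Conv1d(hidden, hidden, kernel_size=3,
                      padding=2 ** i, dilation=2 ** i)
            for i in range(num_layers)           # dilation doubles per layer
        )
        self.out = nn.Conv1d(hidden, num_classes, kernel_size=1)

    def forward(self, logits):
        h = self.inp(torch.softmax(logits, dim=1))
        for conv in self.layers:
            h = h + torch.relu(conv(h))          # residual dilated block
        return self.out(h)                       # refined logits (B, C, T)


# Usage: 2048-d per-frame features, 500 frames, 10 action classes.
feats = torch.randn(1, 2048, 500)
ed = EncoderDecoder(2048, hidden=64, num_classes=10)
ref = Refinement(num_classes=10, hidden=64)
refined = ref(ed(feats))
labels = refined.argmax(dim=1)                   # per-frame action labels
```

The refinement stage operates on the ED's class probabilities rather than raw features, so it can smooth spurious boundary flips (the over-segmentation symptom) while its exponentially growing dilations let even a short stack see the entire sequence.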