Article | Open Access

Portable Perceptron Network-Based Fast Mode Decision for Video-Based Point Cloud Compression

Shicheng Que¹, Yue Li¹

¹ College of Computer Science, University of South China, Hengyang 421001, China

An erratum to this article is available online at:

https://doi.org/10.26599/AIR.2024.9150029

Abstract

In Video-based Point Cloud Compression (V-PCC), the 2D videos to be encoded are generated by projecting the 3D point cloud and are then compressed with High Efficiency Video Coding (HEVC). During 2D video compression, the best mode for each Coding Unit (CU) is found by brute-force search, which greatly increases encoding complexity. To address this issue, we first propose a simple and effective Portable Perceptron Network (PPN)-based fast mode decision method for V-PCC under the Random Access (RA) configuration. Second, we extract seven simple hand-crafted features as input to the PPN. Third, to train the PPN, we design an adaptive loss function that computes the loss by assigning different weights according to different Rate-Distortion (RD) costs. Finally, experimental results show that the proposed method reduces encoding complexity by 43.13% with almost no loss in encoding efficiency under the RA configuration, outperforming state-of-the-art methods. The source code is available at https://github.com/Mesks/PPNforV-PCC.
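To make the idea of an RD-cost-dependent loss concrete, the sketch below shows one possible weighted binary cross-entropy. It is an illustration only and not the loss defined in the paper: the function name `rd_weighted_bce`, the two-candidate framing (`rd_cost_a`, `rd_cost_b`), and the choice to weight each sample by its RD-cost gap normalised by the batch mean are all assumptions made here.

```python
import math

def rd_weighted_bce(probs, labels, rd_cost_a, rd_cost_b, eps=1e-7):
    """Weighted binary cross-entropy (hypothetical weighting scheme).

    probs     : predicted probabilities that decision "a" is the best one
    labels    : 1 if decision "a" had the lower RD cost, else 0
    rd_cost_a : RD cost of candidate decision "a" for each sample
    rd_cost_b : RD cost of candidate decision "b" for each sample
    """
    # Assumption: a wrong decision matters more when the RD costs differ a lot.
    gaps = [abs(a - b) for a, b in zip(rd_cost_a, rd_cost_b)]
    mean_gap = sum(gaps) / len(gaps)
    # Normalise by the batch mean so the weights average to 1.
    weights = [g / mean_gap if mean_gap > 0 else 1.0 for g in gaps]

    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

# Same confident mistake, placed on a large-gap or a small-gap sample.
loss_high = rd_weighted_bce([0.1, 0.9], [1, 1], [9.0, 1.0], [0.0, 0.0])
loss_low = rd_weighted_bce([0.1, 0.9], [1, 1], [1.0, 9.0], [0.0, 0.0])
print(loss_high > loss_low)
```

Under this assumed scheme, mispredicting a sample whose candidate RD costs differ widely is penalised more than mispredicting one whose costs are nearly equal; when all gaps are equal, the loss reduces to ordinary binary cross-entropy.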

Keywords

Video-based Point Cloud Compression (V-PCC); high efficiency video coding; fast mode decision; portable perceptron network

CAAI Artificial Intelligence Research

Article number: 9150022

DOI: 10.26599/AIR.2023.9150022

Cite this article:

Que S, Li Y. Portable Perceptron Network-Based Fast Mode Decision for Video-Based Point Cloud Compression. CAAI Artificial Intelligence Research, 2023, 2: 9150022. https://doi.org/10.26599/AIR.2023.9150022
