Dynamic Scene Graph Generation of Point Clouds with Structural Representation Learning

Chao Qi; Jianqin Yin; Zhicheng Zhang; Jin Tang

doi:10.26599/TST.2023.9010002

Open Access

Chao Qi¹˒², Jianqin Yin¹, Zhicheng Zhang¹, Jin Tang¹

¹ School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China

² Standard and Metrology Research Institute, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China

Abstract

Scene graphs of point clouds help to understand object-level relationships in 3D space. Most graph generation methods operate on structured 2D data and cannot be applied directly to unstructured 3D point clouds. Existing point-cloud-based methods generate the scene graph from an additional, given graph structure that requires labor-intensive manual annotation. To address these problems, we explore a method that converts point clouds into structured data and generates graphs without given structures. Specifically, we cluster points with similar augmented features into groups and establish the relationships among the groups, resulting in an initial structural representation of the point cloud. In addition, we propose a Dynamic Graph Generation Network (DGGN) to predict the semantic labels of targets at different granularities. It dynamically splits and merges point groups, resulting in a scene graph with high precision. Experiments show that our method outperforms the baseline methods and outputs reliable graphs describing object-level relationships without additional manually labeled data.
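The grouping step described in the abstract (cluster points with similar augmented features, then relate the resulting groups) can be illustrated with a minimal sketch. This is not the authors' algorithm: the union-find clustering, the distance thresholds `feature_radius` and `spatial_radius`, the extra "verticality" feature, and the function names `group_points` and `group_edges` are all assumptions made for illustration only.

```python
from itertools import combinations
from math import dist


def group_points(features, feature_radius):
    """Union-find clustering: link points whose augmented features are close."""
    parent = list(range(len(features)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(features)), 2):
        if dist(features[i], features[j]) <= feature_radius:
            parent[find(i)] = find(j)

    # Relabel roots as consecutive group ids in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(features))]


def group_edges(positions, labels, spatial_radius):
    """Connect two groups when any pair of their points is spatially close."""
    edges = set()
    for i, j in combinations(range(len(positions)), 2):
        if labels[i] != labels[j] and dist(positions[i], positions[j]) <= spatial_radius:
            edges.add((min(labels[i], labels[j]), max(labels[i], labels[j])))
    return sorted(edges)


# Toy scene: three "floor" points, two "wall" points, one distant outlier.
positions = [(0, 0, 0), (0.1, 0, 0), (0.2, 0, 0),
             (0.2, 0, 0.1), (0.2, 0, 0.2), (5, 5, 5)]
verticality = [0, 0, 0, 1, 1, 0]  # hypothetical augmented feature
features = [p + (v,) for p, v in zip(positions, verticality)]

labels = group_points(features, feature_radius=0.15)
edges = group_edges(positions, labels, spatial_radius=0.15)
print(labels, edges)
```

In this toy scene the floor and wall points fall into separate groups because their augmented feature differs, yet the two groups are linked by an edge because they touch in space; the outlier forms an unconnected group. Groups and edges of this kind are only a stand-in for the initial structural representation that the paper's network then refines by splitting and merging.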

Keywords

scene graph generation; structural representation; point cloud

Tsinghua Science and Technology

Volume 29, Issue 1, February 2024

Pages 232-243

DOI: 10.26599/TST.2023.9010002

Cite this article:

Qi C, Yin J, Zhang Z, et al. Dynamic Scene Graph Generation of Point Clouds with Structural Representation Learning. Tsinghua Science and Technology, 2024, 29(1): 232-243. https://doi.org/10.26599/TST.2023.9010002

Received: 25 August 2022

Revised: 09 November 2022

Accepted: 06 January 2023

Published: 21 August 2023

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).