Research Article | Open Access

Active self-training for weakly supervised 3D scene semantic segmentation

College of Computer Science & Software Engineering, Shenzhen University, Shenzhen 518060, China
School of Computer Science, Carleton University, Ottawa K1S 5B6, Canada

Abstract

Since preparing labeled data for training point-cloud semantic segmentation networks is time-consuming, weakly supervised approaches have been introduced to learn from only a small fraction of the data. These methods typically learn with contrastive losses while automatically deriving per-point pseudo-labels from a sparse set of user-annotated labels. In this paper, our key observation is that selecting which samples to annotate is as important as how those samples are used for training. Thus, we introduce a method for weakly supervised segmentation of 3D scenes that combines self-training with active learning: active learning selects points for annotation that are likely to improve the trained model, while self-training makes efficient use of the user-provided labels. We demonstrate that our approach improves scene segmentation over previous work and baselines while requiring only a few user annotations.
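To make the training loop concrete, the following is a minimal, hypothetical sketch of the active self-training alternation described above. Everything here is an illustrative assumption rather than the authors' implementation: a tiny per-point MLP stands in for a 3D segmentation backbone, random tensors stand in for point-cloud features, the annotator is simulated by an oracle label tensor, points are queried by predictive entropy, and pseudo-labels are accepted above a 0.95 confidence threshold.

```python
# Minimal sketch of active self-training for per-point classification.
# Illustrative only: the network, features, acquisition criterion, and
# confidence threshold are assumptions, not the paper's actual method.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

N, D, C = 2048, 6, 13                     # points, feature dim (xyz + rgb), classes
feats = torch.randn(N, D)                 # stand-in for real per-point features
oracle = torch.randint(0, C, (N,))        # ground-truth labels, revealed on request

labeled = torch.zeros(N, dtype=torch.bool)
labeled[torch.randperm(N)[:20]] = True    # sparse initial annotations

model = torch.nn.Sequential(
    torch.nn.Linear(D, 64), torch.nn.ReLU(), torch.nn.Linear(64, C))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

def train(mask, targets, steps=200):
    """Fit the classifier on the points selected by `mask`."""
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(model(feats[mask]), targets[mask])
        loss.backward()
        opt.step()

for rnd in range(5):
    # 1. Self-training step: fit on the annotations gathered so far.
    train(labeled, oracle)

    with torch.no_grad():
        probs = F.softmax(model(feats), dim=1)

    # 2. Active-learning step: query the annotator for the most
    #    uncertain unlabeled points (highest predictive entropy).
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=1)
    entropy[labeled] = -1.0               # never re-query annotated points
    labeled[entropy.topk(20).indices] = True

    # 3. Pseudo-labeling step: treat confident predictions on the
    #    remaining points as extra (noisy) supervision.
    conf, pseudo = probs.max(dim=1)
    pseudo_mask = (~labeled) & (conf > 0.95)
    targets = oracle.clone()              # annotated points keep true labels
    targets[pseudo_mask] = pseudo[pseudo_mask]
    train(labeled | pseudo_mask, targets)
```

In a real pipeline the classifier would be a point-cloud segmentation network and the acquisition and pseudo-labeling would operate on actual scene points, but the alternation of training, querying, and pseudo-labeling follows this same pattern.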

Computational Visual Media, Pages 425–438
Cite this article:
Liu G, van Kaick O, Huang H, et al. Active self-training for weakly supervised 3D scene semantic segmentation. Computational Visual Media, 2024, 10(3): 425-438. https://doi.org/10.1007/s41095-022-0311-7

Received: 16 June 2022
Accepted: 04 September 2022
Published: 22 March 2024
© The Author(s) 2024.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
