Semi-supervised 3D shape segmentation with multilevel consistency and part substitution

Chun-Yu Sun; Yu-Qi Yang; Hao-Xiang Guo; Peng-Shuai Wang; Xin Tong; Yang Liu; Heung-Yeung Shum

doi:10.1007/s41095-022-0281-9

Research Article | Open Access

Chun-Yu Sun¹, Yu-Qi Yang¹, Hao-Xiang Guo¹, Peng-Shuai Wang², Xin Tong², Yang Liu², Heung-Yeung Shum¹

1 Institute for Advanced Study, Tsinghua University, Beijing 100084, China

2 Microsoft Research Asia, Beijing 100080, China

Abstract

The lack of fine-grained 3D shape segmentation data is the main obstacle to developing learning-based 3D segmentation techniques. We propose an effective semi-supervised method for learning 3D segmentations from a few labeled 3D shapes and a large amount of unlabeled 3D data. For the unlabeled data, we present a novel multilevel consistency loss to enforce consistency of network predictions between perturbed copies of a 3D shape at multiple levels: point level, part level, and hierarchical level. For the labeled data, we develop a simple yet effective part substitution scheme to augment the labeled 3D shapes with more structural variations to enhance training. Our method has been extensively validated on the task of 3D object semantic segmentation on PartNet and ShapeNetPart, and indoor scene semantic segmentation on ScanNet. It exhibits superior performance to existing semi-supervised and unsupervised pre-training 3D approaches.
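To make the idea of point-level consistency concrete, the following is a minimal sketch rather than the authors' formulation. It assumes that two perturbed copies of a shape yield per-point class logits with known point correspondence, and it uses a mean squared difference between class probabilities as the penalty. The function names `softmax` and `point_consistency`, the choice of penalty, and the random inputs are all hypothetical; the abstract does not specify the exact loss.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax over the last axis, shifted for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def point_consistency(logits_a, logits_b):
    """Hypothetical point-level consistency penalty (an assumption, not the paper's loss).

    Both inputs have shape (num_points, num_classes); row i of each array is
    assumed to refer to the same underlying point of the shape. Returns the
    mean over points of the squared difference between class probabilities.
    """
    p_a = softmax(logits_a)
    p_b = softmax(logits_b)
    return float(np.mean(np.sum((p_a - p_b) ** 2, axis=-1)))


# Stand-in predictions: random logits for one copy, slightly noisy logits for the other.
rng = np.random.default_rng(0)
logits = rng.normal(size=(1024, 4))
noisy = logits + 0.1 * rng.normal(size=logits.shape)

loss_same = point_consistency(logits, logits)   # identical predictions
loss_noisy = point_consistency(logits, noisy)   # predictions that disagree slightly
```

Under these assumptions the penalty vanishes when the two predictions agree and grows as they diverge, so it can be evaluated on unlabeled shapes without any ground-truth labels. The part-level and hierarchical-level terms named in the abstract would need additional structure (part assignments, a label hierarchy) that this sketch does not model.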

Keywords

shape segmentation; semi-supervised learning; multilevel consistency

Computational Visual Media

Volume 9, Issue 2, June 2023

Pages 229-247

DOI: 10.1007/s41095-022-0281-9

Cite this article:

Sun C-Y, Yang Y-Q, Guo H-X, et al. Semi-supervised 3D shape segmentation with multilevel consistency and part substitution. Computational Visual Media, 2023, 9(2): 229-247. https://doi.org/10.1007/s41095-022-0281-9