Research Article | Open Access

ClusterSLAM: A SLAM backend for simultaneous rigid body clustering and motion estimation

Jiahui Huang¹, Sheng Yang², Zishuo Zhao¹, Yu-Kun Lai³, Shi-Min Hu¹

¹ BNRist, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

² Alibaba A.I. Labs, Hangzhou 311121, China

³ School of Computer Science and Informatics, Cardiff University, Cardiff, CF24 3AA, UK

Abstract

We present a practical backend for stereo visual SLAM which can simultaneously discover individual rigid bodies and compute their motions in dynamic environments. While recent factor-graph-based state optimization algorithms have shown their ability to robustly solve SLAM problems by treating dynamic objects as outliers, the motions of those objects are rarely considered. In this paper, we exploit the consensus of 3D motions among landmarks extracted from the same rigid body to cluster them, and to identify static and dynamic objects in a unified manner. Specifically, our algorithm builds a noise-aware motion affinity matrix from landmarks, and uses agglomerative clustering to distinguish rigid bodies. Using decoupled factor graph optimization to revise their shapes and trajectories, we obtain an iterative scheme that updates cluster assignments and motion estimates reciprocally. Evaluations on both synthetic scenes and KITTI demonstrate the capability of our approach, and further experiments on online efficiency show the effectiveness of our method for simultaneously tracking ego-motion and multiple objects.
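To make the clustering idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes each landmark is given as a list of 3D positions in a common frame, one per time step, and it uses a deliberately simple motion distance: the spread of the distance between two landmarks over time, which is zero for a perfectly rigid pair. The abstract does not specify the paper's noise-aware affinity, so the names `motion_distance` and `cluster_rigid_bodies`, the complete-linkage rule, and the `threshold` parameter are all illustrative assumptions.

```python
import math
from itertools import combinations


def motion_distance(track_a, track_b):
    """Spread of the inter-landmark distance over time.

    Two landmarks on the same rigid body keep a constant mutual distance,
    so the spread is 0 for them (assumed measure, ignores sensor noise).
    """
    dists = [math.dist(p, q) for p, q in zip(track_a, track_b)]
    return max(dists) - min(dists)


def cluster_rigid_bodies(tracks, threshold):
    """Complete-linkage agglomerative clustering on the motion distance.

    Merging stops once the closest pair of clusters is farther apart
    than `threshold` (a hypothetical tuning parameter).
    """
    n = len(tracks)
    d = [[motion_distance(tracks[i], tracks[j]) for j in range(n)]
         for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: the worst pairwise distance between clusters.
            link = max(d[i][j] for i in clusters[a] for j in clusters[b])
            if best is None or link < best[0]:
                best = (link, a, b)
        if best[0] > threshold:
            break
        _, a, b = best  # a < b, so deleting index b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy scene: three static landmarks and two landmarks on a translating body.
frames = range(3)
tracks = [
    [(0.0, 0.0, 0.0) for _ in frames],
    [(1.0, 0.0, 0.0) for _ in frames],
    [(0.0, 1.0, 0.0) for _ in frames],
    [(5.0 + t, 0.0, 0.0) for t in frames],
    [(6.0 + t, 0.0, 0.0) for t in frames],
]
labels = cluster_rigid_bodies(tracks, threshold=0.1)
print(labels)
```

In this toy scene the static landmarks and the moving landmarks end up in two separate clusters. With real, noisy measurements a fixed threshold on raw distance spread would be fragile, which is presumably why the abstract stresses a noise-aware affinity and an iterative refinement with factor graph optimization.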

Keywords

dynamic SLAM; motion segmentation; scene perception

Computational Visual Media

Volume 7, Issue 1, March 2021

Pages 87-101

DOI: 10.1007/s41095-020-0195-3

Cite this article:

Huang J, Yang S, Zhao Z, et al. ClusterSLAM: A SLAM backend for simultaneous rigid body clustering and motion estimation. Computational Visual Media, 2021, 7(1): 87-101. https://doi.org/10.1007/s41095-020-0195-3
