Abstract
We present a practical backend for stereovisual SLAM which can simultaneously discoverindividual rigid bodies and compute their motions in dynamic environments. While recent factor graph based state optimization algorithms have shown their ability to robustly solve SLAM problems by treating dynamic objects as outliers, their dynamic motions are rarely considered. In this paper, we exploit the consensus of 3D motions for landmarks extracted from the same rigid body for clustering, and to identify static and dynamic objects in a unified manner. Specifically, our algorithm builds a noise-aware motion affinity matrix from landmarks, and uses agglomerative clustering to distinguish rigid bodies. Using decoupled factor graph optimization to revise their shapes and trajectories, we obtain an iterative scheme to update both cluster assignments and motion estimation reciprocally. Evaluations on both synthetic scenes and KITTI demonstrate the capability of our approach, and further experiments considering online efficiency also show the effectiveness of our method for simultaneously tracking ego-motion and multiple objects.