We introduce a novel end-to-end deep-learning solution for rapidly estimating a dense spherical depth map of an indoor environment. Our input is a single equirectangular image registered with a sparse depth map, as provided by a variety of common capture setups. Depth is inferred by an efficient and lightweight single-branch network, which employs a dynamic gating system to process together dense visual data and sparse geometric data. We exploit the characteristics of typical man-made environments to efficiently compress multi-resolution features and find short- and long-range relations among scene parts. Furthermore, we introduce a new augmentation strategy to make the model robust to different types of sparsity, including those generated by various structured light sensors and LiDAR setups. The experimental results demonstrate that our method provides interactive performance and outperforms state-of-the-art solutions in computational efficiency, adaptivity to variable depth sparsity patterns, and prediction accuracy for challenging indoor data, even when trained solely on synthetic data without any fine tuning.
- Article type
- Year
- Co-author
We present a novel approach to automati-cally recover, from a small set of partially overlapping spherical images, an indoor structure representation in terms of a 3D floor plan registered with a set of 3D environment maps. We introduce several improvements over previous approaches based on color and spatial reasoning exploiting Manhattan world priors. Inparticular, we introduce a new method for geometric context extraction based on a 3D facet representation, which combines color distribution analysis of individual images with sparse multi-view clues. We also introduce an efficient method to combine the facets from different viewpoints in a single consistent model, taking into the reliability of the facet information. The resulting capture and reconstruction pipeline automatically generates 3D multi-room environments in cases where most previous approaches fail, e.g., in the presence of hidden corners and large clutter, without the need for additional dense 3D data or tools. We demonstrate the effectiveness and performance of our approach on different real-world indoor scenes. Our test data is available to allow further studies and comparisons.