Deep panoramic depth prediction and completion for indoor scenes

Giovanni Pintore; Eva Almansa; Armando Sanchez; Giorgio Vassena; Enrico Gobbetti

doi:10.1007/s41095-023-0358-0


Research Article | Open Access


Giovanni Pintore¹,*, Eva Almansa¹,*, Armando Sanchez², Giorgio Vassena²,³, Enrico Gobbetti¹

¹ Visual and Data-intensive Computing, CRS4, Cagliari 09134, Italy

² Gexcel srl, Elmas (CA) 09097, Italy

³ Department of Civil, Environment, Architectural Engineering, and Mathematics (DICATAM), Università degli Studi di Brescia (UNIBS), Brescia (BS) 25123, Italy

* Giovanni Pintore and Eva Almansa contributed equally to this work.


Abstract

We introduce a novel end-to-end deep-learning solution for rapidly estimating a dense spherical depth map of an indoor environment. Our input is a single equirectangular image registered with a sparse depth map, as provided by a variety of common capture setups. Depth is inferred by an efficient and lightweight single-branch network, which employs a dynamic gating system to process together dense visual data and sparse geometric data. We exploit the characteristics of typical man-made environments to efficiently compress multi-resolution features and find short- and long-range relations among scene parts. Furthermore, we introduce a new augmentation strategy to make the model robust to different types of sparsity, including those generated by various structured light sensors and LiDAR setups. The experimental results demonstrate that our method provides interactive performance and outperforms state-of-the-art solutions in computational efficiency, adaptivity to variable depth sparsity patterns, and prediction accuracy for challenging indoor data, even when trained solely on synthetic data without any fine tuning.
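The idea of training against different depth sparsity patterns can be illustrated with a minimal NumPy sketch. It is not the authors' augmentation strategy, whose details are not given in this abstract. The function name `sparsify_depth`, its parameters, and the two patterns (random points and LiDAR-like scan lines) are hypothetical and chosen only for illustration. The sketch assumes a dense equirectangular depth map in which a value of 0 means "no measurement".

```python
import numpy as np

def sparsify_depth(dense, pattern="random", keep=0.01, n_beams=16, seed=0):
    """Return (sparse_depth, mask) from a dense equirectangular depth map.

    Hypothetical helper for illustration only; 0 marks a missing sample.
    """
    rng = np.random.default_rng(seed)
    h, w = dense.shape
    mask = np.zeros((h, w), dtype=bool)
    if pattern == "random":
        # Keep each pixel independently with probability `keep`.
        mask = rng.random((h, w)) < keep
    elif pattern == "scanlines":
        # Each row of an equirectangular image has a constant elevation angle,
        # so a gravity-aligned multi-beam scanner is approximated by a few rows.
        rows = np.linspace(0, h - 1, n_beams + 2)[1:-1].round().astype(int)
        mask[rows, :] = True
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    # Never keep pixels that had no valid depth in the dense map.
    mask &= dense > 0
    return np.where(mask, dense, 0.0), mask
```

In a training loop, a pattern could be drawn at random per sample so that the network sees both scattered points and structured scan lines; this is one possible design among many, stated here as an assumption.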

Keywords

machine learning; image processing and computer vision; vision and scene understanding; 3D stereo scene analysis
