ReLoc: Indoor Visual Localization with Hierarchical Sitemap and View Synthesis

Hui-Xuan Wang; Jing-Liang Peng; Shi-Yi Lu; Xin Cao; Xue-Ying Qin; Chang-He Tu

doi:10.1007/s11390-021-1373-1


Regular Paper

Hui-Xuan Wang¹, Jing-Liang Peng², Shi-Yi Lu³, Xin Cao³, Xue-Ying Qin³, Chang-He Tu¹

¹ School of Computer Science and Technology, Shandong University, Qingdao 266237, China

² School of Information Science and Engineering, University of Jinan, Jinan 250022, China

³ School of Software, Shandong University, Jinan 250101, China

Abstract

Indoor visual localization, i.e., six-degree-of-freedom (6-DoF) camera pose estimation for a query image with respect to a known scene, is gaining increased attention, driven by rapid progress in applications such as robotics and augmented reality. However, drastic visual discrepancies between an onsite query image and prerecorded indoor images pose a significant challenge for visual localization. In this paper, based on the key observation that planar surfaces such as floors and walls are consistently present in indoor scenes, we propose a novel system that incorporates geometric information to address the issues that arise when relying on image pixels alone. Through the system implementation, we contribute a hierarchical structure consisting of pre-scanned images and a point cloud, as well as a distilled representation of the planar-element layout extracted from the original dataset. A view synthesis procedure is designed to generate synthetic images that complement a sparsely sampled dataset. Moreover, a global image descriptor based on image statistics, called block mean, variance, and color (BMVC), is employed together with a traditional convolutional neural network (CNN) descriptor to speed up candidate pose identification. Experimental results on a popular benchmark demonstrate that the proposed method outperforms state-of-the-art approaches in terms of visual localization validity and accuracy.
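The abstract names the BMVC descriptor but does not define it. As a rough illustration only, a block-wise statistical descriptor could be sketched as follows. The grid size, the per-block features (mean intensity, intensity variance, mean RGB color), and the function names `block_descriptor` and `l2_distance` are assumptions made for this sketch, not the authors' definition.

```python
from statistics import fmean, pvariance


def block_descriptor(image, grid=(2, 2)):
    """Hypothetical block-statistics descriptor (not the paper's exact BMVC).

    image: list of rows, each row a list of (r, g, b) tuples.
    grid:  number of blocks along (rows, columns); the image must have at
           least that many rows and columns so no block is empty.
    Returns a flat list with 5 values per block:
    mean intensity, intensity variance, mean R, mean G, mean B.
    """
    rows, cols = len(image), len(image[0])
    grid_rows, grid_cols = grid
    descriptor = []
    for bi in range(grid_rows):
        for bj in range(grid_cols):
            # Integer block boundaries that together cover the whole image.
            r0, r1 = bi * rows // grid_rows, (bi + 1) * rows // grid_rows
            c0, c1 = bj * cols // grid_cols, (bj + 1) * cols // grid_cols
            pixels = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            gray = [sum(p) / 3 for p in pixels]  # simple channel average
            mean_color = [fmean(p[k] for p in pixels) for k in range(3)]
            descriptor += [fmean(gray), pvariance(gray), *mean_color]
    return descriptor


def l2_distance(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
```

Under these assumptions, candidate database images could be ranked by `l2_distance` to the query descriptor, which is cheap compared with evaluating a CNN descriptor for every candidate.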

Keywords

visual localization; planar surface; statistic information; view synthesis

Electronic Supplementary Material

jcst-36-3-494-Highlights.pdf (910.7 KB)

References

[1] Agarwal S, Furukawa Y, Snavely N, Simon I, Curless B, Seitz S M, Szeliski R. Building Rome in a day. Communications of the ACM, 2011, 54(10): 105-112. DOI: 10.1145/2001269.2001293.

[2] Dai A, Nießner M, Zollhöfer M, Izadi S, Theobalt C. BundleFusion: Real-time globally consistent 3D reconstruction using on-the-fly surface reintegration. ACM Transactions on Graphics, 2017, 36(4): Article No. 76a. DOI: 10.1145/3072959.3054739.

[3] Mur-Artal R, Tardós J D. ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Transactions on Robotics, 2017, 33(5): 1255-1262. DOI: 10.1109/TRO.2017.2705103.

[4] Li Y, Snavely N, Huttenlocher D, Fua P. Worldwide pose estimation using 3D point clouds. In Proc. the 12th European Conference on Computer Vision, October 2012, pp.15-29. DOI: 10.1007/978-3-642-33718-5_2.