Regular Paper

ReLoc: Indoor Visual Localization with Hierarchical Sitemap and View Synthesis

School of Computer Science and Technology, Shandong University, Qingdao 266237, China
School of Information Science and Engineering, University of Jinan, Jinan 250022, China
School of Software, Shandong University, Jinan 250101, China

Abstract

Indoor visual localization, i.e., 6 degree-of-freedom (6-DoF) camera pose estimation for a query image with respect to a known scene, is gaining increasing attention, driven by rapid progress in applications such as robotics and augmented reality. However, drastic visual discrepancies between an on-site query image and prerecorded indoor images pose a significant challenge for visual localization. In this paper, based on the key observation that planar surfaces such as floors and walls are consistently present in indoor scenes, we propose a novel system that incorporates geometric information to address the issues that arise when only image pixels are used. Through the system implementation, we contribute a hierarchical structure consisting of pre-scanned images and a point cloud, as well as a distilled representation of the planar-element layout extracted from the original dataset. A view synthesis procedure is designed to generate synthetic images that complement a sparsely sampled dataset. Moreover, a global image descriptor based on image statistics, called block mean, variance, and color (BMVC), is employed together with a traditional convolutional neural network (CNN) descriptor to speed up candidate pose identification. Experimental results on a popular benchmark demonstrate that the proposed method outperforms state-of-the-art approaches in terms of visual localization validity and accuracy.
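
The abstract names the BMVC descriptor but does not give its definition, so the following is only a minimal sketch of the general idea of a block-statistics global descriptor, not the authors' formulation. It assumes the image is split into a fixed grid and that each block contributes the mean and variance of each RGB channel; the function names `block_stats_descriptor`, `l2_distance`, and `rank_candidates`, the grid size, and the L2 ranking are all hypothetical choices made for illustration.

```python
from math import sqrt
from statistics import fmean, pvariance


def block_stats_descriptor(image, grid=(2, 2)):
    """Concatenate per-block, per-channel mean and variance.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    The layout of the descriptor is an assumption, not the paper's BMVC.
    """
    height, width = len(image), len(image[0])
    rows, cols = grid
    if height < rows or width < cols:
        raise ValueError("image is smaller than the requested grid")

    descriptor = []
    for by in range(rows):
        y0, y1 = by * height // rows, (by + 1) * height // rows
        for bx in range(cols):
            x0, x1 = bx * width // cols, (bx + 1) * width // cols
            pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            for channel in range(3):
                values = [p[channel] for p in pixels]
                descriptor.append(fmean(values))      # block mean
                descriptor.append(pvariance(values))  # block variance
    return descriptor


def l2_distance(a, b):
    """Euclidean distance between two equal-length descriptors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_candidates(query, database, grid=(2, 2), top_k=3):
    """Return the names of the `top_k` database images closest to the query."""
    q = block_stats_descriptor(query, grid)
    scored = sorted(
        (l2_distance(q, block_stats_descriptor(img, grid)), name)
        for name, img in database.items()
    )
    return [name for _, name in scored[:top_k]]


def flat_image(color, size=4):
    """Helper for the usage example: a size x size image of one color."""
    return [[color] * size for _ in range(size)]


database = {
    "red": flat_image((255, 0, 0)),
    "dark_red": flat_image((200, 0, 0)),
    "blue": flat_image((0, 0, 255)),
}
query = flat_image((250, 0, 0))
shortlist = rank_candidates(query, database, top_k=2)
```

A descriptor of this kind is cheap to compute and compare, which is why it could plausibly serve as a fast pre-filter before a heavier CNN descriptor is applied to the shortlist; how the two are actually combined in ReLoc is described in the full paper rather than in the abstract.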

Electronic Supplementary Material

jcst-36-3-494-Highlights.pdf (910.7 KB)


Journal of Computer Science and Technology
Pages 494-507
Cite this article:
Wang H-X, Peng J-L, Lu S-Y, et al. ReLoc: Indoor Visual Localization with Hierarchical Sitemap and View Synthesis. Journal of Computer Science and Technology, 2021, 36(3): 494-507. https://doi.org/10.1007/s11390-021-1373-1


Received: 15 February 2021
Accepted: 26 April 2021
Published: 05 May 2021
©Institute of Computing Technology, Chinese Academy of Sciences 2021