The widespread use of motion capture (MoCap) produces large volumes of data, making MoCap data retrieval crucial for efficient data reuse. MoCap clips may not be neatly segmented and labeled, which makes retrieval more difficult. To retrieve such data effectively, we propose an elastic content-based retrieval scheme via unsupervised posture encoding and strided temporal alignment (PESTA). It retrieves similarities at the sub-sequence level, is robust against singular frames, and enables control of the tradeoff between precision and efficiency. First, it learns a dictionary of encoded postures using unsupervised adversarial autoencoder techniques and, based on this dictionary, compactly symbolizes any MoCap sequence. Second, it conducts strided temporal alignment to align a query sequence to repository sequences and retrieve the best-matching sub-sequences from the repository. Further, it extends to finding matches for multiple sub-queries in a long query, with a sharp gain in efficiency at a slight cost in precision. Experiments on two public MoCap datasets and one MoCap dataset captured by ourselves demonstrate the strong performance of the proposed scheme.
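To illustrate how a stride can trade precision for efficiency in sub-sequence retrieval, here is a minimal sketch. It assumes each MoCap sequence has already been symbolized as a list of integer posture codes. It substitutes a rigid sliding-window mismatch count for the elastic alignment of the abstract, so it is not the PESTA algorithm itself. The names `mismatch_fraction` and `strided_search` are hypothetical.

```python
from typing import List, Sequence, Tuple


def mismatch_fraction(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions at which two equal-length symbol windows differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def strided_search(
    query: Sequence[int],
    repo: Sequence[int],
    stride: int = 1,
    top_k: int = 1,
) -> List[Tuple[int, float]]:
    """Return up to top_k (start_index, mismatch_fraction) pairs, best first.

    Assumption: a rigid window comparison stands in for the paper's elastic
    alignment. A larger stride tests fewer start positions (faster) but may
    skip the true best match (less precise).
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    n, m = len(query), len(repo)
    if n == 0 or n > m:
        return []
    scored = [
        (start, mismatch_fraction(query, repo[start:start + n]))
        for start in range(0, m - n + 1, stride)
    ]
    # Sort by score, breaking ties by earliest start position.
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:top_k]


# Hypothetical posture-code sequences for illustration only.
repo = [3, 3, 7, 7, 1, 4, 4, 2, 9, 9]
query = [1, 4, 4, 2]
print(strided_search(query, repo, stride=1))  # exact match at index 4
print(strided_search(query, repo, stride=3))  # index 4 is skipped
```

With `stride=1` every start position is tested and the exact match is found. With `stride=3` only positions 0, 3, and 6 are tested, so the search is cheaper but returns a worse match.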

Regular Paper Issue

ReLoc: Indoor Visual Localization with Hierarchical Sitemap and View Synthesis

Hui-Xuan Wang, Jing-Liang Peng, Shi-Yi Lu, Xin Cao, Xue-Ying Qin, Chang-He Tu

Journal of Computer Science and Technology 2021, 36(3): 494-507

Published: 05 May 2021


Indoor visual localization, i.e., 6-degree-of-freedom (6-DoF) camera pose estimation for a query image with respect to a known scene, is gaining increased attention, driven by rapid progress in applications such as robotics and augmented reality. However, drastic visual discrepancies between an onsite query image and prerecorded indoor images pose a significant challenge for visual localization. In this paper, based on the key observation that planar surfaces such as floors or walls are always present in indoor scenes, we propose a novel system that incorporates geometric information to address the issues that arise from using only pixelated images. Through the system implementation, we contribute a hierarchical structure consisting of pre-scanned images and a point cloud, as well as a distilled representation of the planar-element layout extracted from the original dataset. A view synthesis procedure is designed to generate synthetic images that complement a sparsely sampled dataset. Moreover, a global image descriptor based on image statistics, called block mean, variance, and color (BMVC), is employed together with a traditional convolutional neural network (CNN) descriptor to speed up candidate pose identification. Experimental results on a popular benchmark demonstrate that the proposed method outperforms state-of-the-art approaches in terms of visual localization validity and accuracy.
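The abstract does not define the BMVC descriptor in detail. As a rough illustration of a block-statistics global descriptor, the sketch below assumes the image is split into a `grid` x `grid` layout of blocks and that each block contributes the mean and variance of each color channel. The function names, the block layout, and the L2 comparison are all assumptions for illustration, not the paper's definition.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B), an assumed pixel layout


def block_stats_descriptor(
    image: Sequence[Sequence[Pixel]], grid: int = 2
) -> List[float]:
    """Concatenate per-block, per-channel (mean, variance) into one vector.

    Assumption: this is a generic block-statistics descriptor, not the
    exact BMVC formulation, which the abstract does not spell out.
    """
    height, width = len(image), len(image[0])
    if grid < 1 or height < grid or width < grid:
        raise ValueError("grid must be >= 1 and no larger than the image")
    descriptor: List[float] = []
    for by in range(grid):
        y0, y1 = by * height // grid, (by + 1) * height // grid
        for bx in range(grid):
            x0, x1 = bx * width // grid, (bx + 1) * width // grid
            pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            for channel in range(3):
                values = [p[channel] for p in pixels]
                mean = sum(values) / len(values)
                variance = sum((v - mean) ** 2 for v in values) / len(values)
                descriptor.extend([mean, variance])
    return descriptor


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two descriptors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Tiny 2x2 synthetic image: the red channel alternates between 0 and 2.
tiny = [[(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]]
print(block_stats_descriptor(tiny, grid=1))
```

A descriptor of this kind is cheap to compute and compare, which is why it could serve as a fast pre-filter for candidate poses before a heavier CNN descriptor is applied.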
