Regular Paper Issue
6D Object Pose Estimation in Cluttered Scenes from RGB Images
Journal of Computer Science and Technology 2022, 37(3): 719-730
Published: 31 May 2022

We propose a feature-fusion network that estimates 6D object pose directly from RGB images, without any depth information. First, we introduce a two-stream architecture consisting of a segmentation stream and a regression stream. The segmentation stream processes the spatial embedding features and obtains the corresponding image crop. These features are then coupled with the image crop in the fusion network. Second, we use an efficient perspective-n-point (E-PnP) algorithm in the regression stream to extract robust spatial features between 3D and 2D keypoints. Finally, we perform iterative refinement with an end-to-end mechanism to improve estimation performance. We conduct experiments on two public datasets, YCB-Video and the challenging Occluded-LineMOD. The results show that our method outperforms state-of-the-art approaches in both speed and accuracy.
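
As a rough illustration of the 3D-to-2D keypoint relation that perspective-n-point methods build on, the sketch below projects 3D keypoints through a pinhole camera and measures the mean reprojection error for a candidate pose. It is not the network or the E-PnP solver from the paper; the function names and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions chosen for the example.

```python
from math import sqrt

def transform(R, t, p):
    """Apply a rigid transform (3x3 rotation R as nested lists, translation t) to point p."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def project(point_cam, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates to pixel coordinates (pinhole model)."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)

def mean_reprojection_error(R, t, pts3d, pts2d, fx, fy, cx, cy):
    """Mean pixel distance between observed 2D keypoints and projected 3D keypoints."""
    total = 0.0
    for p3, p2 in zip(pts3d, pts2d):
        u, v = project(transform(R, t, p3), fx, fy, cx, cy)
        total += sqrt((u - p2[0]) ** 2 + (v - p2[1]) ** 2)
    return total / len(pts3d)

# Hypothetical setup: identity rotation, object placed 2 units in front of the camera.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
keypoints_3d = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
keypoints_2d = [(50.0, 50.0), (100.0, 50.0)]
error = mean_reprojection_error(IDENTITY, (0.0, 0.0, 2.0),
                                keypoints_3d, keypoints_2d,
                                100.0, 100.0, 50.0, 50.0)
```

A PnP solver searches for the rotation and translation that make this error small; an iterative refinement step can be read as repeatedly reducing it.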

Open Access Research Article Issue
Simple primitive recognition via hierarchical face clustering
Computational Visual Media 2020, 6(4): 431-443
Published: 09 November 2020

We present a simple yet efficient algorithm for recognizing simple quadric primitives (plane, sphere, cylinder, cone) from triangular meshes. Our approach is an improved version of a previous hierarchical clustering algorithm, which performs pairwise clustering of triangle patches from bottom to top. The key contributions of our approach include a strategy for priority and fidelity consideration of the detected primitives, and a scheme for boundary smoothness between adjacent clusters. Experimental results demonstrate that the proposed method produces qualitatively and quantitatively better results than representative state-of-the-art methods on a wide range of test data.
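
To make the idea of bottom-up pairwise clustering concrete, here is a minimal sketch that greedily merges adjacent faces in order of increasing cost using a priority queue. It is an assumption-laden simplification, not the paper's algorithm: the merge cost is only the deviation between mean face normals (so it finds near-planar regions, not spheres, cylinders, or cones), and `cluster_faces`, `max_cost`, and the input layout are hypothetical.

```python
import heapq
from math import sqrt

def _unit(v):
    n = sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v) if n > 0 else v

def _cost(a, b):
    # 1 - cos(angle) between the two mean normals: 0 when parallel.
    return 1.0 - sum(x * y for x, y in zip(_unit(a), _unit(b)))

def cluster_faces(normals, adjacency, max_cost):
    """Greedy bottom-up merging of adjacent faces; returns one cluster label per face."""
    n = len(normals)
    parent = list(range(n))            # parent[j] != j once face j has been merged away
    sums = [_unit(v) for v in normals]  # summed normals per live cluster
    version = [0] * n                  # bumped on merge to invalidate old heap entries
    neighbors = [set() for _ in range(n)]
    for i, j in adjacency:
        neighbors[i].add(j)
        neighbors[j].add(i)
    heap = [(_cost(sums[i], sums[j]), i, j, 0, 0) for i, j in adjacency]
    heapq.heapify(heap)

    while heap:
        cost, i, j, vi, vj = heapq.heappop(heap)
        if cost > max_cost:
            break  # every remaining candidate is at least this costly
        if parent[i] != i or parent[j] != j or version[i] != vi or version[j] != vj:
            continue  # stale entry: one side changed since it was pushed
        parent[j] = i  # merge cluster j into cluster i
        sums[i] = tuple(a + b for a, b in zip(sums[i], sums[j]))
        version[i] += 1
        for k in neighbors[j]:
            if k != i:
                neighbors[k].discard(j)
                neighbors[k].add(i)
        neighbors[i] = (neighbors[i] | neighbors[j]) - {i, j}
        neighbors[j] = set()
        for k in neighbors[i]:
            heapq.heappush(heap, (_cost(sums[i], sums[k]), i, k, version[i], version[k]))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    return [find(i) for i in range(n)]

# Four faces in a chain: two facing +z followed by two facing +x.
labels = cluster_faces(
    [(0, 0, 1), (0, 0, 1), (1, 0, 0), (1, 0, 0)],
    [(0, 1), (1, 2), (2, 3)],
    max_cost=0.01,
)
```

A fuller version would replace the normal-deviation cost with a fitting error against each candidate primitive type, which is where priority and fidelity considerations such as those described above would enter.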

Open Access Issue
Real-Time Facial Pose Estimation and Tracking by Coarse-to-Fine Iterative Optimization
Tsinghua Science and Technology 2020, 25(5): 690-700
Published: 16 March 2020

We present a novel and efficient method for real-time estimation and tracking of multiple facial poses in a single frame or video. First, we combine two standard convolutional neural network models for face detection and mean shape learning to generate initial estimations of alignment and pose. Then, we design a bi-objective optimization strategy to iteratively refine the obtained estimations. This strategy achieves higher speed and more accurate outputs. Finally, we further apply algebraic filtering, including a Gaussian filter for background removal and an extended Kalman filter for target prediction, to maintain real-time tracking performance. Only general RGB photos or videos are required, captured by a commodity monocular camera without any prior or label. We demonstrate the advantages of our approach by comparing it with the most recent work in terms of performance and accuracy.
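
The predict/update cycle behind Kalman-style target prediction can be sketched in a few lines. The example below is a plain scalar (linear) Kalman filter with a random-walk state model, not the extended Kalman filter used in the paper; the function names and the noise variances `q` and `r` are assumptions for illustration.

```python
def kalman_step(x, p, z, q, r):
    """One predict/update cycle of a scalar Kalman filter.

    x, p: previous state estimate and its variance
    z:    new measurement
    q, r: process and measurement noise variances (assumed known)
    """
    # Predict: the state is modelled as a random walk, so only the variance grows.
    p = p + q
    # Update: blend prediction and measurement using the Kalman gain.
    k = p / (p + r)
    x = x + k * (z - x)
    p = (1.0 - k) * p
    return x, p

def track(measurements, x0=0.0, p0=1.0, q=1e-3, r=0.1):
    """Filter a sequence of measurements and return the running estimates."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        x, p = kalman_step(x, p, z, q, r)
        estimates.append(x)
    return estimates

# A target coordinate that stays at 5.0: the estimate moves from 0.0 toward 5.0.
estimates = track([5.0] * 50)
```

An extended Kalman filter follows the same cycle but linearizes a nonlinear motion or measurement model around the current estimate at each step.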
