In the domain of point cloud registration, the coarse-to-fine feature matching paradigm has received significant attention due to its impressive performance. This paradigm involves a two-step process: first, the extraction of multi-level features, and subsequently, the propagation of correspondences from coarse to fine levels. However, this approach faces two notable limitations. Firstly, the use of the Dual Softmax operation may promote one-to-one correspondences between superpoints, inadvertently excluding valuable correspondences. Secondly, it is crucial to closely examine the overlapping areas between point clouds, as only correspondences within these regions decisively determine the actual transformation. Considering these issues, we propose OAAFormer to enhance correspondence quality. On the one hand, we introduce a soft matching mechanism to facilitate the propagation of potentially valuable correspondences from coarse to fine levels. On the other hand, we integrate an overlapping region detection module to minimize mismatches to the greatest extent possible. Furthermore, we introduce a region-wise attention module with linear complexity during the fine-level matching phase, designed to enhance the discriminative capabilities of the extracted features. Tests on the challenging 3DLoMatch benchmark demonstrate that our approach leads to a substantial increase of about 7% in the inlier ratio, as well as an enhancement of 2%–4% in registration recall. Finally, to accelerate the prediction process, we replace the Conventional Random Sample Consensus (RANSAC) algorithm with the selection of a limited yet representative set of high-confidence correspondences, resulting in a 100 times speedup while still maintaining comparable registration performance.
- Article type
- Year
- Co-author
A thin shell model refers to a surface or structure, where the object's thickness is considered negligible. In the context of 3D printing, thin shell models are characterized by having lightweight, hollow structures, and reduced material usage. Their versatility and visual appeal make them popular in various fields, such as cloth simulation, character skinning, and for thin-walled structures like leaves, paper, or metal sheets. Nevertheless, optimization of thin shell models without external support remains a challenge due to their minimal interior operational space. For the same reasons, hollowing methods are also unsuitable for this task. In fact, thin shell modulation methods are required to preserve the visual appearance of a two-sided surface which further constrain the problem space. In this paper, we introduce a new visual disparity metric tailored for shell models, integrating local details and global shape attributes in terms of visual perception. Our method modulates thin shell models using global deformations and local thickening while accounting for visual saliency, stability, and structural integrity. Thereby, thin shell models such as bas-reliefs, hollow shapes, and cloth can be stabilized to stand in arbitrary orientations, making them ideal for 3D printing.
Professional dance is characterized by high impulsiveness, elegance, and aesthetic beauty. In order to reach the desired professionalism, it requires years of long and exhausting practice, good physicalcondition, musicality, but also, a good understanding of choreography. Capturing dance motions and transferring them to digital avatars is commonly used in the film and entertainment industries. However, so far, access to high-quality dance data is very limited, mainly due to the many practical difficulties in capturing the movements of dancers, making it prohibitive for large-scale data acquisition. In this paper, we present a model that enhances the professionalism of amateur dance movements, allowing movement quality to be improved in both spatial and temporal domains. Our model consists of a dance-to-music alignment stage responsible for learning the optimal temporal alignment path between dance and music, and a dance-enhancement stage that injects features of professionalism in both spatial and temporal domains. To learn a homogeneous distribution and credible mapping between the heterogeneous professional and amateur datasets, we generate amateur data from professional dances taken from the AIST++dataset. We demonstrate the effectiveness of our method by comparing it with two baseline motion transfer methods via thorough qualitative visual controls, quantitative metrics, and a perceptual study. We also provide temporal and spatial module analysis to examine the mechanisms and necessity of key components of our framework.
Voronoi diagrams on triangulated surfaces based on the geodesic metric play a key role in many applications of computer graphics. Previous methods of constructing such Voronoi diagrams generally depended on having an exact geodesic metric. However, exact geodesic computation is time-consuming and has high memory usage, limiting wider application of geodesic Voronoi diagrams (GVDs). In order to overcome this issue, instead of using exact methods, we reformulate a graph method based on Steiner point insertion, as an effective way to obtain geodesic distances. Further, since a bisector comprises hyperbolic and line segments, we utilize Apollonius diagrams to encode complicated structures, enabling Voronoi diagrams to encode a medial-axis surface for a dense set of boundary samples. Based on these strategies, we present an approximation algorithm for efficient Voronoi diagram construction on triangulated surfaces. We also suggest a measure for evaluating similarity of our results to the exact GVD. Although our GVD results are constructed using approximate geodesic distances, we can get GVD results similar to exact results by inserting Steiner points on triangle edges. Experimental results on many 3D models indicate the improved speed and memory requirements compared to previous leading methods.
Indoor visual localization, i.e., 6 Degree-of-Freedom camera pose estimation for a query image with respect to a known scene, is gaining increased attention driven by rapid progress of applications such as robotics and augmented reality. However, drastic visual discrepancies between an onsite query image and prerecorded indoor images cast a significant challenge for visual localization. In this paper, based on the key observation of the constant existence of planar surfaces such as floors or walls in indoor scenes, we propose a novel system incorporating geometric information to address issues using only pixelated images. Through the system implementation, we contribute a hierarchical structure consisting of pre-scanned images and point cloud, as well as a distilled representation of the planar-element layout extracted from the original dataset. A view synthesis procedure is designed to generate synthetic images as complementary to that of a sparsely sampled dataset. Moreover, a global image descriptor based on the image statistic modality, called block mean, variance, and color (BMVC), was employed to speed up the candidate pose identification incorporated with a traditional convolutional neural network (CNN) descriptor. Experimental results on a popular benchmark demonstrate that the proposed method outperforms the state-of-the-art approaches in terms of visual localization validity and accuracy.