Irregular boundaries in image stitching naturally occur due to freely moving cameras. Existing methods deal with this problem by optimizing mesh warping to make boundaries regular, using traditional explicit solutions. However, such methods depend on hand-crafted features (e.g., keypoints and line segments), so failures often occur in overlapping regions that lack distinctive features. In this paper, we address this problem by proposing RecStitchNet, an effective network for image stitching with rectangular boundaries. Since both stitching and imposing rectangularity are non-trivial tasks in a learning-based framework, we propose a three-step progressive learning strategy, which not only simplifies the task but also gradually achieves a good balance between stitching and imposing rectangularity. In the first step, we perform initial stitching with a pre-trained state-of-the-art image stitching model, producing initially warped stitching results without considering the boundary constraint. Then, we use a regression network with a comprehensive objective covering mesh, perception, and shape terms to further encourage the stitched meshes to have rectangular boundaries with high content fidelity. Finally, we propose an unsupervised instance-wise optimization strategy to refine the stitched meshes iteratively, which effectively improves the stitching results in terms of feature alignment as well as boundary and structure preservation. Due to the lack of stitching datasets and the difficulty of label generation, we generate a stitching dataset with rectangular stitched images as pseudo-ground-truth labels; the performance upper bound induced by these labels can be surpassed by our unsupervised refinement. Qualitative and quantitative results and evaluations demonstrate the advantages of our method over the state-of-the-art.
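The abstract does not give the exact form of the shape term, but the idea of encouraging a warped mesh to have rectangular boundaries can be sketched as a simple penalty on the boundary vertices. The function name, mesh layout, and quadratic form below are illustrative assumptions, not the paper's actual objective:

```python
import numpy as np

def rectangularity_loss(mesh, width, height):
    """Penalize deviation of boundary mesh vertices from a target rectangle.

    mesh:   (H, W, 2) array of warped mesh vertex coordinates (x, y).
    width, height: dimensions of the target rectangular canvas.
    Illustrative sketch only; the paper's shape term may differ.
    """
    top    = mesh[0, :, 1]    # y-coords of the top row should be 0
    bottom = mesh[-1, :, 1]   # y-coords of the bottom row should be `height`
    left   = mesh[:, 0, 0]    # x-coords of the left column should be 0
    right  = mesh[:, -1, 0]   # x-coords of the right column should be `width`
    return (np.mean(top**2) + np.mean((bottom - height)**2)
            + np.mean(left**2) + np.mean((right - width)**2))
```

A perfectly rectangular mesh yields zero loss; any boundary vertex pulled inward or outward contributes quadratically, which is what lets a regression network trade alignment against boundary regularity.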
3D morphable models (3DMMs) are generative models for face shape and appearance. Recent works impose face recognition constraints on 3DMM shape parameters so that the face shapes of the same person remain consistent. However, the shape parameters of traditional 3DMMs follow a multivariate Gaussian distribution, whereas identity embeddings lie on a hypersphere; this conflict makes it challenging for face reconstruction models to preserve faithfulness and shape consistency simultaneously. In other words, the recognition loss and the reconstruction loss cannot decrease jointly due to their conflicting distributions. To address this issue, we propose the Sphere Face Model (SFM), a novel 3DMM for monocular face reconstruction that preserves both shape fidelity and identity consistency. The core of our SFM is the basis matrix used to reconstruct 3D face shapes; it is learned with a two-stage training approach in which 3D and 2D training data are used in the first and second stages, respectively. We design a novel loss to resolve the distribution mismatch, enforcing that the shape parameters have a hyperspherical distribution. Our model accepts both 2D and 3D data for constructing the sphere face models. Extensive experiments show that SFM has high representation ability and clustering performance in its shape parameter space. Moreover, it produces high-fidelity face shapes consistently under challenging conditions in monocular face reconstruction. The code will be released at https://github.com/a686432/SIR
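The distribution mismatch the abstract describes can be made concrete with a toy consistency term: project shape parameters onto the unit hypersphere (matching the distribution of identity embeddings) and compare two images of the same person by cosine similarity. The function name and exact form are assumptions for illustration, not SFM's published loss:

```python
import numpy as np

def sphere_consistency_loss(alpha_a, alpha_b):
    """Illustrative identity-consistency loss on the unit hypersphere.

    alpha_a, alpha_b: shape-parameter vectors recovered from two images
    of the same person. L2 normalization places both on the sphere, so a
    recognition-style cosine objective no longer fights a Gaussian prior.
    Sketch only; SFM's actual loss may differ.
    """
    a = alpha_a / np.linalg.norm(alpha_a)
    b = alpha_b / np.linalg.norm(alpha_b)
    return 1.0 - float(a @ b)   # 0 when directions agree, up to 2 when opposed
```

Because only the direction of the parameter vector matters after normalization, scaling a vector leaves the loss unchanged, which is the property that lets recognition and reconstruction objectives decrease jointly.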
There is a steadily growing range of applications that can benefit from facial reconstruction techniques, leading to an increasing demand for high-quality 3D face models. Although the nose is an important expressive part of the human face, it has received less attention than other expressive regions in the face reconstruction literature. When existing reconstruction methods are applied to facial images, the reconstructed nose models are often inconsistent with the desired shape and expression. In this paper, we propose a coarse-to-fine 3D nose reconstruction and correction pipeline that builds a nose model from a single image, in which 3D-2D nose curve correspondences are adaptively updated and refined. We first correct the reconstruction result coarsely using constraints from sparse 3D-2D landmark correspondences, and then heuristically update a dense 3D-2D curve correspondence based on the coarsely corrected result. A final refinement step corrects the shape based on the updated dense 3D-2D curve constraints. Experimental results show the advantages of our method for 3D nose reconstruction over existing methods.
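The coarse correction step, fitting a linear shape model to sparse 3D-2D landmark correspondences, reduces to a linear least-squares problem. The formulation below is a hypothetical sketch of that step (the paper's pipeline additionally performs curve-based dense refinement, which is not shown):

```python
import numpy as np

def coarse_landmark_fit(mu, basis, P, lm2d):
    """Coarse correction: solve least squares for linear shape coefficients.

    mu:    (3N,) mean nose shape, flattened 3D landmark positions.
    basis: (3N, K) linear shape basis.
    P:     (2N, 3N) projection mapping 3D landmarks to 2D (e.g., orthographic).
    lm2d:  (2N,) observed 2D landmark positions.
    Hypothetical linear-model version of the sparse 3D-2D constraint.
    """
    A = P @ basis                        # (2N, K) projected basis
    b = lm2d - P @ mu                    # residual after projecting the mean
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mu + basis @ w                # coarsely corrected 3D shape
```

With the coefficients `w` in hand, the corrected 3D shape can then be used to re-establish a dense curve correspondence for the final refinement, mirroring the coarse-to-fine structure described above.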
Researchers have achieved great success in dealing with 2D images using deep learning. In recent years, 3D computer vision and geometry deep learning have gained ever more attention. Many advanced techniques for 3D shapes have been proposed for different applications. Unlike 2D images, which can be uniformly represented by a regular grid of pixels, 3D shapes have various representations, such as depth images, multi-view images, voxels, point clouds, meshes, implicit surfaces, etc. The performance achieved in different applications largely depends on the representation used, and there is no unique representation that works well for all applications. Therefore, in this survey, we review recent developments in deep learning for 3D geometry from a representation perspective, summarizing the advantages and disadvantages of different representations for different applications. We also present existing datasets in these representations and further discuss future research directions.
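The representation trade-off the survey organizes itself around can be illustrated by converting between two of the representations it lists: an irregular point cloud and a regular voxel occupancy grid. This helper is purely illustrative and not drawn from any surveyed method:

```python
import numpy as np

def voxelize(points, resolution=32):
    """Convert a point cloud into a voxel occupancy grid.

    points: (N, 3) array of xyz coordinates.
    Regular grids suit 3D convolutions but lose detail at coarse
    resolution, while point clouds keep detail but lack grid structure;
    this conversion makes that trade-off explicit. Illustrative only.
    """
    mins = points.min(axis=0)
    extent = points.max(axis=0) - mins
    extent[extent == 0] = 1.0                                  # avoid division by zero
    idx = ((points - mins) / extent * (resolution - 1)).astype(int)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid
```

Note that many distinct points can fall into one voxel, so the mapping is lossy; this is exactly the kind of representation-dependent behavior the survey compares across applications.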
Virtual reality (VR) offers an artificial, computer-generated simulation of a real-life environment. It originated in the 1960s and has evolved to provide increasing immersion, interactivity, imagination, and intelligence. Because deep learning systems are able to represent and compose information at various levels in a deep hierarchical fashion, they can build very powerful models that leverage large quantities of visual media data. The intelligence of VR methods and applications has been significantly boosted by recent developments in deep learning techniques. VR content creation and exploration relate to image and video analysis, synthesis, and editing, so deep learning methods such as fully convolutional networks and generative adversarial networks are widely employed, designed specifically to handle panoramic images, video, and virtual 3D scenes. This article surveys recent research that uses such deep learning methods for VR content creation and exploration. It considers the problems involved, and discusses possible future directions in this active and emerging research area.