Open Access Research Article
Temporally consistent video colorization with deep feature propagation and self-regularization learning
Computational Visual Media 2024, 10(2): 375-395
Published: 03 January 2024
Abstract PDF (16.6 MB)
Downloads: 29

Video colorization is a challenging and highly ill-posed problem. Although recent years have witnessed remarkable progress in single-image colorization, there has been relatively little research effort on video colorization, and existing methods always suffer from severe flickering artifacts (temporal inconsistency) or unsatisfactory colorization. We address this problem from a new perspective, by jointly considering colorization and temporal consistency in a unified framework. Specifically, we propose a novel temporally consistent video colorization (TCVC) framework. TCVC effectively propagates frame-level deep features in a bidirectional way to enhance the temporal consistency of colorization. Furthermore, TCVC introduces a self-regularization learning (SRL) scheme to minimize the differences between predictions obtained using different time steps. SRL does not require any ground-truth color videos for training and can further improve temporal consistency. Experiments demonstrate that our method not only produces visually pleasing colorized videos, but also achieves clearly better temporal consistency than state-of-the-art methods. A video demo is provided at https://www.youtube.com/watch?v=c7dczMs-olE, and code is available at https://github.com/lyh-18/TCVC-Temporally-Consistent-Video-Colorization.
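As an illustration of the self-regularization idea, the sketch below penalizes disagreement between colorizations of the same clip produced with different temporal step sizes; the colorize_clip callable and its step argument are hypothetical stand-ins, not the authors' released interface.

import torch.nn.functional as F

def self_regularization_loss(colorize_clip, gray_frames):
    # gray_frames: (T, 1, H, W) grayscale input clip
    # Prediction 1: propagate deep features frame by frame (time step 1).
    pred_dense = colorize_clip(gray_frames, step=1)    # (T, 2, H, W) ab channels
    # Prediction 2: propagate with a coarser time step (every second frame).
    pred_sparse = colorize_clip(gray_frames, step=2)   # (T, 2, H, W)
    # Self-regularization: the two predictions should agree,
    # so no ground-truth color video is needed for this term.
    return F.l1_loss(pred_dense, pred_sparse)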

Open Access Research Article
Towards robustness and generalization of point cloud representation: A geometry coding method and a large-scale object-level dataset
Computational Visual Media 2024, 10(1): 27-43
Published: 30 November 2023
Abstract PDF (4.2 MB)
Downloads: 10

Robustness and generalization are two challenging problems for learning point cloud representations. To tackle these problems, we first design a novel geometry coding model, which can effectively use an invariant eigengraph to group points with similar geometric information, even when such points are far from each other. We also introduce a large-scale point cloud dataset, PCNet184. It consists of 184 categories and 51,915 synthetic objects, which brings new challenges for point cloud classification and provides a new benchmark for assessing point cloud cross-domain generalization. Finally, we perform extensive experiments on point cloud classification, using ModelNet40, ScanObjectNN, and our PCNet184, and on segmentation, using ShapeNetPart and S3DIS. Our method achieves performance comparable to state-of-the-art methods on these datasets, for both supervised and unsupervised learning. Code and our dataset are available at https://github.com/MingyeXu/PCNet184.
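To make the grouping idea concrete, the sketch below shows one way to group points by geometric similarity through the eigenvectors of a neighborhood graph (plain spectral clustering). It is only an assumed illustration of an eigengraph-style grouping, not the paper's geometry coding model, and the neighborhood size, bandwidth, and group count are arbitrary choices.

import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse.csgraph import laplacian
from sklearn.cluster import KMeans

def group_points_by_geometry(points, k=16, n_groups=8):
    # points: (N, 3) array of xyz coordinates (small clouds; dense graph below)
    tree = cKDTree(points)
    dists, idx = tree.query(points, k=k + 1)           # neighbors, incl. the point itself
    n = len(points)
    # Gaussian affinity over local neighbor distances as a geometric-similarity proxy.
    W = np.zeros((n, n))
    sigma = dists[:, 1:].mean()
    for i in range(n):
        W[i, idx[i, 1:]] = np.exp(-dists[i, 1:] ** 2 / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                             # symmetrize the affinity graph
    L = laplacian(W, normed=True)
    # Embed points using the low-frequency eigenvectors of the graph Laplacian,
    # then cluster in that embedding to obtain geometry-based groups.
    eigvals, eigvecs = np.linalg.eigh(L)
    embedding = eigvecs[:, 1:n_groups + 1]
    return KMeans(n_clusters=n_groups, n_init=10).fit_predict(embedding)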

Open Access Research Article
Joint 3D facial shape reconstruction and texture completion from a single image
Computational Visual Media 2022, 8(2): 239-256
Published: 06 December 2021
Abstract PDF (7.2 MB)
Downloads: 34

Recent years have witnessed significant progress in image-based 3D face reconstruction using deep convolutional neural networks. However, current reconstruction methods often perform poorly in self-occluded regions and can produce inaccurate correspondences between a 2D input image and a 3D face template, hindering their use in real applications. To address these problems, we propose a deep shape reconstruction and texture completion network, SRTC-Net, which jointly reconstructs 3D facial geometry and completes the texture with correspondences from a single input face image. In SRTC-Net, we leverage the geometric cues from the completed 3D texture to reconstruct detailed structures of 3D shapes. The SRTC-Net pipeline has three stages. The first introduces a correspondence network to identify pixel-wise correspondence between the input 2D image and a 3D template model, and transfers the input 2D image to a UV texture map. We then complete the invisible and occluded areas in the UV texture map using an inpainting network. To obtain the 3D facial geometry, we predict a coarse shape (UV position map) from the face segmented by the correspondence network using a shape network, and then refine the coarse 3D shape by regressing a UV displacement map from the completed UV texture map in a pixel-to-pixel way. We evaluate our method on 3D reconstruction tasks as well as face frontalization and pose-invariant face recognition tasks, using both in-the-lab datasets (MICC, MultiPIE) and in-the-wild datasets (CFP). The qualitative and quantitative results demonstrate the effectiveness of our method at inferring 3D facial geometry and complete texture; it outperforms or is comparable to the state-of-the-art.
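The three stages read naturally as a sequence of modules. The sketch below mirrors that sequence with hypothetical module names (correspondence_net, inpaint_net, shape_net, refine_net); it is an assumed outline of the stages described in the abstract, not the authors' released implementation.

import torch

def srtc_forward(image: torch.Tensor, correspondence_net, inpaint_net,
                 shape_net, refine_net) -> torch.Tensor:
    # Stage 1: pixel-wise correspondence between the 2D image and the 3D template,
    # used to unwrap visible face pixels into a (partial) UV texture map.
    uv_texture_partial, face_seg = correspondence_net(image)
    # Stage 2: inpaint the invisible / self-occluded regions of the UV texture map.
    uv_texture_full = inpaint_net(uv_texture_partial)
    # Stage 3a: coarse 3D shape as a UV position map from the segmented face.
    uv_position_coarse = shape_net(face_seg)
    # Stage 3b: refine the shape with a UV displacement map regressed
    # pixel-to-pixel from the completed texture (geometric cues from texture).
    uv_displacement = refine_net(uv_texture_full)
    return uv_position_coarse + uv_displacement        # refined UV position map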

Total: 3 articles