Open Access Research Article Issue
Non-dominated sorting based multi-page photo collage
Computational Visual Media 2022, 8 (2): 199-212
Published: 06 December 2021

The development of social networking services (SNSs) has brought a surge in image sharing. Multi-page photo collage (MPC), a sharing mode in which several image collages are posted at once, is common on social network platforms that allow images to be uploaded and arranged in a logical order. This study focuses on constructing an MPC for an image collection and formulates it as a joint optimization problem, which involves both the arrangement within a single collage and the arrangement among different collages. Novel balance-aware measurements, which merge graphic features and psychological achievements, are introduced. A non-dominated sorting genetic algorithm is adopted to optimize the MPC guided by these measurements. Experiments demonstrate that the proposed method produces diverse, visually pleasant, and logically clear MPC results, which are comparable to manually designed MPC results.
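
As a rough illustration of the non-dominated sorting step named in this abstract, the sketch below groups candidates into Pareto fronts. It is not the paper's implementation: the two objectives, the sample scores, and the function names `dominates` and `non_dominated_sort` are hypothetical, and both objectives are assumed to be costs to minimize.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(scores: List[Sequence[float]]) -> List[List[int]]:
    """Group candidate indices into Pareto fronts; front 0 is the non-dominated set."""
    remaining = set(range(len(scores)))
    fronts: List[List[int]] = []
    while remaining:
        # A candidate joins the current front if nothing still remaining dominates it.
        front = sorted(
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Hypothetical (imbalance, disorder) costs for five candidate layouts; lower is better.
candidates = [(0.2, 0.9), (0.5, 0.5), (0.9, 0.1), (0.6, 0.6), (0.95, 0.95)]
fronts = non_dominated_sort(candidates)
print(fronts)
```

A genetic algorithm of this family would typically prefer candidates from earlier fronts when selecting parents, which keeps several trade-offs between objectives alive instead of collapsing them into one weighted score.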

Open Access Review Article Issue
Transformers in computational visual media: A survey
Computational Visual Media 2022, 8 (1): 33-62
Published: 27 October 2021

Transformers, the dominant architecture for natural language processing, have also recently attracted much attention from computational visual media researchers due to their capacity for long-range representation and high performance. Transformers are sequence-to-sequence models, which use a self-attention mechanism rather than the RNN sequential structure. Thus, such models can be trained in parallel and can represent global information. This study comprehensively surveys recent visual transformer works. We categorize them according to task scenario: backbone design, high-level vision, low-level vision and generation, and multimodal learning. Their key ideas are also analyzed. Differing from previous surveys, we mainly focus on visual transformer methods in low-level vision and generation. The latest works on backbone design are also reviewed in detail. For ease of understanding, we precisely describe the main contributions of the latest works in the form of tables. As well as giving quantitative comparisons, we also present image results for low-level vision and generation tasks. Computational costs and source code links for various important works are also given in this survey to assist further development.
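
To make the self-attention idea in this abstract concrete, here is a minimal single-head sketch of scaled dot-product self-attention in plain Python. It assumes identity projections (queries, keys, and values are all the input tokens), which real transformers replace with learned weight matrices; the function names and the toy input are illustrative only.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix) -> Matrix:
    """Scaled dot-product self-attention with Q = K = V = x (assumption: no learned projections)."""
    d = len(x[0])
    out: Matrix = []
    for q in x:
        # Each token scores every token in the sequence, so context is global.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens)
```

Each output row depends only on the input tokens, not on the previous output row, so the loop over `x` could run in parallel. That is the property the abstract contrasts with the sequential structure of an RNN.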

Open Access Research Article Issue
SiamCPN: Visual tracking with the Siamese center-prediction network
Computational Visual Media 2021, 7 (2): 253-265
Published: 05 April 2021

Object detection is widely used in object tracking; anchor-free object tracking provides an end-to-end single-object-tracking approach. In this study, we propose a new anchor-free network, the Siamese center-prediction network (SiamCPN). Given the presence of referenced object features in the initial frame, we directly predict the center point and size of the object in subsequent frames in a Siamese-structure network without the need for per-frame post-processing operations. Unlike other anchor-free tracking approaches that are based on semantic segmentation and achieve anchor-free tracking by pixel-level prediction, SiamCPN directly obtains all information required for tracking, greatly simplifying the model. A center-prediction sub-network is applied to multiple stages of the backbone to adaptively learn from the experience of different branches of the Siamese net. The model can accurately predict object location, implement appropriate corrections, and regress the size of the target bounding box. Compared to other leading Siamese networks, SiamCPN is simpler, faster, and more efficient as it uses fewer hyperparameters. Experiments demonstrate that our method outperforms other leading Siamese networks on GOT-10K and UAV123 benchmarks, and is comparable to other excellent trackers on LaSOT, VOT2016, and OTB-100 while improving inference speed 1.5 to 2 times.
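
The sketch below shows one generic way a bounding box can be read from a predicted center point and size, in the spirit of this abstract. It is not SiamCPN's actual decoding: the heatmap and size-map layout, the `stride` value, the half-cell center offset, and the name `decode_box` are all assumptions made for illustration.

```python
from typing import List, Tuple

Grid = List[List[float]]


def decode_box(
    heatmap: Grid, width_map: Grid, height_map: Grid, stride: int = 8
) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) in image pixels from the heatmap peak."""
    rows, cols = len(heatmap), len(heatmap[0])
    # The most confident cell is taken as the object center.
    r, c = max(
        ((i, j) for i in range(rows) for j in range(cols)),
        key=lambda rc: heatmap[rc[0]][rc[1]],
    )
    # Map the cell back to image coordinates (assumed: cell centers, fixed stride).
    cx, cy = (c + 0.5) * stride, (r + 0.5) * stride
    w, h = width_map[r][c], height_map[r][c]
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# Hypothetical 3x3 network outputs with the peak at row 1, column 2.
heatmap = [[0.1, 0.2, 0.1], [0.1, 0.3, 0.9], [0.0, 0.1, 0.2]]
width_map = [[16.0] * 3 for _ in range(3)]
height_map = [[8.0] * 3 for _ in range(3)]
box = decode_box(heatmap, width_map, height_map)
```

Because the box comes from a single peak lookup, no anchor boxes are enumerated or ranked, which is the general appeal of anchor-free designs.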
