Open Access Research Article Just Accepted
See More, Know More: Richer Prior Knowledge for Novel Class Discovery
Computational Visual Media
Available online: 07 December 2024

Novel Class Discovery aims to discover novel categories in an unlabeled dataset by employing a model trained on a labeled dataset with different but semantically related categories. The challenge of this task is that the model must learn discriminative representations from seen categories that can accurately group unseen categories. Existing methods typically pre-train models on seen data containing only limited semantic categories, so the learned representations are less discriminative for the varied unseen categories that may be encountered in the future. In this paper, we propose a novel Richer Prior Knowledge (RPK) module that learns diverse and discriminative representations for future novel categories by exposing the model to a large number of synthetic visual categories. Our insight is that the more categories the model has seen during pre-training, the less biased the learned representation space will be toward the base categories. Extensive experiments on a variety of datasets and settings validate the effectiveness of our method. Additionally, our approach can be easily integrated into other methods and achieves superior performance.

Open Access Research Article Issue
Visual attention network
Computational Visual Media 2023, 9(4): 733-752
Published: 28 July 2023

While originally designed for natural language processing tasks, the self-attention mechanism has recently taken various computer vision areas by storm. However, the 2D nature of images brings three challenges for applying self-attention in computer vision: (1) treating images as 1D sequences neglects their 2D structure; (2) the quadratic complexity is too expensive for high-resolution images; (3) it only captures spatial adaptability and ignores channel adaptability. In this paper, we propose a novel linear attention named large kernel attention (LKA) to enable self-adaptive and long-range correlations in self-attention while avoiding these shortcomings. Furthermore, we present a neural network based on LKA, namely Visual Attention Network (VAN). While extremely simple, VAN achieves results comparable to similarly sized convolutional neural networks (CNNs) and vision transformers (ViTs) on various tasks, including image classification, object detection, semantic segmentation, panoptic segmentation, and pose estimation. For example, VAN-B6 achieves 87.8% accuracy on the ImageNet benchmark and sets new state-of-the-art performance (58.2% PQ) for panoptic segmentation. Moreover, VAN-B2 surpasses Swin-T by 4% mIoU (50.1% vs. 46.1%) for semantic segmentation on the ADE20K benchmark and by 2.6% AP (48.8% vs. 46.2%) for object detection on the COCO dataset. It provides a novel method and a simple yet strong baseline for the community. The code is available at https://github.com/Visual-Attention-Network.
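A rough sketch of why a large-kernel attention map is cheap to compute: the decomposition reported for VAN approximates a large depthwise convolution with a small depthwise convolution, a depthwise dilated convolution, and a 1×1 pointwise convolution (the kernel sizes 5 and 7 with dilation 3 follow the paper's 21×21 example; the channel count below is an arbitrary assumption for illustration):

```python
def depthwise_params(channels, k):
    # A depthwise convolution learns one k x k filter per channel.
    return channels * k * k

def pointwise_params(channels):
    # A 1x1 convolution mixing channels needs C x C weights.
    return channels * channels

def receptive_field(k_local, k_dilated, dilation):
    # Effective kernel of the dilated conv, then stack the two convs.
    effective = dilation * (k_dilated - 1) + 1
    return k_local + effective - 1

C = 64  # arbitrary channel count, for illustration only

# Direct large-kernel alternative: 21x21 depthwise conv followed by 1x1.
direct = depthwise_params(C, 21) + pointwise_params(C)

# LKA-style decomposition: 5x5 depthwise + 7x7 depthwise (dilation 3) + 1x1.
decomposed = (depthwise_params(C, 5)
              + depthwise_params(C, 7)
              + pointwise_params(C))
```

With C = 64 the decomposition needs 8832 parameters instead of 32320, while `receptive_field(5, 7, 3)` is 23, enough to cover a 21×21 neighborhood.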

Open Access Research Article Issue
D2ANet: Difference-aware attention network for multi-level change detection from satellite imagery
Computational Visual Media 2023, 9(3): 563-579
Published: 08 March 2023

Recognizing dynamic variations on the ground, especially changes caused by natural disasters, is critical for assessing the severity of the damage and directing the disaster response. However, current workflows for disaster assessment usually require human analysts to observe and identify damaged buildings, which is labor-intensive and unsuitable for large-scale disaster areas. In this paper, we propose a difference-aware attention network (D2ANet) for simultaneous building localization and multi-level change detection from dual-temporal satellite imagery. Considering the channel-wise differences between the features of pre- and post-disaster images, we develop a dual-temporal aggregation module that uses paired features to excite change-sensitive channels and learn the global change pattern. Since building damage caused by disasters is diverse in complex environments, we design a difference-attention module that exploits local correlations among the multi-level changes, which improves the ability to identify damage at different scales. Extensive experiments on the large-scale building damage assessment dataset xBD demonstrate that our approach provides new state-of-the-art results. Source code is publicly available at https://github.com/mj129/D2ANet.
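The core idea of exciting change-sensitive channels from paired features can be sketched as follows. This is our simplification, not the paper's exact module: it gates each channel by a sigmoid of the globally pooled absolute difference between the two temporal feature maps:

```python
import math

def dual_temporal_gate(f_pre, f_post):
    """Re-weight channels of two (C, H, W) nested-list feature maps by how
    much each channel changed between the pre- and post-disaster images."""
    gated_pre, gated_post = [], []
    for ch_pre, ch_post in zip(f_pre, f_post):
        # Global average of the absolute per-pixel change in this channel.
        total = sum(abs(a - b) for ra, rb in zip(ch_pre, ch_post)
                    for a, b in zip(ra, rb))
        mean_diff = total / (len(ch_pre) * len(ch_pre[0]))
        gate = 1.0 / (1.0 + math.exp(-mean_diff))  # change-sensitive weight
        gated_pre.append([[v * gate for v in row] for row in ch_pre])
        gated_post.append([[v * gate for v in row] for row in ch_post])
    return gated_pre, gated_post

# Two channels over a 2x2 grid: channel 0 changes a lot, channel 1 not at all.
pre  = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
post = [[[6.0, 6.0], [6.0, 6.0]], [[2.0, 2.0], [2.0, 2.0]]]
gp, _ = dual_temporal_gate(pre, post)
```

The changed channel is passed through almost untouched (gate near 1), while the unchanged channel is suppressed toward the neutral gate of 0.5.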

Open Access Review Article Issue
Attention mechanisms in computer vision: A survey
Computational Visual Media 2022, 8(3): 331-368
Published: 15 March 2022

Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on features of the input image. Attention mechanisms have achieved great success in many visual tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision, multi-modal tasks, and self-supervised learning. In this survey, we provide a comprehensive review of various attention mechanisms in computer vision and categorize them by approach, such as channel attention, spatial attention, temporal attention, and branch attention; a related repository, https://github.com/MenghaoGuo/Awesome-Vision-Attentions, is dedicated to collecting related work. We also suggest future directions for attention mechanism research.
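As a concrete instance of the "dynamic weight adjustment" view, channel attention in the squeeze-and-excitation style pools each channel to a scalar, passes the result through a small bottleneck MLP, and rescales the channels by the resulting gates. A minimal pure-Python sketch with randomly initialized (purely illustrative) weights:

```python
import math
import random

def channel_attention(x, w1, w2):
    """SE-style channel attention on a (C, H, W) nested-list feature map."""
    # Squeeze: global average pool each channel to one scalar descriptor.
    squeeze = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excite: bottleneck MLP (ReLU, then sigmoid) yields per-channel gates.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeeze))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Rescale: multiply every channel by its gate in (0, 1).
    return [[[v * g for v in row] for row in ch] for ch, g in zip(x, gates)]

random.seed(0)
C, H, W, r = 8, 4, 4, 4  # r is the bottleneck reduction ratio
x = [[[random.gauss(0, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.gauss(0, 0.1) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.gauss(0, 0.1) for _ in range(C // r)] for _ in range(C)]
y = channel_attention(x, w1, w2)
```

Because the gates lie strictly between 0 and 1, the module can only dampen channels; in a trained network the gates learn to preserve informative channels and suppress uninformative ones.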

Open Access Research Article Issue
S4Net: Single stage salient-instance segmentation
Computational Visual Media 2020, 6(2): 191-204
Published: 10 June 2020

In this paper, we consider salient instance segmentation. As well as producing bounding boxes, our network outputs high-quality instance-level segments as initial selections to indicate the regions of interest. Taking into account the category-independent property of each target, we design a single-stage salient instance segmentation framework with a novel segmentation branch. Our new branch considers not only the local context inside each detection window but also its surrounding context, enabling us to distinguish instances in the same scope even under partial occlusion. Our network is end-to-end trainable and fast (running at 40 fps for images at 320×320 resolution). We evaluate our approach on a publicly available benchmark and show that it outperforms alternative solutions. We also provide a thorough analysis of our design choices to help readers better understand the function of each part of our network. Source code can be found at https://github.com/RuochenFan/S4Net.
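One way a segmentation branch can see both the window and its surroundings is a ternary RoI-masking operator: keep responses inside the detection box, negate a thin surrounding ring so the branch can contrast instance against background, and ignore everything farther away. This is our illustration of the mechanism, not necessarily the exact operator used in S4Net:

```python
def roi_masking(feat, box, margin=1):
    """Ternary RoI masking on a 2D feature map (list of lists).

    box is (y0, x0, y1, x1), inclusive. Values inside the box are kept,
    a `margin`-pixel ring around it is negated, and the rest is zeroed.
    """
    y0, x0, y1, x1 = box
    out = []
    for y, row in enumerate(feat):
        new_row = []
        for x, v in enumerate(row):
            if y0 <= y <= y1 and x0 <= x <= x1:
                new_row.append(v)       # inside the detection window: keep
            elif (y0 - margin <= y <= y1 + margin
                  and x0 - margin <= x <= x1 + margin):
                new_row.append(-v)      # surrounding ring: negate
            else:
                new_row.append(0.0)     # far away: ignore
        out.append(new_row)
    return out

feat = [[1.0] * 6 for _ in range(6)]
masked = roi_masking(feat, (2, 2, 3, 3), margin=1)
```

On this uniform map, positions inside the 2×2 box keep their value, the one-pixel ring flips sign, and the corners become zero.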

Open Access Review Article Issue
Salient object detection: A survey
Computational Visual Media 2019, 5(2): 117-150
Published: 21 June 2019

Detecting and segmenting salient objects from natural scenes, often referred to as salient object detection, has attracted great interest in computer vision. While many models have been proposed and several applications have emerged, a deep understanding of achievements and issues remains lacking. We aim to provide a comprehensive review of recent progress in salient object detection and situate this field among other closely related areas such as generic scene segmentation, object proposal generation, and saliency for fixation prediction. Covering 228 publications, we survey i) roots, key concepts, and tasks, ii) core techniques and main modeling trends, and iii) datasets and evaluation metrics for salient object detection. We also discuss open problems, such as evaluation metrics and dataset bias in model performance, and suggest future research directions.

Open Access Research Article Issue
BING: Binarized normed gradients for objectness estimation at 300fps
Computational Visual Media 2019, 5(1): 3-20
Published: 08 April 2019

Training a generic objectness measure to produce object proposals has recently become of significant interest. We observe that generic objects with well-defined closed boundaries can be detected by looking at the norm of gradients, after suitably resizing the corresponding image windows to a small fixed size. Based on this observation, and for computational efficiency, we propose to resize the window to 8×8 and use the norm of the gradients as a simple 64D feature to describe it, explicitly training a generic objectness measure on this feature. We further show how the binarized version of this feature, namely binarized normed gradients (BING), can be used for efficient objectness estimation, requiring only a few atomic operations (e.g., add, bitwise shift). To improve the localization quality of the proposals while maintaining efficiency, we propose a novel fast segmentation method and demonstrate its effectiveness in improving BING’s localization performance when used with multi-thresholding straddling expansion (MTSE) post-processing. On the challenging PASCAL VOC2007 dataset, using 1000 proposals per image and an intersection-over-union threshold of 0.5, our proposal method achieves a 95.6% object detection rate and 78.6% mean average best overlap in less than 0.005 seconds per image.
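The 64D descriptor itself is simple to sketch. Assuming the window has already been resized to 8×8 grayscale, one plausible reading of the normed-gradient feature (clamped |gx| + |gy| per pixel, with forward differences; border handling here is our assumption) is:

```python
def ng_feature(window):
    """64D normed-gradient descriptor of an 8x8 grayscale window.

    window: 8x8 list of lists of pixel intensities (0..255), already resized.
    Returns a flat list of 64 values: min(|gx| + |gy|, 255) per pixel.
    """
    assert len(window) == 8 and all(len(row) == 8 for row in window)
    feat = []
    for y in range(8):
        for x in range(8):
            # Forward differences, replicating the border pixel at the edge.
            gx = window[y][min(x + 1, 7)] - window[y][x]
            gy = window[min(y + 1, 7)][x] - window[y][x]
            feat.append(min(abs(gx) + abs(gy), 255))
    return feat

flat = ng_feature([[10] * 8 for _ in range(8)])            # flat: no gradient
edge = ng_feature([[0] * 4 + [200] * 4 for _ in range(8)])  # vertical edge
```

A flat window yields an all-zero feature, while a sharp vertical edge produces one strong response per row, which is what makes the descriptor cheap yet indicative of a closed boundary.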

Open Access Research Article Issue
FLIC: Fast linear iterative clustering with active search
Computational Visual Media 2018, 4(4): 333-348
Published: 27 October 2018

In this paper, we reconsider the clustering problem for image over-segmentation from a new perspective. We propose a novel search algorithm called "active search" which explicitly considers neighbor continuity. Based on this search method, we design a back-and-forth traversal strategy and a joint assignment and update step to speed up the algorithm. Compared to earlier methods, such as simple linear iterative clustering (SLIC) and its variants, which use fixed search regions and perform the assignment and update steps separately, our scheme reduces the number of iterations required for convergence and also provides better boundaries in the over-segmentation results. Extensive evaluation using the Berkeley segmentation benchmark verifies that our method outperforms competing methods under various evaluation metrics. In particular, our method is the fastest, achieving approximately 30 fps for a 481×321 image on a single CPU core. To facilitate further research, our code is made publicly available.
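The back-and-forth traversal with a joint assignment-and-update step can be illustrated on a 1D toy problem. This is our simplification of the idea, not the paper's 2D implementation: candidate labels come only from the scan neighbor (neighbor continuity), and when a point switches clusters, both running cluster means are refreshed immediately instead of in a separate update pass:

```python
def flic_1d(points, labels, centers_sum, centers_cnt, sweeps=4):
    """Toy 1D 'active search': joint assignment and incremental center update.

    points: floats; labels: initial cluster index per point; centers_sum /
    centers_cnt: running sum and size per cluster, consistent with labels.
    Assumes no cluster empties out (fine for this toy example).
    """
    n = len(points)
    for sweep in range(sweeps):
        # Back-and-forth traversal: forward on even sweeps, backward on odd.
        seq = range(n) if sweep % 2 == 0 else range(n - 1, -1, -1)
        prev = None
        for i in seq:
            if prev is not None:
                old, cand = labels[i], labels[prev]
                if cand != old:
                    c_old = centers_sum[old] / centers_cnt[old]
                    c_new = centers_sum[cand] / centers_cnt[cand]
                    if abs(points[i] - c_new) < abs(points[i] - c_old):
                        # Joint assignment + update: move the point and
                        # refresh both cluster centers right away.
                        centers_sum[old] -= points[i]; centers_cnt[old] -= 1
                        centers_sum[cand] += points[i]; centers_cnt[cand] += 1
                        labels[i] = cand
            prev = i
    return labels

# One mislabeled point near the first cluster gets corrected in the first sweep.
pts = [0.0, 0.2, 0.1, 5.0, 5.1, 4.9]
lab = flic_1d(pts, [0, 1, 0, 1, 1, 1], {0: 0.1, 1: 15.2}, {0: 2, 1: 4})
```

Because each move updates the centers immediately, later points in the same sweep already see the corrected statistics, which is what cuts down the number of full iterations compared with SLIC's separate assignment and update phases.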
