Regular Paper

CAT: A Simple yet Effective Cross-Attention Transformer for One-Shot Object Detection

School of Computer Science, Northwestern Polytechnical University, Xi’an, 710000, China
National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, Northwestern Polytechnical University, Xi'an, 710000, China
School of Computer Science, The University of Adelaide, Adelaide, SA 5005, Australia

Abstract

Given a query patch from a novel class, one-shot object detection aims to detect all instances of this class in a target image through semantic similarity comparison. However, due to the extremely limited guidance available for the novel class, as well as the unseen appearance difference between the query and target instances, it is difficult to appropriately exploit their semantic similarity and generalize well. To mitigate this problem, we present a universal Cross-Attention Transformer (CAT) module for accurate and efficient semantic similarity comparison in one-shot object detection. The proposed CAT utilizes the transformer mechanism to comprehensively capture the bi-directional correspondence between any pair of pixels from the query and the target image, which allows us to fully exploit their semantic characteristics for accurate similarity comparison. In addition, the proposed CAT enables feature dimensionality compression for inference speedup without performance loss. Extensive experiments on three object detection datasets, MS-COCO, PASCAL VOC, and FSOD, under the one-shot setting demonstrate the effectiveness and efficiency of our model, e.g., it surpasses CoAE, a major baseline in this task, by 1.0% in average precision (AP) on MS-COCO and runs nearly 2.5 times faster.
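The bi-directional correspondence described above can be illustrated with a minimal sketch of cross-attention between two feature sets: pixels of the query patch attend over pixels of the target image, and vice versa, so every paired pixel contributes to the similarity comparison. This is an illustrative NumPy toy, not the paper's actual CAT implementation; the function names, feature sizes, and single-head formulation are assumptions for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(a, b):
    # Each row (pixel feature) of `a` attends over all rows of `b`:
    # scaled dot-product scores give a (n_a, n_b) pairwise similarity map,
    # then each pixel of `a` aggregates the features of `b`.
    d = a.shape[1]
    scores = a @ b.T / np.sqrt(d)         # (n_a, n_b) pairwise similarities
    return softmax(scores, axis=-1) @ b   # (n_a, d) attended features

def bidirectional_cross_attention(query_feats, target_feats):
    # Query pixels attend to target pixels, and target pixels attend to
    # query pixels, capturing correspondence in both directions.
    q_enhanced = cross_attention(query_feats, target_feats)
    t_enhanced = cross_attention(target_feats, query_feats)
    return q_enhanced, t_enhanced

rng = np.random.default_rng(0)
q = rng.standard_normal((9, 32))     # e.g., a 3x3 query patch, 32-dim features
t = rng.standard_normal((196, 32))   # e.g., a 14x14 target feature map
qe, te = bidirectional_cross_attention(q, t)
print(qe.shape, te.shape)            # (9, 32) (196, 32)
```

In this sketch the two attended outputs keep the spatial sizes of their inputs, so they can replace the original feature maps in a downstream detection head; dimensionality compression, as mentioned in the abstract, would correspond to projecting the features to a smaller `d` before attention.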

Electronic Supplementary Material

JCST-2106-11743-Highlights.pdf (300.8 KB)
Journal of Computer Science and Technology
Pages 460-471
Cite this article:
Lin W-D, Deng Y-Y, Gao Y, et al. CAT: A Simple yet Effective Cross-Attention Transformer for One-Shot Object Detection. Journal of Computer Science and Technology, 2024, 39(2): 460-471. https://doi.org/10.1007/s11390-024-1743-6


Received: 27 June 2021
Accepted: 18 January 2024
Published: 30 March 2024
© Institute of Computing Technology, Chinese Academy of Sciences 2024