Publications
Sort:
Open Access Issue
KnowER: Knowledge enhancement for efficient text-video retrieval
Intelligent and Converged Networks 2023, 4 (2): 93-105
Published: 30 June 2023
Abstract PDF (3.8 MB) Collect
Downloads:76

The widespread adoption of mobile Internet and the Internet of things (IoT) has led to a significant increase in the amount of video data. While video data are increasingly important, language and text remain the primary methods of interaction in everyday communication, text-based cross-modal retrieval has become a crucial demand in many applications. Most previous text-video retrieval works utilize implicit knowledge of pre-trained models such as contrastive language-image pre-training (CLIP) to boost retrieval performance. However, implicit knowledge only records the co-occurrence relationship existing in the data, and it cannot assist the model to understand specific words or scenes. Another type of out-of-domain knowledge—explicit knowledge—which is usually in the form of a knowledge graph, can play an auxiliary role in understanding the content of different modalities. Therefore, we study the application of external knowledge base in text-video retrieval model for the first time, and propose KnowER, a model based on knowledge enhancement for efficient text-video retrieval. The knowledge-enhanced model achieves state-of-the-art performance on three widely used text-video retrieval datasets, i.e., MSRVTT, DiDeMo, and MSVD.

Total 1