The objective of knowledge graph completion is to comprehend the structure and inherent relationships of domain knowledge, thereby providing a valuable foundation for knowledge reasoning and analysis. However, existing methods for knowledge graph completion face challenges. For instance, rule-based completion methods offer high accuracy and interpretability but struggle with large knowledge graphs, whereas embedding-based completion methods are scalable and efficient but make limited use of domain knowledge. To address these issues, we propose a pre-training and inference method for knowledge graph completion that integrates rules. The approach combines rule mining and rule-based reasoning to generate precise candidate facts. A pre-trained language model is then fine-tuned with an added probabilistic structural loss to embed the knowledge graph, enabling the language model to capture deeper semantic information while the loss function reconstructs the structure of the knowledge graph. Extensive experiments on several publicly available datasets show that the proposed model outperforms existing techniques on knowledge graph completion tasks.
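To make the idea of combining semantic and structural signals concrete, the following is a minimal illustrative sketch, not the paper's actual objective: all names and the weighting scheme are hypothetical. It pairs a language-model plausibility score for a candidate triple with a TransE-style structural distance, in the spirit of adding a structural loss to language-model fine-tuning.

```python
import math

def structural_score(h, r, t):
    # TransE-style score: the smaller ||h + r - t||, the more plausible
    # the triple (h, r, t) is under the graph's structure.
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def combined_loss(lm_logprob, h, r, t, alpha=0.5):
    # Hypothetical combined objective: negated language-model log-probability
    # (semantic term) plus an alpha-weighted structural distance term.
    return -lm_logprob - alpha * structural_score(h, r, t)
```

With equal language-model scores, a triple whose embeddings satisfy h + r ≈ t incurs a lower loss than one that violates the structure, which is the behavior the structural term is meant to enforce.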


With the advent of the era of big data, an increasing amount of duplicate data is expressed in different forms. To reduce redundant data storage and improve data quality, data deduplication technology has never been more important. It is usually necessary to join multiple data tables and identify different records that refer to the same entity, especially in multi-source data deduplication. Active learning trains the model by selecting the data items with the greatest information divergence, reducing the amount of data to be annotated, which gives it unique advantages for large-scale data annotation. However, most current active learning methods target classical entity matching and are rarely applied to data deduplication tasks. To fill this research gap, we propose a novel graph deep active learning framework for data deduplication. The framework combines similarity algorithms with the bidirectional encoder representations from transformers (BERT) model to extract deep similarity features of multi-source data records, and is the first to introduce a graph active learning strategy that builds a clean graph to filter the data needing labels, deleting duplicate records while retaining those that carry the most information. Experimental results on real-world datasets demonstrate that the proposed method outperforms state-of-the-art active learning models on data deduplication tasks.
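The selection step of such a pipeline can be sketched as follows. This is a simplified illustration under assumed interfaces, not the paper's implementation: record embeddings stand in for BERT similarity features, and pairs whose match probability is closest to 0.5 are treated as the most informative ones to send for labeling.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def select_for_labeling(pairs, embeddings, k=2):
    # Map cosine similarity in [-1, 1] to a match probability in [0, 1],
    # then pick the k pairs whose probability is closest to 0.5 --
    # the most uncertain pairs, an uncertainty-sampling heuristic.
    prob = lambda p: (cosine(embeddings[p[0]], embeddings[p[1]]) + 1) / 2
    return sorted(pairs, key=lambda p: abs(prob(p) - 0.5))[:k]
```

Pairs that are clearly matches or clearly non-matches are skipped; annotation effort concentrates on the ambiguous middle, which is the core economy that active learning offers for deduplication.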