Entity matching is a fundamental problem of data integration. It groups records according to underlying real-world entities. There is a growing trend of entity matching via deep learning techniques. We design mixed hierarchical deep neural networks (MHN) for entity matching, exploiting semantics from different abstract levels in the record internal hierarchy. A family of attention mechanisms is utilized in different periods of entity matching. Self-attention focuses on internal dependency, inter-attention targets at alignments, and multi-perspective weight attention is devoted to importance discrimination. Especially, hybrid soft token alignment is proposed to address corrupted data. Attribute order is for the first time considered in deep entity matching. Then, to reduce utilization of labeled training data, we propose an adversarial domain adaption approach (DA-MHN) to transfer matching knowledge between different entity matching tasks by maximizing classifier discrepancy. Finally, we conduct comprehensive experimental evaluations on 10 datasets (seven for MHN and three for DA-MHN), which illustrate our two proposed approaches’ superiorities. MHN apparently outperforms previous studies in accuracy, and also each component of MHN is tested. DA-MHN greatly surpasses existing studies in transferability.
Publications
- Article type
- Year
- Co-author
Article type
Year
Regular Paper
Issue
Journal of Computer Science and Technology 2021, 36(4): 822-838
Published: 05 July 2021
Total 1