Regular Paper

Mixed Hierarchical Networks for Deep Entity Matching

Engineering Research Center of Learning-Based Intelligent System (Ministry of Education), Tianjin University of Technology, Tianjin 300384, China
School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384, China
School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China

A preliminary version of the paper was published in the Proceedings of DASFAA 2021.

Entity matching is a fundamental problem in data integration: it groups records according to the real-world entities they refer to. Deep learning techniques are increasingly used for entity matching. We design mixed hierarchical deep neural networks (MHN) for entity matching, which exploit semantics at different levels of abstraction in the internal hierarchy of a record. A family of attention mechanisms is used at different stages of entity matching: self-attention captures internal dependencies, inter-attention targets alignments, and multi-perspective weight attention discriminates importance. In particular, hybrid soft token alignment is proposed to handle corrupted data, and attribute order is considered for the first time in deep entity matching. Then, to reduce the need for labeled training data, we propose an adversarial domain adaptation approach (DA-MHN) that transfers matching knowledge between different entity matching tasks by maximizing classifier discrepancy. Finally, we conduct comprehensive experimental evaluations on 10 datasets (seven for MHN and three for DA-MHN), which demonstrate the superiority of both proposed approaches. MHN clearly outperforms previous studies in accuracy, and each component of MHN is also tested. DA-MHN greatly surpasses existing studies in transferability.
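To make the phrase "classifier discrepancy" concrete, the following is a minimal sketch of one common formulation from the domain adaptation literature: the mean absolute difference between the class probabilities that two classifiers assign to the same inputs. It is an illustration under stated assumptions, not the DA-MHN implementation. The function name `classifier_discrepancy` and the probability values are hypothetical, and the abstract does not say which discrepancy measure the paper uses.

```python
def classifier_discrepancy(probs_a, probs_b):
    """Mean absolute difference between two classifiers' class probabilities.

    Each argument is a list of rows, one row per record pair, holding the
    predicted probabilities for each class (here: non-match, match).
    This L1-style measure is an assumption chosen for illustration.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both classifiers must score the same record pairs")
    total = 0.0
    count = 0
    for row_a, row_b in zip(probs_a, probs_b):
        if len(row_a) != len(row_b):
            raise ValueError("rows must have the same number of classes")
        for p, q in zip(row_a, row_b):
            total += abs(p - q)
            count += 1
    # An empty batch carries no disagreement.
    return total / count if count else 0.0


# Hypothetical (non-match, match) probabilities for three unlabeled
# target-domain record pairs, as scored by two classifiers.
clf1 = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
clf2 = [[0.9, 0.1], [0.3, 0.7], [0.1, 0.9]]
print(classifier_discrepancy(clf1, clf2))
```

In adversarial schemes of this kind, the two classifiers are typically trained to maximize this quantity on unlabeled target-domain pairs, while the shared feature extractor is trained to minimize it. Whether DA-MHN follows exactly this alternation is not stated in the abstract.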

Journal of Computer Science and Technology
Pages 822-838
Cite this article:
Sun C-C, Shen D-R. Mixed Hierarchical Networks for Deep Entity Matching. Journal of Computer Science and Technology, 2021, 36(4): 822-838.






Received: 01 February 2021
Accepted: 12 July 2021
Published: 05 July 2021
©Institute of Computing Technology, Chinese Academy of Sciences 2021