Towards Effective Author Name Disambiguation by Hybrid Attention

Qian Zhou; Wei Chen; Peng-Peng Zhao; An Liu; Jia-Jie Xu; Jian-Feng Qu; Lei Zhao

doi:10.1007/s11390-023-2070-z

Regular Paper

School of Computer Science and Technology, Soochow University, Suzhou 215006, China

Abstract

Author name disambiguation (AND) is a central task in academic search, and it has received growing attention as the numbers of authors and academic publications increase. To tackle the AND problem, existing studies have proposed approaches based on different types of information: raw document features (e.g., co-authors, titles, and keywords), the fusion feature (e.g., a hybrid publication embedding based on multiple raw document features), local structural information (e.g., a publication's neighborhood information on a graph), and global structural information (e.g., interactive information between a node and the other nodes on a graph). However, no existing work takes all of this information into account while also fully exploiting the contribution of each raw document feature. To fill this gap, we propose a novel framework named EAND (Towards Effective Author Name Disambiguation by Hybrid Attention). Specifically, we design a feature extraction model consisting of three hybrid attention mechanism layers. The model extracts key information from the global and local structural information, which are generated from six similarity graphs constructed based on different similarity coefficients, raw document features, and the fusion feature. Each hybrid attention mechanism layer contains three key modules: a local structural perception module, a global structural perception module, and a feature extractor. Additionally, a mean absolute error term in the joint loss function introduces the structural information loss of the vector space. Experimental results on two real-world datasets demonstrate that EAND outperforms state-of-the-art methods by at least 2.74% in micro-F1 score and 3.31% in macro-F1 score.
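To make the idea of a similarity graph over publications concrete, the following is a minimal sketch and not the EAND implementation. The abstract does not name the similarity coefficients used, so the Jaccard coefficient, the threshold value, the function names, and the toy co-author data are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_similarity_graph(features, threshold):
    """Return {(i, j): weight} for publication pairs with similarity >= threshold.

    `features` maps a publication id to a set of raw feature values
    (here, co-author names). The threshold is a hypothetical pruning rule.
    """
    edges = {}
    for i, j in combinations(sorted(features), 2):
        weight = jaccard(features[i], features[j])
        if weight >= threshold:
            edges[(i, j)] = weight
    return edges


# Hypothetical toy data: publication id -> set of co-author names.
coauthors = {
    "p1": {"Author A", "Author B"},
    "p2": {"Author A", "Author B", "Author C"},
    "p3": {"Author D"},
}

graph = build_similarity_graph(coauthors, threshold=0.5)
print(graph)
```

In this toy example only p1 and p2 share co-authors, so they are the only pair connected by an edge. Repeating the construction with other raw features (titles, keywords) or other coefficients would yield one graph per feature and coefficient combination, which is the general shape of input the abstract describes.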

Keywords

author name disambiguation; multiple-feature information; hybrid attention; pruning strategy; structural information loss of vector space

Electronic Supplementary Material

JCST-2112-12070-Highlights.pdf (159.9 KB)

Journal of Computer Science and Technology

Volume 39, Issue 4, August 2024

Pages 929-950

DOI: 10.1007/s11390-023-2070-z

Cite this article:

Zhou Q, Chen W, Zhao P-P, et al. Towards Effective Author Name Disambiguation by Hybrid Attention. Journal of Computer Science and Technology, 2024, 39(4): 929-950. https://doi.org/10.1007/s11390-023-2070-z

Received: 06 December 2021

Accepted: 13 August 2023

Published: 20 September 2024