Open Access

CDCAT: A Multi-Language Cross-Document Entity and Event Coreference Annotation Tool

School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
Beijing Engineering Research Center for Cyberspace Data Analysis and Applications, Beijing 100083, China
Research Institute with Run Technologies Company, Ltd., Beijing 100192, China
School of Information Engineering, Xinjiang Institute of Engineering, Urumqi 830091, China

Abstract

A manual annotation tool that helps annotators label coreference relations between mentions in text is essential for building cross-document entity and event coreference corpora. To the best of our knowledge, CROss-document Main Events and entities Recognition (CROMER) is the only open-source manual annotation tool available for cross-document entity and event coreference. However, CROMER lacks multi-language support and extensibility. Moreover, to label cross-document mention coreference relations, CROMER depends on a separate intra-document coreference annotation tool, the Content Annotation Tool, which is no longer available. To address these problems, we introduce the Cross-Document Coreference Annotation Tool (CDCAT), a new multi-language open-source manual annotation tool for cross-document entity and event coreference that can handle different input/output formats, preprocessing functions, languages, and annotation systems. Using this tool, annotators can label a coreference relation with only two mouse clicks. Best-practice analyses show that annotators can reach an annotation speed of 0.025 coreference relations per second on a corpus with a coreference density of 0.076 coreference relations per word. As the first multi-language open-source cross-document entity and event coreference annotation tool, CDCAT can theoretically achieve higher annotation efficiency than CROMER.
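
To put the two reported figures in more familiar units, the following minimal sketch converts them into seconds per relation, relations per hour, and an implied word throughput. Only the two constants come from the abstract. The function names are hypothetical, and the word-throughput figure assumes that relations are spread evenly through the text and that the annotator works through the corpus linearly, neither of which the abstract states.

```python
# Figures reported in the abstract.
SPEED_REL_PER_SEC = 0.025     # coreference relations labeled per second
DENSITY_REL_PER_WORD = 0.076  # coreference relations per word of corpus


def seconds_per_relation(speed: float) -> float:
    """Average time spent on one coreference relation."""
    return 1.0 / speed


def relations_per_hour(speed: float) -> float:
    """Relations labeled in one hour at a constant speed."""
    return speed * 3600.0


def words_per_second(speed: float, density: float) -> float:
    """Implied text throughput.

    Assumption (not from the source): relations are distributed
    uniformly, so covering one relation means covering 1/density words.
    """
    return speed / density


if __name__ == "__main__":
    print(f"{seconds_per_relation(SPEED_REL_PER_SEC):.1f} s per relation")
    print(f"{relations_per_hour(SPEED_REL_PER_SEC):.1f} relations per hour")
    wps = words_per_second(SPEED_REL_PER_SEC, DENSITY_REL_PER_WORD)
    print(f"{wps:.3f} words per second")
```

Under these assumptions, the reported speed works out to roughly 40 seconds per relation, or about 90 relations per hour, and corresponds to covering about a third of a word of corpus text per second.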

Tsinghua Science and Technology
Pages 589-598
Cite this article:
Xu Y, Xia B, Wan Y, et al. CDCAT: A Multi-Language Cross-Document Entity and Event Coreference Annotation Tool. Tsinghua Science and Technology, 2022, 27(3): 589-598. https://doi.org/10.26599/TST.2020.9010060


Received: 17 November 2020
Revised: 02 December 2020
Accepted: 17 December 2020
Published: 13 November 2021
© The author(s) 2022

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
