Building coreference corpora requires a manual annotation tool that helps annotators label cross-document entity and event coreference relations between mentions in text. To the best of our knowledge, CROss-document Main Events and entities Recognition (CROMER) is the only open-source manual annotation tool available for cross-document entity and event coreference. However, CROMER lacks multi-language support and extensibility. Moreover, to label cross-document mention coreference relations, CROMER depends on a separate intra-document coreference annotation tool, the Content Annotation Tool, which is no longer available. To address these problems, we introduce the Cross-Document Coreference Annotation Tool (CDCAT), a new multi-language, open-source manual annotation tool for cross-document entity and event coreference that can handle different input/output formats, preprocessing functions, languages, and annotation systems. Using this new tool, annotators can label a coreference relation with only two mouse clicks. Best practice analyses reveal that annotators can reach an annotation speed of 0.025 coreference relations per second on a corpus with a coreference density of 0.076 coreference relations per word. As the first multi-language open-source cross-document entity and event coreference annotation tool, CDCAT can theoretically achieve higher annotation efficiency than CROMER.
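To put the two reported figures in more familiar units, the following minimal sketch converts them into relations per hour and words covered per minute. It assumes the annotation speed stays constant and the coreference density is uniform across the corpus, which the abstract does not state; the function and variable names are illustrative only and are not part of CDCAT.

```python
# Figures quoted in the abstract above.
RELATIONS_PER_SECOND = 0.025   # reported annotation speed
RELATIONS_PER_WORD = 0.076     # reported coreference density of the corpus


def relations_per_hour(speed: float = RELATIONS_PER_SECOND) -> float:
    """Relations labeled in one hour, assuming a constant annotation speed."""
    return speed * 3600


def words_per_minute(speed: float = RELATIONS_PER_SECOND,
                     density: float = RELATIONS_PER_WORD) -> float:
    """Words of text covered per minute, assuming a uniform density.

    (relations / second) / (relations / word) = words / second.
    """
    return speed / density * 60


if __name__ == "__main__":
    print(f"{relations_per_hour():.0f} relations per hour")
    print(f"{words_per_minute():.1f} words per minute")
```

Under these assumptions the reported speed corresponds to about 90 relations per hour, or roughly 20 words of corpus text covered per minute.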
The pervasiveness of the smart Internet of Things (IoT) connects large numbers of electric sensors and devices and generates a large amount of dataflow. Compared with traditional big data, streaming dataflow faces characteristic challenges, such as high speed, strong variability, rough continuity, and demanding timeliness, which severely test its efficient management. In this paper, we provide an overall review of IoT dataflow management. We first analyze the key challenges facing IoT dataflow and give an overview of the related techniques in dataflow management, spanning dataflow sensing, mining, control, security, privacy protection, etc. Then, we illustrate and compare representative tools and platforms for IoT dataflow management. In addition, we elaborate on promising application scenarios, such as smart cities, smart transportation, and smart manufacturing, to provide guidance for further research. The management of IoT dataflow remains an important area that merits in-depth discussion and further study.