Open Access Issue
Decoupled Two-Phase Framework for Class-Incremental Few-Shot Named Entity Recognition
Tsinghua Science and Technology 2023, 28(5): 976-987
Published: 19 May 2023

Class-Incremental Few-Shot Named Entity Recognition (CIFNER) aims to recognize all entity categories seen so far when only a few examples of each newly added (novel) class are available. However, existing class-incremental methods typically introduce new parameters to adapt to new classes and treat all information equally, resulting in poor generalization. Meanwhile, few-shot methods require samples for all observed classes, making them difficult to transfer to a class-incremental setting. To address these issues, we propose a decoupled two-phase framework for the CIFNER task. The task is split into two separate subtasks, Entity Span Detection (ESD) and Entity Class Discrimination (ECD), which leverage parameter cloning and label fusion to learn different levels of knowledge, namely class-generic and class-specific knowledge, separately. Moreover, several variants are investigated to demonstrate the generalizability of the decoupled framework: Conditional Random Field-based (CRF-based) and word-pair-based methods in the ESD module, and add-based, Natural Language Inference-based (NLI-based), and prompt-based methods in the ECD module. Extensive experiments on three Named Entity Recognition (NER) datasets show that our method achieves state-of-the-art performance in the CIFNER setting.

Regular Paper Issue
Isolate Sets Based Parallel Louvain Method for Community Detection
Journal of Computer Science and Technology 2023, 38(2): 373-390
Published: 30 March 2023

Community detection is a vital task in many fields, such as social networks and financial analysis. The Louvain method, the main workhorse of community detection, is a popular heuristic method. To apply it to large-scale graph networks, researchers have proposed several parallel Louvain methods (PLMs), which suffer from two challenges: latency in information synchronization and community swap. To tackle these two challenges, we propose an isolate sets based parallel Louvain method (IPLM) and a fusion of IPLM with the hashtables based Louvain method (FIPLM), both built on a novel graph partition algorithm. Our graph partition algorithm divides the graph network into subgraphs called isolate sets, in which the vertices are relatively decoupled from others. First, we describe the concepts and properties of the isolate set. Second, we propose an algorithm to divide the graph network into isolate sets, which has the same computational complexity as breadth-first search. Third, we propose IPLM, which can efficiently calculate and update vertex information in parallel without latency or community swap. Finally, we achieve further acceleration with FIPLM, which maintains high-quality community detection while achieving a higher speedup than IPLM. Both methods target shared-memory architectures, and we implement them on an 8-core PC. The experiments show that IPLM achieves a maximum speedup of 4.62x and yields higher modularity (by up to 4.76%) than the serial Louvain method on 14 of 18 datasets. Moreover, FIPLM achieves a maximum speedup of 7.26x.
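
The abstract reports results in terms of modularity, the quantity the Louvain method greedily increases. As a minimal sketch of the standard Newman modularity for an undirected, unweighted graph without self-loops (not the paper's IPLM or FIPLM implementation), the score of a given partition could be computed as follows. The function name `modularity` and the edge-list and dictionary inputs are assumptions chosen for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity of a partition (illustrative helper, not from the paper).

    edges:     list of (u, v) pairs, undirected, unweighted, no self-loops
    community: dict mapping each vertex to its community label
    """
    m = len(edges)
    if m == 0:
        return 0.0
    intra = defaultdict(int)   # edges with both endpoints inside community c
    degree = defaultdict(int)  # sum of vertex degrees inside community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    # Q = sum over communities of (intra-edge fraction - squared degree fraction)
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    print(modularity(edges, split))
```

For the two-triangle example, splitting at the bridge gives a modularity of 5/14 (about 0.357), while placing every vertex in a single community gives 0.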

Open Access Issue
Efficient Knowledge Graph Embedding Training Framework with Multiple GPUs
Tsinghua Science and Technology 2023, 28(1): 167-175
Published: 21 July 2022

When training a large-scale knowledge graph embedding (KGE) model with multiple graphics processing units (GPUs), a partition-based method is necessary for parallel training. However, existing partition-based training methods suffer from low GPU utilization and high input/output (IO) overhead between memory and disk. To address the high IO overhead, we optimized twice partitioning with fine-grained GPU scheduling, which reduces the IO overhead between CPU memory and disk. To address the low GPU utilization caused by GPU load imbalance, we proposed balanced partitioning and dynamic scheduling methods that accelerate training in different cases. Combining these methods, we proposed fine-grained partitioning KGE, an efficient KGE training framework for multiple GPUs. We conducted experiments on several knowledge graph benchmarks, and the results show that our method achieves a speedup over existing frameworks in KGE training.
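
The abstract does not spell out how balanced partitioning is performed. As a rough illustration of the general idea of evening out GPU load, the sketch below uses a generic greedy heuristic (largest partition first, assigned to the currently least-loaded GPU). This is a swapped-in textbook technique, not the paper's algorithm; the function name `balance_partitions` and the use of partition size as the load measure are assumptions.

```python
import heapq


def balance_partitions(partition_sizes, num_gpus):
    """Greedy load balancing (hypothetical helper, not the paper's method).

    partition_sizes: list of workload sizes, one per partition
    num_gpus:        number of GPUs to spread the partitions over
    Returns (assignment, loads): GPU id -> partition indices, GPU id -> total load.
    """
    # Min-heap of (current load, gpu id); the least-loaded GPU is popped first.
    heap = [(0, gpu) for gpu in range(num_gpus)]
    heapq.heapify(heap)
    assignment = {gpu: [] for gpu in range(num_gpus)}
    # Place the largest partitions first so small ones can fill the gaps.
    order = sorted(range(len(partition_sizes)), key=lambda i: -partition_sizes[i])
    for idx in order:
        load, gpu = heapq.heappop(heap)
        assignment[gpu].append(idx)
        heapq.heappush(heap, (load + partition_sizes[idx], gpu))
    loads = {
        gpu: sum(partition_sizes[i] for i in parts)
        for gpu, parts in assignment.items()
    }
    return assignment, loads


if __name__ == "__main__":
    assignment, loads = balance_partitions([8, 7, 6, 5, 4], num_gpus=2)
    print(assignment, loads)
```

A static heuristic like this only balances the initial assignment; the dynamic scheduling mentioned in the abstract would additionally react to load at run time, which this sketch does not attempt.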
