Survey Issue
Enhancing Storage Efficiency and Performance: A Survey of Data Partitioning Techniques
Journal of Computer Science and Technology 2024, 39 (2): 346-368
Published: 30 March 2024

Data partitioning techniques are pivotal for optimal data placement across storage devices, thereby enhancing resource utilization and overall system throughput. However, the design of effective partition schemes faces multiple challenges, including considerations of the cluster environment, storage device characteristics, optimization objectives, and the balance between partition quality and computational efficiency. Furthermore, dynamic environments necessitate robust partition detection mechanisms. This paper presents a comprehensive survey structured around partition deployment environments, outlining the distinguishing features and applicability of various partitioning strategies while delving into how these challenges are addressed. We discuss partitioning features pertaining to database schema, table data, workload, and runtime metrics. We then delve into the partition generation process, segmenting it into initialization and optimization stages. A comparative analysis of partition generation and update algorithms is provided, emphasizing their suitability for different scenarios and optimization objectives. Additionally, we illustrate the applications of partitioning in prevalent database products and suggest potential future research directions and solutions. This survey aims to foster the implementation, deployment, and updating of high-quality partitions for specific system scenarios.

Regular Paper Issue
Hadamard Encoding Based Frequent Itemset Mining under Local Differential Privacy
Journal of Computer Science and Technology 2023, 38 (6): 1403-1422
Published: 15 November 2023

Local differential privacy (LDP) approaches to collecting sensitive information for frequent itemset mining (FIM) can reliably guarantee privacy. Most current approaches to FIM under LDP add “padding and sampling” steps to obtain frequent itemsets and their frequencies because each user transaction represents a set of items. The current state-of-the-art approach, namely set-value itemset mining (SVSM), must balance variance and bias to achieve accurate results. Thus, an unbiased FIM approach with lower variance is highly promising. To narrow this gap, we propose an Item-Level LDP frequency oracle approach, named the Integrated-with-Hadamard-Transform-Based Frequency Oracle (IHFO). For the first time, Hadamard encoding is introduced to a set of values to encode all items into a fixed vector, and perturbation can be subsequently applied to the vector. An FIM approach, called optimized united itemset mining (O-UISM), is proposed to combine the padding-and-sampling-based frequency oracle (PSFO) and the IHFO into a framework for acquiring accurate frequent itemsets with their frequencies. Finally, we theoretically and experimentally demonstrate that O-UISM significantly outperforms the extant approaches in finding frequent itemsets and estimating their frequencies under the same privacy guarantee.

Regular Paper Issue
Partial Label Learning via Conditional-Label-Aware Disambiguation
Journal of Computer Science and Technology 2021, 36 (3): 590-605
Published: 05 May 2021

Partial label learning is a weakly supervised learning framework in which each instance is associated with multiple candidate labels, among which only one is the ground-truth label. This paper proposes a unified formulation that employs proper label constraints for training models while simultaneously performing pseudo-labeling. Unlike existing partial label learning approaches that only leverage similarities in the feature space without utilizing label constraints, our pseudo-labeling process leverages similarities and differences in the feature space using the same candidate label constraints and then disambiguates noisy labels. Extensive experiments on artificial and real-world partial label datasets show that our approach significantly outperforms state-of-the-art counterparts on classification prediction.
