Regular Paper

DP-Share: Privacy-Preserving Software Defect Prediction Model Sharing Through Differential Privacy

School of Information Science and Technology, Nantong University, Nantong 226019, China
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
Computer School, Beijing Information Science and Technology University, Beijing 100101, China

Xiang Chen and Dun Zhang contributed equally to this work and are co-first authors.


Abstract

In current software defect prediction (SDP) research, most previous empirical studies use only datasets provided by the PROMISE repository, which may threaten the external validity of their results. Instead of sharing SDP datasets, sharing SDP models is a potential way to alleviate this problem and can encourage researchers in the research community and practitioners in industry to share more models. However, directly sharing models may result in privacy disclosure, such as through model inversion attacks. To the best of our knowledge, we are the first to apply differential privacy (DP) to privacy-preserving SDP model sharing, and we propose a novel method, DP-Share, since DP mechanisms can prevent such attacks when the privacy budget is carefully selected. In particular, DP-Share first preprocesses the dataset, over-sampling minority instances (i.e., defective modules) and discretizing continuous features to optimize privacy budget allocation. It then uses a novel sampling strategy to create a set of training sets. Finally, it constructs decision trees on these training sets, and the decision trees form a random forest (i.e., the model). The last phase of DP-Share uses the Laplace and exponential mechanisms to satisfy the requirements of DP. In our empirical studies, we choose nine experimental subjects from real software projects, use AUC (area under the ROC curve) as the performance measure, and use holdout as the model validation technique. After privacy and utility analysis, we find that DP-Share achieves better performance than the baseline method DF-Enhance in most cases under the same privacy budget. Moreover, we provide guidelines for effectively using our proposed method.
Our work attempts to fill the research gap in differential privacy for SDP, which can encourage researchers and practitioners to share more SDP models and thus effectively advance the state of the art of SDP.
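The two DP building blocks named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, scores, and feature names (`loc`, `cbo`, `wmc`) are hypothetical, and the sensitivity values assume a single instance changes a count by at most 1. The Laplace mechanism perturbs numeric answers (e.g., class counts at a tree leaf), while the exponential mechanism privately selects one option (e.g., a split feature) with probability weighted by its utility score.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_count(true_count, epsilon, sensitivity=1.0):
    """Release a count with Laplace noise of scale sensitivity/epsilon.
    Adding or removing one instance changes a count by at most 1,
    so the sensitivity of a counting query is 1."""
    return float(true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon))

def exponential_choice(candidates, scores, epsilon, sensitivity=1.0):
    """Exponential mechanism: pick one candidate with probability
    proportional to exp(epsilon * score / (2 * sensitivity)).
    Subtracting the max score only rescales the weights (numerical stability)."""
    scores = np.asarray(scores, dtype=float)
    weights = np.exp(epsilon * (scores - scores.max()) / (2.0 * sensitivity))
    probs = weights / weights.sum()
    return candidates[rng.choice(len(candidates), p=probs)]

# Illustrative use: noisily release a defective-module count, then
# privately choose a split feature from hypothetical utility scores.
noisy = laplace_count(42, epsilon=1.0)
feature = exponential_choice(["loc", "cbo", "wmc"], [0.9, 0.4, 0.7], epsilon=1.0)
```

A smaller privacy budget epsilon means larger Laplace noise and a flatter selection distribution, which is why the abstract stresses that the budget must be chosen carefully to balance privacy against model utility.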

Electronic Supplementary Material

Download File(s)
jcst-34-5-1020-Highlights.pdf (361.9 KB)
jcst-34-5-1020_ESM.pdf (361.9 KB)


Journal of Computer Science and Technology
Pages 1020-1038
Cite this article:
Chen X, Zhang D, Cui Z-Q, et al. DP-Share: Privacy-Preserving Software Defect Prediction Model Sharing Through Differential Privacy. Journal of Computer Science and Technology, 2019, 34(5): 1020-1038. https://doi.org/10.1007/s11390-019-1958-0


Received: 29 November 2018
Revised: 03 April 2019
Published: 06 September 2019
©2019 Springer Science + Business Media, LLC & Science Press, China