AI Chat Paper
Note: Please note that the following content is generated by AMiner AI. SciOpen does not take any responsibility related to this content.
{{lang === 'zh_CN' ? '文章概述' : 'Summary'}}
{{lang === 'en_US' ? '中' : 'Eng'}}
Chat more with AI
Article Link
Collect
Submit Manuscript
Show Outline
Outline
Show full outline
Hide outline
Outline
Show full outline
Hide outline
Regular Paper

Hadamard Encoding Based Frequent Itemset Mining under Local Differential Privacy

Institute of Scientific and Technical Information of China, Beijing 100038, China
Key Laboratory of Data Engineering and Knowledge Engineering (Ministry of Education), School of Information, Renmin University of China, Beijing 100872, China
Show Author Information

Abstract

Local differential privacy (LDP) approaches to collecting sensitive information for frequent itemset mining (FIM) can reliably guarantee privacy. Most current approaches to FIM under LDP add “padding and sampling” steps to obtain frequent itemsets and their frequencies because each user transaction represents a set of items. The current state-of-the-art approach, namely set-value itemset mining (SVSM), must balance variance and bias to achieve accurate results. Thus, an unbiased FIM approach with lower variance is highly promising. To narrow this gap, we propose an Item-Level LDP frequency oracle approach, named the Integrated-with-Hadamard-Transform-Based Frequency Oracle (IHFO). For the first time, Hadamard encoding is introduced to a set of values to encode all items into a fixed vector, and perturbation can be subsequently applied to the vector. An FIM approach, called optimized united itemset mining (O-UISM), is proposed to combine the padding-and-sampling-based frequency oracle (PSFO) and the IHFO into a framework for acquiring accurate frequent itemsets with their frequencies. Finally, we theoretically and experimentally demonstrate that O-UISM significantly outperforms the extant approaches in finding frequent itemsets and estimating their frequencies under the same privacy guarantee.

Electronic Supplementary Material

Download File(s)
JCST-2101-11346-Highlights.pdf (150.4 KB)

References

[1]

Li N H, Qardaji W, Su D, Cao J N. PrivBasis: Frequent itemset mining with differential privacy. Proceedings of the VLDB Endowment, 2012, 5(11): 1340–1351. DOI: 10.14778/2350229.2350251.

[2]

Xiong X Y, Chen F, Huang P Z, Tian M M, Hu X F, Chen B D, Qin J. Frequent itemsets mining with differential privacy over large-scale data. IEEE Access, 2018, 6: 28877–28889. DOI: 10.1109/ACCESS.2018.2839752.

[3]
Wang T H, Li N H, Jha S. Locally differentially private frequent itemset mining. In Proc. the 2018 IEEE Symposium on Security and Privacy (SP), May 2018, pp.127–143. DOI: 10.1109/SP.2018.00035.
[4]
Agrawal R, Srikant R. Fast algorithms for mining association rules. In Proc. the 20th International Conference on Very Large Data Bases, Sept. 1994, pp.487–499. DOI: 10.5555/645920.672836.
[5]
Adar E, Weld D S, Bershad B N, Gribble S S. Why we search: Visualizing and predicting user behavior. In Proc. the 16th International Conference on World Wide Web, May 2007, pp.161–170. DOI: 10.1145/1242572.1242595.
[6]
Brin S, Motwani R, Silverstein C. Beyond market baskets: Generalizing association rules to correlations. In Proc. the 1997 ACM SIGMOD International Conference on Management of Data, Jun. 1997, pp.265–276. DOI: 10.1145/253260.253327.
[7]
Dwork C. Differential privacy: A survey of results. In Proc. the 5th International Conference on Theory and Applications of Models of Computation, Apr. 2008. DOI: 10.1007/978-3-540-79228-4_1.
[8]
Dwork C, McSherry F, Nissim K, Smith A. Calibrating noise to sensitivity in private data analysis. In Proc. the 3rd Theory of Cryptography Conference, Mar. 2006, pp.265–284. DOI: 10.1007/11681878_14.
[9]
Wang S W, Huang L S, Nie Y W, Wang P Z, Xu H L, Yang W. PrivSet: Set-valued data analyses with locale differential privacy. In Proc. the 2018 IEEE Conference on Computer Communications, Apr. 2018, pp.1088–1096. DOI: 10.1109/INFOCOM.2018.8486234.
[10]

Su S, Xu S Z, Cheng X, Li Z Y, Yang F C. Differentially private frequent itemset mining via transaction splitting. IEEE Trans. Knowledge and Data Engineering, 2015, 27(7): 1875–1891. DOI: 10.1109/TKDE.2015.2399310.

[11]
Qin Z, Yang Y, Yu T, Khalil I, Xiao X K, Ren K. Heavy hitter estimation over set-valued data with local differential privacy. In Proc. the 2016 ACM SIGSAC Conference on Computer and Communications Security, Oct. 2016, pp.192–203. DOI: 10.1145/2976749.2978409.
[12]
Cormode G, Jha S, Kulkarni T, Li N H, Srivastava D, Wang T H. Privacy at scale: Local differential privacy in practice. In Proc. the 2018 International Conference on Management of Data, May 2018, pp.1655–1658. DOI: 10.1145/3183713.3197390.
[13]

Zeng C, Naughton J F, Cai J Y. On differentially private frequent itemset mining. Proceedings of the VLDB Endowment, 2012, 6(1): 25–36. DOI: 10.14778/2428536.2428 539.

[14]
Vadhan S. The complexity of differential privacy. In Tutorials on the Foundations of Cryptography, Lindell Y (ed.), Springer, 2017, pp.347–450. DOI: 10.1007/978-3-319-57048-8_7.
[15]

Li N H, Lyu M, Su D, Yang W N. Differential privacy: From theory to practice. Synthesis Lectures on Information Security, Privacy, & Trust, 2016, 8(4): 1–138. DOI: 10.1007/978-3-031-02350-7.

[16]
Papernot N, Song S, Mironov I, Raghunathan A, Talwar K, Erlingsson Ú. Scalable private learning with PATE. arXiv: 1802.08908, 2018. https://arxiv.org/abs/1802.08908, Dec. 2023.
[17]
Kulkarni T, Cormode G, Srivastava D. Marginal release under local differential privacy. arXiv: 1711.02952, 2017. https://arxiv.org/abs/1711.02952, Dec. 2023.
[18]
Nguyên T T, Xiao X K, Yang Y, Hui S C, Shin H, Shin J. Collecting and analyzing data from smart device users with local differential privacy. arXiv: 1606.05053, 2016. https://arxiv.org/abs/1606.05053, Dec. 2023.
[19]
Wang T H, Blocki J, Li N H, Jha S. Locally differentially private protocols for frequency estimation. In Proc. the 26th USENIX Conference on Security Symposium, Aug. 2017, pp.729–745. DOI: 10.5555/3241189.3241247.
[20]
Bassily R, Smith A. Local, private, efficient protocols for succinct histograms. In Proc. the 47th Annual ACM Symposium on Theory of Computing, Jun. 2015, pp.127–135. DOI: 10.1145/2746539.2746632.
[21]

Wang T H, Li N H, Jha S. Locally differentially private heavy hitter identification. IEEE Trans. Dependable and Secure Computing, 2021, 18(2): 982–993. DOI: 10.1109/TDSC.2019.2927695.

[22]
Duchi J C, Wainwright M J, Jordan M I. Local privacy and minimax bounds: Sharp rates for probability estimation. In Proc. the 26th International Conference on Neural Information Processing Systems, Dec. 2013, pp.1529–1537. DOI: 10.5555/2999611.2999782.
[23]
Ye Q Q, Hu H B, Meng X F, Zheng H D. PrivKV: Key-value data collection with local differential privacy. In Proc. the 2019 IEEE Symposium on Security and Privacy (SP), May 2019, pp.317–331. DOI: 10.1109/SP.2019.00018.
[24]
Gu X L, Li M, Cheng Y Q, Xiong L, Cao Y. PCKV: Locally differentially private correlated key-value data collection with optimized utility. In Proc. the 29th USENIX Conference on Security Symposium, Aug. 2020, pp.967–984. DOI: 10.5555/3489212.3489267.
[25]
Erlingsson Ú, Feldman V, Mironov I, Raghunathan A, Talwar K, Thakurta A. Amplification by shuffling: From local to central differential privacy via anonymity. In Proc. the 30th Annual ACM-SIAM Symposium on Discrete Algorithms, Jan. 2019, pp.2468–2479. DOI: 10.1137/1.9781611975482.151.
[26]

Duchi J C, Jordan M I, Wainwright M J. Minimax optimal procedures for locally private estimation. Journal of the American Statistical Association, 2018, 113(521): 182–201. DOI: 10.1080/01621459.2017.1389735.

[27]
Wang N, Xiao X K, Yang Y, Zhao J, Hui S C, Shin H, Shin J, Yu G. Collecting and analyzing multidimensional data with local differential privacy. In Proc. the 35th International Conference on Data Engineering (ICDE), Apr. 2019, pp.638–649. DOI: 10.1109/ICDE.2019.00063.
[28]
Liu R X, Cao Y, Chen H, Guo R Y, Yoshikawa M. FLAME: Differentially private federated learning in the shuffle model. In Proc. the AAAI Conference on Artificial Intelligence, May 2021, pp.8688–8696. DOI: 10.1609/aaai.v35i10.17053.
[29]
Chen R, Li H R, Qin A K, Kasiviswanathan S P, Jin H. Private spatial data aggregation in the local setting. In Proc. the 32nd IEEE International Conference on Data Engineering (ICDE), May 2016, pp.289–300. DOI: 10.1109/ICDE.2016.7498248.
[30]

Gursoy M E, Tamersoy A, Truex S, Wei W Q, Liu L. Secure and utility-aware data collection with condensed local differential privacy. IEEE Trans. Dependable and Secure Computing, 2021, 18(5): 2365–2378. DOI: 10.1109/TDSC.2019.2949041.

[31]
Murakami T, Kawamoto Y. Utility-optimized local differential privacy mechanisms for distribution estimation. In Proc. the 28th USENIX Conference on Security Symposium, Aug. 2019, pp.1877–1894. DOI: 10.5555/3361338.3361468.
[32]
Gu X L, Li M, Xiong L, Cao Y. Providing input-discriminative protection for local differential privacy. In Proc. the 36th IEEE International Conference on Data Engineering (ICDE), Apr. 2020, pp.505–516. DOI: 10.1109/ICDE48307.2020.00050.
[33]
Han X, Wang M, Zhang X J, Meng X F. Differentially private top-k query over mapreduce. In Proc. the 4th International Workshop on Cloud Data Management, Oct. 2012, pp.25–32. DOI: 10.1145/2390021.2390027.
[34]
Evfimievski A, Gehrke J, Srikant R. Limiting privacy breaches in privacy preserving data mining. In Proc. the 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Jun. 2003, pp.211–222. DOI: 10.1145/773153.773174.
[35]
Bassily R, Nissim K, Stemmer U, Thakurta A. Practical locally private heavy hitters. In Proc. the 31st International Conference on Neural Information Processing Systems, Dec. 2017, pp.2288–2296. DOI: 10.5555/3294771.3294989.
[36]
Bun M, Nelson J, Stemmer U. Heavy hitters and the structure of local privacy. In Proc. the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, May 2018, pp.435–447. DOI: 10.1145/3196959.3196981.
[37]
Acharya J, Sun Z T, Zhang H. Hadamard response: Estimating distributions privately, efficiently, and with little communication. arXiv: 1802.04705, 2018. https://arxiv.org/abs/1802.04705, Dec. 2023.
[38]
Liu Y H, Suresh A T, Yu F, Kumar S, Riley M. Learning discrete distributions: User vs item-level privacy. arXiv: 2007.13660, 2020. https://arxiv.org/abs/2007.13660, Dec. 2023.
[39]
Kairouz P, Oh S, Viswanath P. Extremal mechanisms for local differential privacy. In Proc. the 27th International Conference on Neural Information Processing Systems, Dec. 2014, pp.2879–2887. DOI: 10.5555/2969033.2969148.
Journal of Computer Science and Technology
Pages 1403-1422
Cite this article:
Zhao D, Zhao S-Y, Chen H, et al. Hadamard Encoding Based Frequent Itemset Mining under Local Differential Privacy. Journal of Computer Science and Technology, 2023, 38(6): 1403-1422. https://doi.org/10.1007/s11390-023-1346-7

255

Views

0

Crossref

2

Web of Science

2

Scopus

0

CSCD

Altmetrics

Received: 31 January 2021
Accepted: 21 February 2023
Published: 15 November 2023
© Institute of Computing Technology, Chinese Academy of Sciences 2023
Return