AI Chat Paper
Note: Please note that the following content is generated by AMiner AI. SciOpen does not take any responsibility related to this content.
{{lang === 'zh_CN' ? '文章概述' : 'Summary'}}
{{lang === 'en_US' ? '中' : 'Eng'}}
Chat more with AI
PDF (6.6 MB)
Collect
Submit Manuscript AI Chat Paper
Show Outline
Outline
Show full outline
Hide outline
Outline
Show full outline
Hide outline
Open Access

Streaming Histogram Publication over Weighted Sliding Windows Under Differential Privacy

School of Computer Science and Technology, Anhui University of Technology, Ma’anshan 243032, China, and also with Institute for Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088, China
Baosight Software (Anhui) Co. Ltd., Ma’anshan 243000, China
School of Electrical Engineering and Computer Science, Washington State University, Pullman 99164, WA, USA
Show Author Information

Abstract

Continuously publishing histograms in data streams is crucial to many real-time applications, as it provides not only critical statistical information, but also reduces privacy leaking risk. As the importance of elements usually decreases over time in data streams, in this paper we model a data stream by a sequence of weighted sliding windows, and then study how to publish histograms over these windows continuously. The existing literature can hardly solve this problem in a real-time way, because they need to buffer all elements in each sliding window, resulting in high computational overhead and prohibitive storage burden. In this paper, we overcome this drawback by proposing an online algorithm denoted by Efficient Streaming Histogram Publishing (ESHP) to continuously publish histograms over weighted sliding windows. Specifically, our method first creates a novel sketching structure, called Approximate-Estimate Sketch (AESketch), to maintain the counting information of each histogram interval at every time instance; then, it creates histograms that satisfy the differential privacy requirement by smartly adding appropriate noise values into the sketching structure. Extensive experimental results and rigorous theoretical analysis demonstrate that the ESHP method can offer equivalent data utility with significantly lower computational overhead and storage costs when compared to other existing methods.

References

[1]

A. McAfee and E. Brynjolfsson, Big data: The management revolution, Harv. Bus. Rev., vol. 90, no. 10, pp. 60–68, 2012.

[2]

X. Wang, Z. Liu, Y. Gao, X. Zheng, Z. Dang, and X. Shen, A near-optimal protocol for the grouping problem in RFID systems, IEEE Trans. Mobile Comput., vol. 20, no. 4, pp. 1257–1272, 2021.

[3]

Z. Hu and D. Li, Improved heuristic job scheduling method to enhance throughput for big data analytics, Tsinghua Science and Technology, vol. 27, no. 2, pp. 344–357, 2022.

[4]

R. Pan, Z. Li, J. Cao, H. Zhang, and X. Xia, Electrical load tracking scheduling of steel plants under time-of-use tariffs, Comput. Ind. Eng., vol. 137, p. 106049, 2019.

[5]

A. Belhassena and H. Wang, Trajectory big data processing based on frequent activity, Tsinghua Science and Technology, vol. 24, no. 3, pp. 317–332, 2019.

[6]

C. Zhan, H. Hu, Z. Liu, Z. Wang, and S. Mao, Multi-UAV-enabled mobile-edge computing for time-constrained IoT applications, IEEE Internet Things J., vol. 8, no. 20, pp. 15553–15567, 2021.

[7]

D. Wei, H. Ning, F. Shi, Y. Wan, J. Xu, S. Yang, and L. Zhu, Dataflow management in the internet of things: Sensing, control, and security, Tsinghua Science and Technology, vol. 26, no. 6, pp. 918–930, 2021.

[8]

I. Lee and K. Lee, The internet of things (IoT): Applications, investments, and challenges for enterprises, Bus. Horiz., vol. 58, no. 4, pp. 431–440, 2015.

[9]

J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami, Internet of things (IoT): A vision, architectural elements, and future directions, Future Gener. Comput. Syst., vol. 29, no. 7, pp. 1645–1660, 2013.

[10]

M. A. Khan and K. Salah, IoT security: Review, blockchain solutions, and open challenges, Future Gener. Comput. Syst., vol. 82, pp. 395–411, 2018.

[11]

Z. Liu, Q. Li, X. Chen, C. Wu, S. Ishihara, J. Li, and Y. Ji, Point cloud video streaming: Challenges and solutions, IEEE Netw., vol. 35, no. 5, pp. 202–209, 2021.

[12]

Z. Liu, C. Zhan, Y. Cui, C. Wu, and H. Hu, Robust edge computing in UAV systems via scalable computing and cooperative computing, IEEE Wirel. Commun., vol. 28, no. 5, pp. 36–42, 2021.

[13]

Z. Liu, J. Li, X. Chen, C. Wu, S. Ishihara, Y. Ji, and L. Li, Fuzzy logic-based adaptive point cloud video streaming, IEEE Open J. Comput. Soc., vol. 1, pp. 121–130, 2020.

[14]
C. Xiang, W. Cheng, C. Lin, X. Zhang, D. Liu, X. Zheng, and Z. Li, LSTAloc: A driver-oriented incentive mechanism for mobility-on-demand vehicular crowdsensing market, IEEE Trans. Mobile Comput.
[15]

A. Cela, T. Jurik, R. Hamouche, R. Natowicz, A. Reama, S. I. Niculescu, and J. Julien, Energy optimal real-time navigation system, IEEE Intell. Transp. Syst. Mag., vol. 6, no. 3, pp. 66–79, 2014.

[16]
H. Yi, H. Jung, and S. Bae, Deep neural networks for traffic flow prediction, in Proc. 2017 IEEE Int. Conf. Big Data and Smart Computing, Jeiu, Republic of Korea, 2017, pp. 328–331.
[17]

L. Zhao, K. Yang, Z. Tan, X. Li, S. Sharma, and Z. Liu, A novel cost optimization strategy for SDN-enabled UAV-assisted vehicular computation offloading, IEEE Trans. Intell. Transp. Syst., vol. 22, no. 6, pp. 3664–3674, 2020.

[18]

C. Wu, Z. Liu, F. Liu, T. Yoshinaga, Y. Ji, and J. Li, Collaborative learning of communication routes in edge-enabled multi-access vehicular environment, IEEE Trans. Cognit. Commun. Netw., vol. 6, no. 4, pp. 1155–1165, 2020.

[19]

B. Almadani, B. Saeed, and A. Alroubaiy, Healthcare systems integration using real time publish subscribe (RTPS) middleware, Comput. Electr. Eng., vol. 50, pp. 67–78, 2016.

[20]
C. B. Rjeily, G. Badr, A. H. El Hassani, and E. Andres, Medical data mining for heart diseases and the future of sequential mining in medical field, in Machine Learning Paradigms, G. A. Tsihrintzis, D. N. Sotiropoulos, and L. C. Jain, eds. Cham, Switzerland: Springer, 2019, pp. 71–99.
[21]
C. Xu, Y. Chen, and R. Bie, Sequential pattern mining in data streams using the weighted sliding window model, in Proc. 2009 15 th Int. Conf. Parallel and Distributed Systems, Shenzhen, China, 2009, pp. 886–890.
[22]

P. S. M. Tsai, Mining frequent itemsets in data streams using the weighted sliding window model, Exp. Syst. Appl., vol. 36, no. 9, pp. 11617–11625, 2009.

[23]

G. Lee, U. Yun, and K. H. Ryu, Sliding window based weighted maximal frequent pattern mining over data streams, Exp. Syst. Appl., vol. 41, no. 2, pp. 694–708, 2014.

[24]

D. H. Vu, K. M. Muttaqi, A. P. Agalgaonkar, and A. Bouzerdoum, A multi-feature based approach incorporating variable thresholds for detecting price spikes in the national electricity market of Australia, IEEE Access, vol. 9, pp. 13960–13969, 2021.

[25]
M. Yu, L. Wang, G. Xie, and T. Chu, Stabilization of networked control systems with data packet dropout via switched system approach, in Proc. 2004 IEEE Int. Conf. Robotics and Automation, Taipei, China, 2004, pp. 362–367.
[26]

J. Jiang, H. Wang, and W. Li, A trust model based on a time decay factor for use in social networks, Comput. Electr. Eng., vol. 85, p. 106706, 2020.

[27]

F. Liu, T. Yang, J. Zhou, W. Deng, Q. Yu, P. Zhang, and G. Cheng, Spatial variability and time decay of rock mass mechanical parameters: A landslide study in the Dagushan open-pit mine, Rock Mech. Rock Eng., vol. 53, no. 7, pp. 3031–3053, 2020.

[28]

A. Erramilli, O. Narayan, and W. Willinger, Experimental queueing analysis with long-range dependent packet traffic, IEEE/ACM Trans. Netw., vol. 4, no. 2, pp. 209–223, 1996.

[29]

Y. Ren, Z. Zeng, T. Wang, S. Zhang, and G. Zhi, A trust-based minimum cost and quality aware data collection scheme in P2P network, Peer Peer Netw. Appl., vol. 13, no. 6, pp. 2300–2323, 2020.

[30]
A. Golatkar, A. Achille, and S. Soatto, Time matters in regularizing deep networks: Weight decay and data augmentation affect early learning dynamics, matter little near convergence, in Proc. 33 rd Conf. Neural Information Processing Systems, Vancouver, Canada, 2019, pp. 10678–10688.
[31]

J. Xu, Z. Zhang, X. Xiao, Y. Yang, G. Yu, and M. Winslett, Differentially private histogram publication, VLDB J., vol. 22, no. 6, pp. 797–822, 2013.

[32]

X. Zhang, C. Shao, and X. Meng, Accurate histogram release under differential privacy, (in Chinese), J. Comput. Res. Dev., vol. 53, no. 5, pp. 1106–1117, 2016.

[33]

H. Tang, G. Yang, and Y. Bai, Histogram publishing algorithm based on adaptive privacy budget allocation strategy under differential privacy, (in Chinese), Appl. Res. Comput., vol. 37, no. 7, pp. 1952–1957&1963, 2020.

[34]

D. Chen, Y. Li, J. Chen, H. Bi, and X. Ding, Differential privacy via Haar wavelet transform and Gaussian mechanism for range query, Comput. Intell. Neurosci., vol. 2022, pp. 8139813, 2022.

[35]

S. Zhang and X. Li, Differential privacy medical data publishing method based on attribute correlation, Sci. Rep., vol. 12, no. 1, p. 15725, 2022.

[36]

F. Lin, Y. Wu, Y. Wang, and L. Sun, Differentially private statistical publication for two-dimensional data stream, (in Chinese), J. Comput. Appl., vol. 35, no. 1, pp. 88–92, 2015.

[37]

X. J. Zhang and X. F. Meng, Streaming histogram publication method with differential privacy, (in Chinese), J. Softw., vol. 27, no. 2, pp. 381–393, 2016.

[38]
X. Wu, N. Tong, Z. Ye, and Y. Wang, Histogram publishing algorithm based on sampling sorting and greedy clustering, in Proc. 1 st Int. Conf. Blockchain and Trustworthy Systems, Guangzhou, China, 2020, pp. 81–91.
[39]

L. Sun, C. Ge, X. Huang, Y. Wu, and Y. Gao, Differentially private real-time streaming data publication based on sliding window under exponential decay, Comput. Mater. Con., vol. 58, no. 1, pp. 61–78, 2019.

[40]

G. Yang, T. Dong, X. Fang, and S. Su, Association data release with the randomised response based on Bayesian networks, Int. J. Comput. Sci. Eng., vol. 20, no. 1, pp. 120–129, 2019.

[41]

L. Fan and L. Xiong, An adaptive approach to real-time aggregate monitoring with differential privacy, IEEE Trans. Knowl. Data Eng., vol. 26, no. 9, pp. 2094–2106, 2014.

[42]
C. Dwork, Differential privacy: A survey of results, in Proc. 5 th Int. Conf. Theory and Applications of Models of Computation, Xi’an, China, 2008, pp. 1–19.
[43]
F. Esponda, Negative surveys, arXiv preprint arXiv: math/0608176v1, 2006.
[44]

H. Jiang, W. Luo, and Z. Zhang, A privacy-preserving aggregation scheme based on immunological negative surveys for smart meters, Appl. Soft Comput., vol. 85, p. 105821, 2019.

[45]

W. Yang, X. Chen, Z. Xiong, Z. Xu, G. Liu, and X. Zhang, A privacy-preserving aggregation scheme based on negative survey for vehicle fuel consumption data, Inf. Sci., vol. 570, pp. 526–544, 2021.

[46]
H. Jiang and W. Luo, Multi-question negative surveys, in Proc. 3 rd Int. Conf. Data Mining and Big Data, Shanghai, China, 2018, pp. 503–512.
[47]
H. Liao, A study of negative surveys with background knowledge, in Proc. 2019 IEEE 9 th Int. Conf. Electronics Information and Emergency Communication, Beijing, China, 2019, pp. 301–302.
[48]

H. Jiang, W. Luo, B. Duan, and C. Wu, Enhancing the privacy of negative surveys using negative combined categories, Appl. Soft Comput., vol. 96, p. 106578, 2020.

[49]
F. D. McSherry, Privacy integrated queries: An extensible platform for privacy-preserving data analysis, in Proc. 2009 ACM SIGMOD Int. Conf. Management of Data, Providence, RL, USA, 2009, pp. 19–30.
[50]
The UK Government, The UK car accident dataset, https://data.gov.uk/dataset/road-accidents-safety-data, 2014.
[51]
Taxi & Limousine Commission, NYC yellow taxi trip dataset, https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page, 2019.
Tsinghua Science and Technology
Pages 1674-1693
Cite this article:
Wang X, Mo L, Zheng X, et al. Streaming Histogram Publication over Weighted Sliding Windows Under Differential Privacy. Tsinghua Science and Technology, 2024, 29(6): 1674-1693. https://doi.org/10.26599/TST.2023.9010083

246

Views

40

Downloads

0

Crossref

0

Web of Science

0

Scopus

0

CSCD

Altmetrics

Received: 17 May 2023
Revised: 10 July 2023
Accepted: 01 August 2023
Published: 20 June 2024
© The Author(s) 2024.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Return