Achieving Differential Privacy of Genomic Data Releasing via Belief Propagation

Zaobo He; Yingshu Li; Ji Li; Kaiyang Li; Qing Cai; Yi Liang

doi:10.26599/TST.2018.9010037

Open Access

Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA.

School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China.

Abstract

Privacy-preserving data release is an important problem for reconciling data openness with individual privacy. The state-of-the-art approach is differential privacy, which offers a strong privacy guarantee without restrictive assumptions about an attacker's background knowledge. For genomic data with very high-dimensional attributes, however, current differential-privacy approaches are ineffective. Specifically, a large amount of noise must be injected into genomic data with tens of millions of SNPs (Single Nucleotide Polymorphisms), which significantly degrades the utility of the released data. To address this problem, this paper proposes a genomic data release method with a differential privacy guarantee. By executing belief propagation on a factor graph, the method factorizes the distribution of sensitive genomic data into a set of local distributions. After differential-privacy noise is injected into these local distributions, synthetic sensitive data is obtained by sampling from the noisy distributions. The synthetic sensitive data and the factor graph are then used to construct an approximate distribution of the non-sensitive data. Finally, non-sensitive genomic data is sampled from this approximate distribution to construct a synthetic genomic dataset.
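The noise-injection and sampling step can be illustrated with a minimal sketch. The abstract does not state which noise mechanism is used, so the code below assumes the standard Laplace mechanism applied to the counts of a single local distribution, with a sensitivity of 1; the function names, the example counts, and the value of epsilon are all hypothetical and are not taken from the paper.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_distribution(counts, epsilon, rng, sensitivity=1.0):
    """Add Laplace noise to each count, clip at zero, and normalize.

    Assumption: Laplace mechanism with scale = sensitivity / epsilon.
    """
    scale = sensitivity / epsilon
    noisy = {k: max(0.0, c + laplace_noise(scale, rng)) for k, c in counts.items()}
    total = sum(noisy.values())
    if total == 0.0:
        # Every noisy count was clipped to zero: fall back to uniform.
        return {k: 1.0 / len(noisy) for k in noisy}
    return {k: v / total for k, v in noisy.items()}

def sample_synthetic(dist, n, rng):
    """Draw n synthetic values from the (noisy) local distribution."""
    keys = list(dist)
    weights = [dist[k] for k in keys]
    return rng.choices(keys, weights=weights, k=n)

rng = random.Random(0)
counts = {"BB": 60, "Bb": 30, "bb": 10}   # hypothetical genotype counts at one SNP
dist = noisy_distribution(counts, epsilon=1.0, rng=rng)
synthetic = sample_synthetic(dist, 5, rng)
```

Clipping and renormalizing are post-processing steps, so they do not weaken the privacy guarantee of the noisy counts, although they can bias the resulting distribution when counts are small.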

Keywords

differential privacy; SNP/trait associations; belief propagation; factor graph; data releasing

Tsinghua Science and Technology

Volume 23, Issue 4, August 2018

Pages 389-395

DOI: 10.26599/TST.2018.9010037

Cite this article:

He Z, Li Y, Li J, et al. Achieving Differential Privacy of Genomic Data Releasing via Belief Propagation. Tsinghua Science and Technology, 2018, 23(4): 389-395. https://doi.org/10.26599/TST.2018.9010037

Table 1
Each entry gives the child's genotype probabilities in the order (BB, Bb, bb).

Father \ Mother	BB	Bb	bb
BB	(1, 0, 0)	(1/2, 1/2, 0)	(0, 1, 0)
Bb	(1/2, 1/2, 0)	(1/4, 1/2, 1/4)	(0, 1/2, 1/2)
bb	(0, 1, 0)	(0, 1/2, 1/2)	(0, 0, 1)
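
The entries in this table follow from Mendelian inheritance: each parent passes on one of its two alleles with equal probability. The sketch below derives the child's genotype probabilities from that rule so they can be checked against the table; the function names are hypothetical and the code is an illustration, not the authors' implementation.

```python
from fractions import Fraction

GENOTYPES = ("BB", "Bb", "bb")

def prob_transmit_B(genotype):
    # A parent transmits allele B with probability (number of B alleles) / 2.
    return Fraction(genotype.count("B"), 2)

def child_distribution(father, mother):
    """Return the child's probabilities for (BB, Bb, bb)."""
    pf = prob_transmit_B(father)
    pm = prob_transmit_B(mother)
    p_BB = pf * pm                          # B from both parents
    p_Bb = pf * (1 - pm) + (1 - pf) * pm    # B from exactly one parent
    p_bb = (1 - pf) * (1 - pm)              # b from both parents
    return (p_BB, p_Bb, p_bb)

# Rebuild the full table, keyed by (father genotype, mother genotype).
table = {(f, m): child_distribution(f, m) for f in GENOTYPES for m in GENOTYPES}
```

For example, `table[("Bb", "Bb")]` evaluates to the probabilities 1/4, 1/2, and 1/4, matching the center cell of the table.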