School of Computer Science and Engineering, and also with MOE Key Laboratory of Computer Network and Information Integration, Southeast University, Nanjing210096, China.
Division of Math and Computer Science, University of South Carolina Upstate, Spartanburg, SC29303, USA,
Show Author Information
Hide Author Information
Abstract
Biological network alignment is an important research topic in the field of bioinformatics. Nowadays almost every existing alignment method is designed to solve the deterministic biological network alignment problem. However, it is worth noting that interactions in biological networks, like many other processes in the biological realm, are probabilistic events. Therefore, more accurate and better results can be obtained if biological networks are characterized by probabilistic graphs. This probabilistic information, however, increases difficulties in analyzing networks and only few methods can handle the probabilistic information. Therefore, in this paper, an improved Probabilistic Biological Network Alignment (PBNA) is proposed. Based on IsoRank, PBNA is able to use the probabilistic information. Furthermore, PBNA takes advantages of Contributor and Probability Generating Function (PGF) to improve the accuracy of node similarity value and reduce the computational complexity of random variables in similarity matrix. Experimental results on dataset of the Protein-Protein Interaction (PPI) networks provided by Todor demonstrate that PBNA can produce some alignment results that ignored by the deterministic methods, and produce more biologically meaningful alignment results than IsoRank does in most of the cases based on the Gene Ontology Consistency (GOC) measure. Compared with Prob method, which is designed exactly to solve the probabilistic alignment problem, PBNA can obtain more biologically meaningful mappings in less time.
No abstract is available for this article. Click the button above to view the PDF directly.
References
[1]
R.Singh, J.Xu, and B.Berger, Global alignment of multiple protein interaction networks with application to functional orthology detection, PNAS, vol. 105, no. 35, pp. 12763-12768, 2008.
X. L.Guo, L.Gao, and X.Chen, Models and algorithms for alignment of biological networks, (in Chinese), Journal of Software, vol. 21, no. 9, pp. 2089-2106, 2010.
H.Ogata, W.Fujibuchi, and S.Goto, A heuristic graph comparison algorithm and its application to detect functionally related enzyme clusters, Nucleic Acids Research, vol. 28, no. 20, pp. 4021-4028, 2000.
E.Hirsh and R.Sharan, Identification of conserved protein complexes based on a model of protein network evolution, Bioinformatics, vol. 23, no 2, pp. e170-e176, 2007.
R.Singh, J.Xu, and B.Berger, Pairwise global alignment of protein interaction networks by matching neighborhood topology, in Research in Computational Molecular Biology. Springer Berlin Heidelberg, 2007, pp. 16-31.
[6]
M.Narayanan and R. M.Karp, Comparing protein interaction networks via a graph match-and-split algorithm, Journal of Computational Biology, vol. 14, no. 7, pp. 892-907, 2007.
Z.Liang, M.Xu, M.Teng, and L.Niu, Comparison of protein interaction networks reveals species conservation and divergence, BMC Bioinformatics, vol. 7, no. 1, pp. 457-472, 2006.
M.Koyuturk, Y.Kim, U.Topkara, S.Subramaniam, W.Szpankowski, and A.Grama, Pairwise alignment of protein interaction networks, Journal of Computational Biology, vol. 13, no. 2, pp. 182-199, 2006.
M.Ashburner, C. A.Ball, J. A.Blake, D.Botstein, and H.Butler, Gene ontology: Tool for the unification of biology, Nat. Genet., vol. 25, no. 1, pp. 25-29, 2000.
W.Tian and N. F.Samatova, Pairwise alignment of interaction networks by fast identification of maximal conserved patterns, Pacific Symposium on Biocomputing, vol. 14, pp. 99-110, 2009.
F.Towfic, M. H. W.Greenlee, and V.Honavar, Aligning biomolecular networks using modular graph kernels, in Algorithms in Bioinformatics.Springer Berlin Heidelberg, 2009, pp. 345-361.
W.Guan, J.Wang, and F.He, The advance in research methods for large-scale protein-protein interactions (in Chinese), Chinese Bulletin of Life Sciences, vol. 18, no. 5, pp. 507-512, 2006.
J.Sun, J.Xu, Y.Li, and T.Shi, Analysis and application of large-scale protein-protein interaction data, (in Chinese), Chinese Science Bulletin, vol. 50, no. 19, pp. 2055-2060, 2005.
P.Jancura, J.Heringa, and E.Marchiori, Divide, align and full-search for discovering conserved protein complexes, in Machine Learning and Data Mining in Bioinformatics.Springer Berlin Heidelberg, 2008, pp. 71-82.
[20]
A. E.Aladag and C.Erten, SPINAL: Scalable protein interaction network alignment, Bioinformatics, vol. 29, no. 7, pp. 917-924, 2013.
M.Kanehisa, S.Goto, S.Kawashima, Y.Okuno, and M.Hattori, The KEGG resource for deciphering the genome, Nucleic Acids Research, vol. 32, no. s1, pp. D277-D280, 2004.
Zhao M, Zhong W, He J. PBNA: An Improved Probabilistic Biological Network Alignment Method. Tsinghua Science and Technology, 2014, 19(6): 658-667. https://doi.org/10.1109/TST.2014.6961034
113745N-2014-06-658.F001
A probabilistic network (top) and its four possible deterministic network instances (bottom).
113745N-2014-06-658.F002
The framework of PBNA.
2.2.2 Integrating the probabilistic information into similarity scores
Now we generalize the alignment problem into the situation where is deterministic while is probabilistic. A main challenge of probabilistic network alignment is to compute R. This is because that the nature of probabilistic network makes it almost impossible to choose one specific deterministic network and the corresponding A of it from all possible alternatives. There are alternative matrices. We solve this problem using an approach proposed in Prob[
4
]: All of these matrices are modeled with a matrix of random variables and A is replaced with its expected value, . Now the question is how to compute ?
The degree of every node, namely , is obviously probabilistic in probabilistic networks. can be any value in the sequence is the largest possible degree of node . Let a discrete random variable model the degree of node . Then is the sequence of degree distribution of node in probabilistic network .
Algorithm 1 Computing according to Eq. (
8
).
A is now a random matrix with entries that depend on , and Eq. (
5
) is converted into Eq. (
6
) as the following form:
In order to compute the expectation of A described in Eq. (
6
), we focus on its one specific entry . Note that the related information of node is deterministic as it is from . is deterministic, and whether is included in of is also deterministic (see Algorithm 1). According to Eq. (
6
), if node is isolated, namely , is a constant, ; if is not isolated and , is constant, 0; if , is also constant, . For the remaining cases, is a random variable whose expectation needs to be computed. The definition of expectation of a discrete random variable is
The possible values of denoted with , can be , , or according to Eq. (
6
). From the above discussion, we know that the prerequisite for being a random variable is that . Therefore, node has no possibility being an isolated node. According to Eq. (
6
), can only be . Moreover, leads to . And if the probabilistic edge is guaranteed to appear in , it implies that and are neighbors. The degree of is at least 1. Thus, the probability distribution of the random variable should take this prior information into consideration. In mathematical way, this can be expressed as a conditional probability , for . Finally, the expectation of matrix A can be computed as following:
The main technical difficulty in Eq. (
8
) is how to compute the conditional degree distribution .
Definition 1[
4
] Let be a discrete random variable taking integer values from to . The PGF of is defined as the polynomial of :
Example 1 Let be a discrete random variable taking values from . The probability distribution is . The PGF of is the polynomial .
Thus, we know that by simply listing the coefficients of PGF, the distribution of will be obtained. And the following Theorem 1 can be used to compute PGF of .
Theorem 1[
4
] Let be a probabilistic network and be the set of edges incident on node . The PGF of the degree distribution of is
Example 2 In
Fig. 1
, the PGF of the distribution for node is the multiplication of the polynomials corresponding to each incident edge: . The degree distribution of is , , .
Thus, the degree distribution of node can be computed by the PGF of conveniently. The time complexity of the entire process then decreases from (exhaustive method) to [
4
].
Note that our goal in Eq. (
8
) is to compute the conditional degree distribution . In fact, the condition that a particular edge is present can be expressed by dividing the PGF of by , which is the PGF of if edge is the only edge incident on node . The result of the division produces the PGF of , which is the degree of node without considering the edge at all. Finally, by shifting the distribution of by one position, we get the conditional degree distribution . The intuitive explanation of this dividing-and-shifting procedure is that the probability of the degree of node being under the condition that edge is already present equals to the probability of the degree of node being under the condition that node doesn’t have edge at all.
The pseudo code for computing according to Eq. (
8
) is shown in Algorithm 1.
Algorithm 1 demonstrates how to compute in three major steps:
(1) Lines 1-3: construct the PGF for every node in probabilistic network ;
(2) Lines 4-6: for every possible pair where and , compute its initial similarity score for finding the set of Contributor according to Line 5;
(3) Line 7 to end: for every entry of A, compute its similarity score according to three rules in Eq. (
8
): Lines 8-14 are for the first rule; Lines 15-17 are for the second rule; and Lines 18-20 are for the third.
113745N-2014-06-658.F003
Agreement statistics.
113745N-2014-06-658.F004
statistics of PBNA and IsoRank.
113745N-2014-06-658.F005
GOC statistics of PBNA and Prob.