School of Computer Science and Technology, Shandong University, Qingdao 266200, China
State Key Laboratory of Software Development Environment, School of Computer Science and Engineering and the Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100191, China
Big Data Institute, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
Show Author Information
Hide Author Information
Abstract
Large-scale graphs usually exhibit global sparsity with local cohesiveness, and mining the representative cohesive subgraphs is a fundamental problem in graph analysis. The -truss is one of the most commonly studied cohesive subgraphs, in which each edge is formed in at least triangles. A critical issue in mining a -truss lies in the computation of the trussness of each edge, which is the maximum value of that an edge can be in a -truss. Existing works mostly focus on truss computation in static graphs by sequential models. However, the graphs are constantly changing dynamically in the real world. We study distributed truss computation in dynamic graphs in this paper. In particular, we compute the trussness of edges based on the local nature of the -truss in a synchronized node-centric distributed model. Iteratively decomposing the trussness of edges by relying only on local topological information is possible with the proposed distributed decomposition algorithm. Moreover, the distributed maintenance algorithm only needs to update a small amount of dynamic information to complete the computation. Extensive experiments have been conducted to show the scalability and efficiency of the proposed algorithm.
No abstract is available for this article. Click the button above to view the PDF directly.
References
[1]
Q.Luo, D.Yu, Z.Cai, X.Lin, and X.Cheng, Hypercore maintenance in dynamic hypergraphs, in Proc. 37th IEEE Int. Conf. on Data Engineering, Chania, Greece, 2021, pp. 2051–2056.
Q.Luo, D.Yu, F.Li, Z.Dou, Z.Cai, J.Yu, and X.Cheng, Distributed core decomposition in probabilistic graphs, in Proc. 8th Int. Conf. on Computational Data and Social Networks, Ho Chi Minh City, Vietnam, 2019, pp. 16–32.
J.Abello, M. G. C.Resende, and S.Sudarsky, Massive quasi-clique detection, in Proc. 5th Latin American Symp. on LATIN 2002: Theoretical Informatics, Cancun, Mexico, 2002, pp. 598–612.
J.Cohen, Trusses: Cohesive Subgraphs for Social Network Analysis., Tech. Rep., National Security Agency, Middleburg, VA, USA, 2008.
[9]
X.Huang, L. V. S.Lakshmanan, J. X.Yu, and H.Cheng, Approximate closest community search in networks, Proc. VLDB Endow., vol. 9, no. 4, pp. 276–287, 2015.
M.Sozio and A.Gionis, The community-search problem and how to plan a successful cocktail party, in Proc. 16th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Washington, DC, USA, 2010, pp. 939–948.
J.Zhang, P. S.Yu, and Y.Lv, Enterprise employee training via project team formation, in Proc. 10th ACM Int. Conf. on Web Search and Data Mining, Cambridge, UK, 2017, pp. 3–12.
T.Chakraborty, S.Patranabis, P.Goyal, and A.Mukherjee, On the formation of circles in co-authorship networks, in Proc. 21th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Sydney, Australia, 2015, pp. 109–118.
J.Ugander, L.Backstrom, C.Marlow, and J. M.Kleinberg, Structural diversity in social contagion, Proc. Natl. Acad. Sci. USA, vol. 109, no. 16, pp. 5962–5966, 2012.
X.Huang, H.Cheng, L.Qin, W.Tian, and J. X.Yu, Querying k-truss community in large and dynamic graphs, in Proc. 2014 ACM SIGMOD Int. Conf. on Management of Data, Snowbird, UT, USA, 2014, pp. 1311–1322.
G.Malewicz, M. H.Austern, A. J. C.Bik, J. C.Dehnert, I.Horn, N.Leiser, and G.Czajkowski, Pregel: A system for large-scale graph processing, in Proc. 2010 ACM SIGMOD Int. Conf. on Management of Data, Indianapolis, IN, USA, 2010, pp. 135–146.
Y. C.Low, D.Bickson, J.Gonzalez, C.Guestrin, A.Kyrola, and J. M.Hellerstein, Distributed graphlab: A framework for machine learning and data mining in the cloud, Proc. VLDB Endow., vol. 5, no. 8, pp. 716–727, 2012.
J.Sun, D.Zhou, H.Chen, C.Chang, Z.Chen, W.Li, and L.He, GPSA: A graph processing system with actors. in Proc. 44th Int. Conf. on Parallel Processing, Beijing, China, 2015, pp. 709–718.
S.Chu and J.Cheng, Triangle listing in massive networks and its applications, in Proc. 17th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, San Diego, CA, USA, 2011, pp. 672–680.
A. E.Sariyüce, C.Seshadhri, and A.Pinar, Local algorithms for hierarchical dense subgraph discovery, Proc. VLDB Endow., vol. 12, no. 1, pp. 43–56, 2018.
Y.Zhang and S.Parthasarathy, Extracting analyzing and visualizing triangle K-core motifs within networks, in Proc. 2012 IEEE 28th Int. Conf. on Data Engineering, Arlington, VA, USA, 2012, pp. 1049–1060.
P. L.Chen, C. K.Chou, and M. S.Chen, Distributed algorithms for k-truss decomposition, in Proc. 2014 IEEE Int. Conf. on Big Data, Washington, DC, USA, 2014, pp. 471–480.
H.Kabir and K.Madduri, Parallel k-truss decomposition on multicore systems. in Proc. 2017 IEEE High Performance Extreme Computing Conf., Waltham, MA, USA, 2017, pp. 1–7.
H.Kabir and K.Madduri, Shared-memory graph truss decomposition, in Proc. 2017 IEEE 24th Int. Conf. on High Performance Computing, Jaipur, India, 2017, pp. 13–22.
Y.Shao, L.Chen, and B.Cui, Efficient cohesive subgraphs detection in parallel, in Proc. 2014 ACM SIGMOD Int. Conf. on Management of Data, Snowbird, UT, USA, 2014, pp. 613–624.
S.Smith, X.Liu, N. K.Ahmed, A. S.Tom, F.Petrini, and G.Karypis, Truss decomposition on shared-memory parallel systems, in Proc. 2017 IEEE High Performance Extreme Computing Conf., Waltham, MA, USA, 2017, pp. 1–6.
F.Esfahani, J.Wu, V.Srinivasan, A.Thomo, and K.Wu, Fast truss decomposition in large-scale probabilistic graphs, In Proc. 22nd Int. Conf. on Extending Database Technology, Lisbon, Portugal, 2019, pp. 722–725.
Z.Zou, Bitruss decomposition of bipartite graphs, in Proc. 21st Int. Conf. on Database Systems for Advanced Applications, Dallas, TX, USA, 2016, pp. 218–233.
X.Huang, W.Lu, and L. V. S.Lakshmanan, Truss decomposition of probabilistic graphs: Semantics and algorithms, in Proc. 2016 Int. Conf. on Management of Data, San Francisco, CA, USA, 2016, pp. 77–90.
E.Akbas and P.Zhao, Truss-based community search: A truss-equivalence based indexing approach, Proc. VLDB Endow., vol. 10, no. 11, pp. 1298–1309, 2017.
Y.Zhang and J. X.Yu, Unboundedness and efficiency of truss maintenance in evolving graphs, in Proc. 2019 Int. Conf. on Management of Data, Amsterdam, The Netherlands, 2019, pp. 1024–1041.
T. G.Kolda, A.Pinar, T. D.Plantenga, and C.Seshadhri, A scalable generative graph model with community structure, SIAM J. Sci. Comput., vol. 36, no. 5, pp. C424–C452, 2014.
Mo Z, Luo Q, Yu D, et al. Distributed Truss Computation in Dynamic Graphs. Tsinghua Science and Technology, 2023, 28(5): 873-887. https://doi.org/10.26599/TST.2022.9010019
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
10.26599/TST.2022.9010019.F001
Example of k-truss.
10.26599/TST.2022.9010019.F002
Example of inserting the same number of edges in the graph but with different trussness changes. In graphs, the values marked on the edges are the trussness of the edges, and the edges with different trussness are distinguished by the thickness of the edges. A large edge trussness is indicated by thick lines in the graphs.
10.26599/TST.2022.9010019.F003
2-hop auxiliary index for node 1 in Fig. 2.
Distributed Truss Decomposition
We propose our distributed truss decomposition algorithm in this section based on our distributed model. We first introduce two auxiliary indexes, namely adjMap and trussMap, which are based on the locality property of the -truss. We then introduce the procedure and analysis of the proposed algorithm, which is based on the two indexes.
The distributed model encounters the following two truss decomposition challenges.
• The most important metric for calculating the trussness is the support of edges, which is the initial value of trussness for edges. All neighboring nodes of both endpoints of the edge must be identified to calculate the support of an edge.
• Edges are virtual connections for communication between nodes. Thus, the value of support for each edge cannot exist on a nonexistent entity.
To address the above challenges, we calculate the support value of all neighboring edges by utilizing the node to store local topological information regarding its incident edges and two-hop neighboring nodes. Furthermore, the following property bears the calculation of the trussness applied to the support of the edges. The locality property is formally presented as follows.
Property 1 (Locality[
25
]). Given a graph , , if and only if the following conditions are met.
(1) An edge subset such that , edges in form total triangles with , and the trussness of each edge in is not less than .
(2) There is not a subset does not exist such that , edges in form total triangles with , and the trussness of each edge in is not less than .
The property describes that if the trussness of an edge is equal to , then at least triangles contain and the trussness of each edge in these triangles is at least . Therefore, the trussness of an edge can be calculated from the trussness of the incident edges of the edge. More specifically, the trussness can be calculated from the trussness of the edges that form triangles with the edge.
We design two indexes according to the locality property to store the neighboring information for each node. For a node in our distributed system, the first index is adjMap, which is a list of key-value pairs used to store -hop adjacency nodes of . The keys of adjMap are the neighbor nodes of , and the value corresponding to each key is the linked list of neighboring nodes of the stored node by the key. The second index is trussMap, which stores the estimated trussness of -hop incident edges of a node . The trussness of the incident edges of is calculated and maintained by .
Figure 3
shows an example of the adjMap and the trussMap for node 1 in
Fig. 2
. In
Fig. 3a
, the keys of adjMap are maintained by node 1, and the values are received from neighboring nodes of node 1.
Figure 3
shows the change of node 1 updating the trussMap after receiving messages from neighbors. The trussMap comprises two parts: one part is the trussness estimates of the incident edges of node 1 (this part is calculated and maintained by node 1 itself), and another part is the trussness estimates of the incident edges of node 1’s neighboring nodes (this part is kept by node1 receiving data from the neighboring nodes).
10.26599/TST.2022.9010019.F003
2-hop auxiliary index for node 1 in Fig. 2.
In our distributed model, each node is responsible for computing and updating the estimated trussness of its incident edges by maintaining trussMap. Algorithm 1 shows the execution process of node . The proposed algorithm has two phases: the initialization and the execution phases.
In the initialization phase (lines 1–8), the adjacency nodes of exchange their adjMap values with to compute trussMap. The initially estimated trussness of an edge is the support of the edge plus (line 6). sends its trussMap to the neighboring nodes and sets the Changed flag to false to end the process (lines 7 and 8).
The execution phase is the process of iterative execution of the entire distributed system (lines 9–22). In each round of the execution phase, receives the trussness of incident edges to neighbor nodes (line 11). Thus, each node can receive the trussness of the 2-hop incident edges. Afterward, updates the trussness of incident edges through the locality property by the function ComputeTruss (line 16). This function then returns the maximum for which a value not less than exists in the counter, and the algorithm is shown in Algorithm 2. If the updated estimated trussness of the incident edges is smaller than the value of the current record, then the value in trussMap is updated, and the Changed flag is set to true (lines 17–19). The updated trussMap is sent to the neighboring nodes of (lines 20 and 21) after all the incident edges of with their updated trussness are obtained. The steps of the execution phase are repeated until the termination condition is met and the system stops. Finally, the stored values in trussMap are the trussness of the edges.
Termination mechanism. Several alternatives for termination conditions of the distributed computing system are avaliable[
38
]. We use the barrier synchronization mechanism as the termination condition. If none of the nodes in the system updates trussMap in a round, that is, no active nodes are avaliable, then the system will stop, and the computation of trussness is complete.
Efficient message strategy. Two strategies that can further reduce the volume of passing messages are avaliable.
(1) The support of an edge is determined by the number of triangles where the edge is contained. Therefore, some edges may not be in any triangle and hence will not participate in any computing. Thus, the trussness of these edges will not be stored in the trussMap.
(2) The trussMap maintained by each node stores the estimated trussness of all incident edges. In the phase of sending messages in each round, the entire trussMap is sent to the neighbor nodes. This approach is substantially inefficient because only some items in trussMap may have changed. Theregore, we only need to send the updated items in trussMap to the neighboring nodes.
We then analyze the accuracy and efficiency of Algorithm 2.
Lemma 1. In Algorithm 1, the estimated trussness of is always higher than .
Proof. Assuming and , at least triangles contain , and the trussness of each edge in these triangles is no less than according to the definition of -truss. Assume is an edge that forms a triangle with , , and at time . This assumption indicates that the endpoint or updates , which causes at time . By analogy, the endpoints of update some estimated trussness of the edges that form the triangles with at . The inference leads to an infinite sequence of . However, this sequence has a loop that yields , which is a contradiction.
■
Lemma 2. Algorithm 1 ensures the convergence of the estimated trussness to for .
Proof. In any round during the entire process, will only decrease and will be no less than by Lemma 1. Therefore, we only need to prove that will eventually be equal to for any edge . We set in the initialization phase and provide proof by induction on the values of .
• . In this case, , that is, is not in any triangles. in the beginning of the distributed system.
• . Assume that always holds. Therefore, each edge in at least two triangles has trussness no less than . Therefore, a subgraph that contains these edges is avaliable, and the support of each edge is larger than . The subgraph is a -truss, which is a contradiction.
• Induction step: suppose there is an edge such that and forever constantly. Then, triangles contain and for each such that . According to Lemma 1, by .
If , then will eventually decrease to according to Property 1.
If , then is in at least triangles and such that . Therefore, according to the definition of -truss, which is a contradiction.
■
Theorem 1. The number of rounds of Algorithm 1 is bound by .
Proof. As long as this distributed system is running, at least one edge will perform the update operation of estimated trussness. In the worst case, only one edge of the estimated trussness is updated in each round, and this edge only changes by 1. For an edge , the quantity by which its estimated trussness decreases from its initial value to its final value is . This quantity represents the update error of . Thus, the execution time is bound by the sum of update errors of all edges, which is .
■
From the above lemmas and theorem, we derive the following Theorems regearding the time and information complexities of the distributed truss decomposition algorithm.
Theorem 2. Given a graph , the time complexity of Algorithm 1 to compute the trussness of edges is , where is the number of edges in and is the minimum trussness of the graph.
Proof. As the system is running, at least one edge whose estimated trussness will be updated in each round exists. This condition indicates that at least one minimum trussness must be determined in each round. After the trussness of all edges has been determined, the system requires one round of initialization and one round of deterministic termination. Hence, the number of rounds of the algorithm is no larger than , and the time complexity is .
■
Theorem 3 . Given a graph , the message complexity of Algorithm 1 to compute the trussness of edges is .
Proof. In the worst case, one of the endpoints of an edge receives at most messages from the neighboring nodes. In whatever graph, the minimum value of trussness of edges is . Therefore, the number of messages of the algorithm is bound by , and the message complexity is .
■
Distributed Truss Decomposition
We propose our distributed truss decomposition algorithm in this section based on our distributed model. We first introduce two auxiliary indexes, namely adjMap and trussMap, which are based on the locality property of the -truss. We then introduce the procedure and analysis of the proposed algorithm, which is based on the two indexes.
The distributed model encounters the following two truss decomposition challenges.
• The most important metric for calculating the trussness is the support of edges, which is the initial value of trussness for edges. All neighboring nodes of both endpoints of the edge must be identified to calculate the support of an edge.
• Edges are virtual connections for communication between nodes. Thus, the value of support for each edge cannot exist on a nonexistent entity.
To address the above challenges, we calculate the support value of all neighboring edges by utilizing the node to store local topological information regarding its incident edges and two-hop neighboring nodes. Furthermore, the following property bears the calculation of the trussness applied to the support of the edges. The locality property is formally presented as follows.
Property 1 (Locality[
25
]). Given a graph , , if and only if the following conditions are met.
(1) An edge subset such that , edges in form total triangles with , and the trussness of each edge in is not less than .
(2) There is not a subset does not exist such that , edges in form total triangles with , and the trussness of each edge in is not less than .
The property describes that if the trussness of an edge is equal to , then at least triangles contain and the trussness of each edge in these triangles is at least . Therefore, the trussness of an edge can be calculated from the trussness of the incident edges of the edge. More specifically, the trussness can be calculated from the trussness of the edges that form triangles with the edge.
We design two indexes according to the locality property to store the neighboring information for each node. For a node in our distributed system, the first index is adjMap, which is a list of key-value pairs used to store -hop adjacency nodes of . The keys of adjMap are the neighbor nodes of , and the value corresponding to each key is the linked list of neighboring nodes of the stored node by the key. The second index is trussMap, which stores the estimated trussness of -hop incident edges of a node . The trussness of the incident edges of is calculated and maintained by .
Figure 3
shows an example of the adjMap and the trussMap for node 1 in
Fig. 2
. In
Fig. 3a
, the keys of adjMap are maintained by node 1, and the values are received from neighboring nodes of node 1.
Figure 3
shows the change of node 1 updating the trussMap after receiving messages from neighbors. The trussMap comprises two parts: one part is the trussness estimates of the incident edges of node 1 (this part is calculated and maintained by node 1 itself), and another part is the trussness estimates of the incident edges of node 1’s neighboring nodes (this part is kept by node1 receiving data from the neighboring nodes).
10.26599/TST.2022.9010019.F003
2-hop auxiliary index for node 1 in Fig. 2.
In our distributed model, each node is responsible for computing and updating the estimated trussness of its incident edges by maintaining trussMap. Algorithm 1 shows the execution process of node . The proposed algorithm has two phases: the initialization and the execution phases.
In the initialization phase (lines 1–8), the adjacency nodes of exchange their adjMap values with to compute trussMap. The initially estimated trussness of an edge is the support of the edge plus (line 6). sends its trussMap to the neighboring nodes and sets the Changed flag to false to end the process (lines 7 and 8).
The execution phase is the process of iterative execution of the entire distributed system (lines 9–22). In each round of the execution phase, receives the trussness of incident edges to neighbor nodes (line 11). Thus, each node can receive the trussness of the 2-hop incident edges. Afterward, updates the trussness of incident edges through the locality property by the function ComputeTruss (line 16). This function then returns the maximum for which a value not less than exists in the counter, and the algorithm is shown in Algorithm 2. If the updated estimated trussness of the incident edges is smaller than the value of the current record, then the value in trussMap is updated, and the Changed flag is set to true (lines 17–19). The updated trussMap is sent to the neighboring nodes of (lines 20 and 21) after all the incident edges of with their updated trussness are obtained. The steps of the execution phase are repeated until the termination condition is met and the system stops. Finally, the stored values in trussMap are the trussness of the edges.
Termination mechanism. Several alternatives for termination conditions of the distributed computing system are avaliable[
38
]. We use the barrier synchronization mechanism as the termination condition. If none of the nodes in the system updates trussMap in a round, that is, no active nodes are avaliable, then the system will stop, and the computation of trussness is complete.
Efficient message strategy. Two strategies that can further reduce the volume of passing messages are avaliable.
(1) The support of an edge is determined by the number of triangles where the edge is contained. Therefore, some edges may not be in any triangle and hence will not participate in any computing. Thus, the trussness of these edges will not be stored in the trussMap.
(2) The trussMap maintained by each node stores the estimated trussness of all incident edges. In the phase of sending messages in each round, the entire trussMap is sent to the neighbor nodes. This approach is substantially inefficient because only some items in trussMap may have changed. Theregore, we only need to send the updated items in trussMap to the neighboring nodes.
We then analyze the accuracy and efficiency of Algorithm 2.
Lemma 1. In Algorithm 1, the estimated trussness of is always higher than .
Proof. Assuming and , at least triangles contain , and the trussness of each edge in these triangles is no less than according to the definition of -truss. Assume is an edge that forms a triangle with , , and at time . This assumption indicates that the endpoint or updates , which causes at time . By analogy, the endpoints of update some estimated trussness of the edges that form the triangles with at . The inference leads to an infinite sequence of . However, this sequence has a loop that yields , which is a contradiction.
■
Lemma 2. Algorithm 1 ensures the convergence of the estimated trussness to for .
Proof. In any round during the entire process, will only decrease and will be no less than by Lemma 1. Therefore, we only need to prove that will eventually be equal to for any edge . We set in the initialization phase and provide proof by induction on the values of .
• . In this case, , that is, is not in any triangles. in the beginning of the distributed system.
• . Assume that always holds. Therefore, each edge in at least two triangles has trussness no less than . Therefore, a subgraph that contains these edges is avaliable, and the support of each edge is larger than . The subgraph is a -truss, which is a contradiction.
• Induction step: suppose there is an edge such that and forever constantly. Then, triangles contain and for each such that . According to Lemma 1, by .
If , then will eventually decrease to according to Property 1.
If , then is in at least triangles and such that . Therefore, according to the definition of -truss, which is a contradiction.
■
Theorem 1. The number of rounds of Algorithm 1 is bound by .
Proof. As long as this distributed system is running, at least one edge will perform the update operation of estimated trussness. In the worst case, only one edge of the estimated trussness is updated in each round, and this edge only changes by 1. For an edge , the quantity by which its estimated trussness decreases from its initial value to its final value is . This quantity represents the update error of . Thus, the execution time is bound by the sum of update errors of all edges, which is .
■
From the above lemmas and theorem, we derive the following Theorems regearding the time and information complexities of the distributed truss decomposition algorithm.
Theorem 2. Given a graph , the time complexity of Algorithm 1 to compute the trussness of edges is , where is the number of edges in and is the minimum trussness of the graph.
Proof. As the system is running, at least one edge whose estimated trussness will be updated in each round exists. This condition indicates that at least one minimum trussness must be determined in each round. After the trussness of all edges has been determined, the system requires one round of initialization and one round of deterministic termination. Hence, the number of rounds of the algorithm is no larger than , and the time complexity is .
■
Theorem 3 . Given a graph , the message complexity of Algorithm 1 to compute the trussness of edges is .
Proof. In the worst case, one of the endpoints of an edge receives at most messages from the neighboring nodes. In whatever graph, the minimum value of trussness of edges is . Therefore, the number of messages of the algorithm is bound by , and the message complexity is .
■
10.26599/TST.2022.9010019.F004
An example of promoter nodes and spread nodes.
One edge dynamic
We first introduce the properties related to the trussness update of the insertion/deletion of an edge. The distributed algorithms for truss maintenance are then presented.
At most, one triangle is formed for destroyed after the insertion/deletion of an edge. The support of an edge is the number of triangles contained in that edge, and the trussness of an edge is related to the minimum support of the surrounding edges. The insertion or deletion of an edge may cause an increase or decrease in the trussness of other edges by at most 1, which has been proven in Ref. [
15
]. A new graph is formally generated after inserting in G. The increase in the trussness of edge in may be performed in two ways: (1) , , and another edge form a new triangle, which increases the support of ; (2) the trussness of some edges of the triangle in which is located increases by one. After deleting in , the trussness of edge in the newly generated graph may decrease in two ways: (1) and belong to the same triangle in , and the deletion of reduces the support of ; (2) the trussness of some edges in the triangle where is located decreases by one.
For the trussness of the inserted edge , we set the bounds for based on the trussness of the edges before the insertion of . Assume is the lower bound of and is the upper bound of . According to the definition of -truss,
where . The trussness of edges forming triangles with will increase by at most . Therefore, we can deduce that
The above can be formally summarized as the following property[
37
].
Property. If inserting edge to , then may increase trussness by , where and is -triangle connected with . If deleting edge from , then may decrease trussness by , where and is -triangle connected with .
An example of inserting edges is shown in
Fig. 2
, which demonstrates the formation of new triangles after edge insertion. The trussness of part of the edges may increase due to the new triangles.
After an edge is inserted, the two endpoints and initially update their adjMap and send the updated adjMap to their neighbors. Then, the two nodes both calculate the upper bound of (i.e., ). An edge will be marked if forms a triangle with and , and these edges will also be marked if they are -triangle connected with . All marked edges are potential edges whose trussness may increase by . Therefore, we increase the trussness estimates in the trussMap of these potential edges by , and perform the same process as Algorithm 1 for the system until the convergence of the trussness estimates of all edges to the final stable values.
We classify the endpoints of edges that may update trussness into two categories. (1) The nodes are the endpoints of the inserted edges, denoted by promoter node set; (2) the nodes are the endpoints of those edges that are -triangle connected to the inserted edges, denoted by spread node set. Each node in the promoter node set is a promoter node, and each node in the spread node set is a spread node.
The promoter nodes are responsible for computing the upper bound trussness of inserting edges and recording edges that may be affected. Meanwhile, the spread nodes are responsible for sending messages to the nodes where the requirements are met. We first describe the distributed algorithm for inserting an edge, and then briefly discuss the algorithm for deleting one edge.
Figure 4
shows an example of promoter nodes and spread nodes. After inserting edge in the graph, nodes and become promoter nodes, and the initial trussness of the newly inserted edge is . Neighboring nodes connected to nodes and by trussness not larger than become spread nodes.
10.26599/TST.2022.9010019.F004
An example of promoter nodes and spread nodes.
Distributed truss maintenance with one-edge insertion. In our algorithm, each node maintains two auxiliary sets: updated edge set (UES) is a set that stores the neighboring edges of the node, and the trussness of these neighboring edges may change; updated node set (UNS) is a set of nodes that stores the nodes adjacent to this node and the trussMap of these nodes may change.
For the promoter node, the edges stored in its UES are all the edges whose trussness is lower than the upper bound of the inserted edge’s trussness; its UNS stores another endpoint of those edges in its UES. The edges stored in UES are those whose trussness may increase by . We reinitialize the trussness estimates of these edges to their current trussness plus . All nodes in the system perform the same process as in Algorithm 1 after the reinitialization is completed. The initialization pseudocodes of the promoter and spread nodes are shown in Algorithms 3 and 4, respectively.
For a promoter node , the procedure is divided into two parts, prepare phase (lines 1–11 in Algorithm 3) and propose phase (lines 12–16 in Algorithm 3). Due to the insertion of a new edge , the adjMap of is first updated, and then compute the upper bound of (lines 2–5 in Algorithm 3). Each edge that forms a triangle with is then traversed: if the edge satisfies the condition for possible updates, then the edge is added to , and another endpoint of the edge is added to (lines 6–11 in Algorithm 3). In the Propose phase, the trussness of each edge in plus one, and each node in spreads the Change signal (lines 12–16 in Algorithm 3).
For a spread node , the procedure is still divided into two parts. In prepare phase, the edge whose trussness may change (lines 3 and 4 in Algorithm 4) is first determined. Each edge that forms a triangle with the potential edge and has the same trussness as the potential edge is then added to the (lines 5–11 in Algorithm 4). The propose phase of the spread node is the same as the promoter node, and the difference lies in the additional trussness of edges by 1 that no longer changes.
Analysis. The reinitialization stage ends when no node receives UES. The time complexity of the re-initialization stage depends on the diameter of the induced subgraph generated by UNS. The induced graph generated by UNS is denoted as , which is a -truss. The diameter of a connected -truss with nodes is not greater than [
8
]. As shown in Ref. [
9
], for a graph and a node set , we have , where is the length of the shortest path between and in and is the diameter of graph . The time complexity of reinitialization stage is . Algorithm 1 will be terminated for two rounds of the execution phase in most cases (one round ensures that all nodes will not update trussMap) after resetting the estimated trussness of edges.
Distributed truss maintenance with one-edge deletion. The algorithm for a single edge deletion is simpler than the insertion case. After an edge is deleted, the two endpoints of this edge first update their adjMap and remove the nonexistent items from the trussMap due to the change in the graph structure. The execution phase of Algorithm 1 is then performed, and the estimated trussness of edges will converge to the real trussness. In this case, only nodes in UNS will participate in the calculation, and only a few rounds will be needed in most cases.
We do not need to label nodes in simliar to the insertion scenario due to the absence of potential edges to determine. Hence, the distributed truss maintenance algorithm for edge deletion is efficient and fast.
One edge dynamic
We first introduce the properties related to the trussness update of the insertion/deletion of an edge. The distributed algorithms for truss maintenance are then presented.
At most, one triangle is formed for destroyed after the insertion/deletion of an edge. The support of an edge is the number of triangles contained in that edge, and the trussness of an edge is related to the minimum support of the surrounding edges. The insertion or deletion of an edge may cause an increase or decrease in the trussness of other edges by at most 1, which has been proven in Ref. [
15
]. A new graph is formally generated after inserting in G. The increase in the trussness of edge in may be performed in two ways: (1) , , and another edge form a new triangle, which increases the support of ; (2) the trussness of some edges of the triangle in which is located increases by one. After deleting in , the trussness of edge in the newly generated graph may decrease in two ways: (1) and belong to the same triangle in , and the deletion of reduces the support of ; (2) the trussness of some edges in the triangle where is located decreases by one.
For the trussness of the inserted edge , we set the bounds for based on the trussness of the edges before the insertion of . Assume is the lower bound of and is the upper bound of . According to the definition of -truss,
where . The trussness of edges forming triangles with will increase by at most . Therefore, we can deduce that
The above can be formally summarized as the following property[
37
].
Property. If inserting edge to , then may increase trussness by , where and is -triangle connected with . If deleting edge from , then may decrease trussness by , where and is -triangle connected with .
An example of inserting edges is shown in
Fig. 2
, which demonstrates the formation of new triangles after edge insertion. The trussness of part of the edges may increase due to the new triangles.
After an edge is inserted, the two endpoints and initially update their adjMap and send the updated adjMap to their neighbors. Then, the two nodes both calculate the upper bound of (i.e., ). An edge will be marked if forms a triangle with and , and these edges will also be marked if they are -triangle connected with . All marked edges are potential edges whose trussness may increase by . Therefore, we increase the trussness estimates in the trussMap of these potential edges by , and perform the same process as Algorithm 1 for the system until the convergence of the trussness estimates of all edges to the final stable values.
We classify the endpoints of edges that may update trussness into two categories. (1) The nodes are the endpoints of the inserted edges, denoted by promoter node set; (2) the nodes are the endpoints of those edges that are -triangle connected to the inserted edges, denoted by spread node set. Each node in the promoter node set is a promoter node, and each node in the spread node set is a spread node.
The promoter nodes are responsible for computing the upper bound trussness of inserting edges and recording edges that may be affected. Meanwhile, the spread nodes are responsible for sending messages to the nodes where the requirements are met. We first describe the distributed algorithm for inserting an edge, and then briefly discuss the algorithm for deleting one edge.
Figure 4
shows an example of promoter nodes and spread nodes. After inserting edge in the graph, nodes and become promoter nodes, and the initial trussness of the newly inserted edge is . Neighboring nodes connected to nodes and by trussness not larger than become spread nodes.
10.26599/TST.2022.9010019.F004
An example of promoter nodes and spread nodes.
Distributed truss maintenance with one-edge insertion. In our algorithm, each node maintains two auxiliary sets: updated edge set (UES) is a set that stores the neighboring edges of the node, and the trussness of these neighboring edges may change; updated node set (UNS) is a set of nodes that stores the nodes adjacent to this node and the trussMap of these nodes may change.
For the promoter node, the edges stored in its UES are all the edges whose trussness is lower than the upper bound of the inserted edge’s trussness; its UNS stores another endpoint of those edges in its UES. The edges stored in UES are those whose trussness may increase by . We reinitialize the trussness estimates of these edges to their current trussness plus . All nodes in the system perform the same process as in Algorithm 1 after the reinitialization is completed. The initialization pseudocodes of the promoter and spread nodes are shown in Algorithms 3 and 4, respectively.
For a promoter node , the procedure is divided into two parts, prepare phase (lines 1–11 in Algorithm 3) and propose phase (lines 12–16 in Algorithm 3). Due to the insertion of a new edge , the adjMap of is first updated, and then compute the upper bound of (lines 2–5 in Algorithm 3). Each edge that forms a triangle with is then traversed: if the edge satisfies the condition for possible updates, then the edge is added to , and another endpoint of the edge is added to (lines 6–11 in Algorithm 3). In the Propose phase, the trussness of each edge in plus one, and each node in spreads the Change signal (lines 12–16 in Algorithm 3).
For a spread node , the procedure is still divided into two parts. In prepare phase, the edge whose trussness may change (lines 3 and 4 in Algorithm 4) is first determined. Each edge that forms a triangle with the potential edge and has the same trussness as the potential edge is then added to the (lines 5–11 in Algorithm 4). The propose phase of the spread node is the same as the promoter node, and the difference lies in the additional trussness of edges by 1 that no longer changes.
Analysis. The reinitialization stage ends when no node receives UES. The time complexity of the re-initialization stage depends on the diameter of the induced subgraph generated by UNS. The induced graph generated by UNS is denoted as , which is a -truss. The diameter of a connected -truss with nodes is not greater than [
8
]. As shown in Ref. [
9
], for a graph and a node set , we have , where is the length of the shortest path between and in and is the diameter of graph . The time complexity of reinitialization stage is . Algorithm 1 will be terminated for two rounds of the execution phase in most cases (one round ensures that all nodes will not update trussMap) after resetting the estimated trussness of edges.
Distributed truss maintenance with one-edge deletion. The algorithm for a single edge deletion is simpler than the insertion case. After an edge is deleted, the two endpoints of this edge first update their adjMap and remove the nonexistent items from the trussMap due to the change in the graph structure. The execution phase of Algorithm 1 is then performed, and the estimated trussness of edges will converge to the real trussness. In this case, only nodes in UNS will participate in the calculation, and only a few rounds will be needed in most cases.
We do not need to label nodes in simliar to the insertion scenario due to the absence of potential edges to determine. Hence, the distributed truss maintenance algorithm for edge deletion is efficient and fast.
Dynamics of multiple edges
In this section, we study the truss maintenance after the insertion/deletion of multiple edges. The main challenges of maintaining trussness in the case of multiple edges dynamics remain as follows: determining which edges change in trussness and by how much. Algorithms for the case of a single edge dynamic can be used iteratively when multiple edges are inserted/deleted. However, we can further optimize the algorithms when inserting/removing multiple edges for efficient computation.
Insertion of multiple edges. Algorithm 5 shows the procedure of truss maintenance after the insertion of multiple edges. Suppose the set of inserted edges is . For each edge , Algorithms 3 and 4 are first executed after is inserted (lines 1–5). After all edges in are inserted, the execution phase of Algorithm 1 is executed until termination (line 6).
Deletion of multiple edges. Algorithm 5 shows the procedure of truss maintenance after the deletion of multiple edges. Suppose the set of deleted edges is . The adjMap and trussMap of all nodes are directly updated after deleting all the edges in . That is, each node deletes the nonexistent neighbors, and then sends the adjacency list to the remaining neighbors. Afterward, each node deletes the item whose key does not exist in the trussMap and then sends the trussMap of incident edges to the neighbors. Finally, nodes perform the execution phase of Algorithm 1 until termination. In the case of deleting numerous edges, the endpoints of each deleted edge can recalculate the value of of the incident edges. An endpoint updates if is an incident edge of . Given a large number of deleted edges and broken triangles, the value of for may even be smaller than . Storing the smaller values in trussMap enables the rapid termination of the algorithm.
Analysis. The analysis of previous sections indicate that the time complexity of single edge insertion is . Thus the time complexity for multiple edges insertion is , where is the diameter of .
10.26599/TST.2022.9010019.F005
Proposed method is compared with other methods in truss decomposition.
10.26599/TST.2022.9010019.F006
Number of rounds after inserting edges in datasets.
10.26599/TST.2022.9010019.F007
Number of rounds after deleting edges in datasets.
10.26599/TST.2022.9010019.F008
Number of messages after inserting edges in datasets.
10.26599/TST.2022.9010019.F009
Number of messages after deleting edges in datasets.
10.26599/TST.2022.9010019.F010
Percentage of activated nodes after inserting edges in datasets.
10.26599/TST.2022.9010019.T001Notations and descriptions.
Notation
Description
Graph with node set and edge set
Edge with and as endpoints
Triangle with three nodes , , and
Set of neighbors of in
Set of neighboring edges of in
Incident edges of in
Support of in
Trussness of in
Estimate trussness of
10.26599/TST.2022.9010019.T002Statistics of datasets. is the number of nodes, is the number of edges, is the number of triangles of the graph, and is the average degree.