School of Computer Science and Technology, Shandong University, Qingdao 266237, China
Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA
Show Author Information
Hide Author Information
Abstract
The prevalence of graph data has brought a lot of attention to cohesive and dense subgraph mining. In contrast with the large number of indexes proposed to help mine dense subgraphs in general graphs, only very few indexes are proposed for the same in bipartite graphs. In this work, we present the index called -core number on vertices, which reflects the maximal cohesive and dense subgraph a vertex can be in, to help enumerate the -cores, a commonly used dense structure in bipartite graphs. To address the problem of extremely high time and space cost for enumerating the -cores, we first present a linear time and space algorithm for computing the -core numbers of vertices. We further propose core maintenance algorithms, to update the core numbers of vertices when a graph changes by avoiding recalculations. Experimental results on different real-world and synthetic datasets demonstrate the effectiveness and efficiency of our algorithms.
No abstract is available for this article. Click the button above to view the PDF directly.
References
[1]
A.Beutel, W.Xu, V.Guruswami, C.Palow, and C.Faloutsos, Copycatch: Stopping group attacks by spotting lockstep behavior in social networks, inProc. 22nd International Conference on World Wide Web, Rio de Janeiro, Brazil, 2013, pp. 119–130.
M.Kaytoue, S. O.Kuznetsov, A.Napoli, and S.Duplessis, Mining gene expression data with pattern structures in formal concept analysis, Inf. Sci., vol. 181, no. 10, pp. 1989–2001, 2011.
J.Wang, A. P.de Vries, and M. J. T.Reinders, Unifying user-based and item-based collaborative filtering approaches by similarity fusion, inProc. 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, WA, USA, 2006, pp. 501–508.
D.Ding, H.Li, Z.Huang, and N.Mamoulis, Efficient fault-tolerant group recommendation using alpha-beta-core, inProc. 2017 ACM on Conference on Information and Knowledge Management, Singapore, 2017, pp. 2047–2050.
Z.Cai, Z.He, X.Guan, and Y.Li, Collective data-sanitization for preventing sensitive information inference attacks in social networks, IEEE Trans. Dependable Secur. Comput., vol. 15, no. 4, pp. 577–590, 2018.
K.Li, G.Lu, G.Luo, and Z.Cai, Seed-free graph de-anonymiztiation with adversarial learning, inProc. 29th ACM International Conference on Information & Knowledge Management, Virtual Event, Ireland, 2020, pp. 745–754.
K.Li, G.Luo, Y.Ye, W.Li, S.Ji, and Z.Cai, Adversarial privacy-preserving graph embedding against inference attack, IEEE Internet Things J., vol. 8, no. 8, pp. 6904–6915, 2021.
K.Xu, R.Williams, S. H.Hong, Q.Liu, and J.Zhang, Semi-bipartite graph visualization for gene ontology networks, inProc. 17th international conference on Graph Drawing, Chicago, IL, USA, 2009, pp. 244–255.
L.Yu, B.Wu, and B. Wang. LBLP: Link-clustering-based approach for overlapping community detection, Tsinghua Science and Technology, vol. 4, pp. 387–397, 2013.
U.Feige, S.Goldwasser, L.Lovasz, S.Safra, and M.Szegedy, Approximating clique is almost NP-complete (preliminary version), inProc. 32nd Annual Symposium on Foundations of Computer Science, San Juan, PR, USA, 1991, pp. 2–12.
Q.Luo, D.Yu, F.Li, Z.Dou, Z.Cai, J.Yu, and X.Cheng, Distributed core decomposition in probabilistic graphs, inProc. 8th International Conference on Computational Data and Social Networks, Ho Chi Minh City, Vietnam, 2019, pp. 16–32.
D.Yu, L.Zhang, Q.Luo, X.Cheng, J.Yu, and Z.Cai, Fast skyline community search in multi-valued networks, Big Data Mining Analytics, vol. 3, no. 3, pp. 171–180, 2020.
P.Chen, C.Chou, and M.Chen, Distributed algorithms for -truss decomposition, inProc. 2014 IEEE International Conference on Big Data, Washington, DC, USA, 2014, pp. 471–480.
B.Balasundaram, S.Butenko, and I. V.Hicks, Clique relaxations in social network analysis: The maximum -plex problem, Oper. Res., vol. 59, no. 1, pp. 133–142, 2011.
M.Bentert, A. -S.Himmel, H.Molter, M.Morik, R.Niedermeier, and R.Saitenmacher, Listing all maximal -plexes in temporal graphs, inProc. 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Barcelona, Spain, 2018, pp. 41–46.
A.Ahmed, V.Batagelj, X.Fu, S.Hong, D.Merrick, and A.Mrvar, Visualisation and analysis of the internet movie database, inProc. 2007 6th International Asia-Pacific Symposium on Visualization, Sydney, Australia, 2007, pp.17–24.
K.Sim, J.Li, V.Gopalkrishnan, and G.Liu, Mining maximal quasi-bicliques to co-cluster stocks and financial ratios for value investment, inProc. 6th IEEE International Conference on Data Mining (ICDM 2006), Hong Kong, China, 2006, pp. 1059–1063.
A. E.Sarıyüce, B.Gedik, G.Jacques-Silva, K. -L.Wu, and Ü. V.Çatalyürek, Incremental -core decomposition: Algorithms and evaluation, VLDB J., vol. 25, no. 3, pp. 425–447, 2016.
A. E.Sariyüce, C.Seshadhri, A.Pinar, and Ü. V.Catalyurek, Finding the hierarchy of dense subgraphs using nucleus decompositions, inProc. 24th International Conference on World Wide Web, Florence, Italy, 2015, pp. 927–937.
H.Aksu, M.Canim, Y. -C.Chang, I.Korpeoglu, and Ö.Ulusoy, Distributed -core view materialization and maintenance for large dynamic graphs, IEEE Trans. Knowl. Data Eng., vol. 26, no. 10, pp. 2439–2452, 2014.
S.Aridhi, M.Brugnara, A.Montresor, and Y.Velegrakis, Distributed -core decomposition and maintenance in large dynamic graphs, inProc. 10th ACM International Conference on Distributed and Event-Based Systems, Irvine, CA, USA, 2016, pp. 161–168.
W.Zhou, H.Huang, Q. -S.Hua, D.Yu, H.Jin, and X.Fu, Core decomposition and maintenance in weighted graph, World Wide Web, vol. 24, no. 2, pp. 541–561, 2021.
Q.Luo, D.Yu, Z.Cai, X.Lin, and X.Cheng, Hypercore maintenance in dynamic hypergraphs, inProc. 2021 IEEE 37th International Conference on Data Engineering (ICDE), Chania, Greece, 2021, pp. 2051–2056.
Yu D, Zhang L, Luo Q, et al. Core Decomposition and Maintenance in Bipartite Graphs. Tsinghua Science and Technology, 2023, 28(2): 292-309. https://doi.org/10.26599/TST.2021.9010091
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
10.26599/TST.2021.9010091.F001
Change in the core number of each vertex when an edge is inserted.
Core Decomposition
In this section, we first solve the core decomposition problem. Specifically, by adapting the algorithm for -core computation given in Ref. [
9
], we present an algorithm for core decomposition in bipartite graphs in Algorithm 1.
For a bipartite graph , the value of ranges from 1 to , since the value represents the degrees of vertices in (Line 1). For each , we calculate the corresponding for all vertices. For each fixed , the vertices in whose degrees are less than are removed along with their adjacent edges. The core numbers of these vertices are set as 0 (Lines 3–5). Then we need to compute the core numbers of rest vertices in and , respectively.
In the remaining graph , the vertices in are handled in the order of increasing degrees. Let the minimum degree in be , all vertices in whose degrees are not greater than and their adjacent edges are deleted. The core numbers of these vertices are set as (Lines 7–10). For the vertices in , once there is a vertex whose degree is less than , then we set = and remove and its adjacent edges from , since cannot be in any -core with (Lines 11–13). When becomes empty, i.e., all the vertices have been processed, then we obtain the core number of each vertex under the fixed . While traversing through each possible value of , the core number of each vertex under each possible can be computed.
The correctness of Algorithm 1 can be easily obtained by the definition of -core number. We next show the efficiency of the algorithm.
Theorem 1 Given a bipartite graph , the time needed for core decomposition is and the space required to store the core numbers of vertices is , where , , and are the total number of vertices, the total number of edges, and the smaller value between maximum degree of vertices in and maximum degree of vertices in , respectively.
Proof Each vertex and each edge only need to be traversed once, and hence the time for core decomposition is . To be more efficient, we can interchange and when , then can be at most and we will get the time complexity of the algorithm. As for the space required, for each vertex we need to store core numbers for at most values, and hence the total space used is .
In reality, the graph may change over time, hence the core numbers of vertices have to be updated after the graph changes. Intuitively, we can execute Algorithm 1 to recompute the core numbers of all vertices. However, though the computation time of core numbers for each fixed is only linear, it is still unacceptable considering that the graph may have billions of vertices and edges. Hence, we next turn our attention to the core maintenance problem, i.e., updating the core numbers of vertices by avoiding recomputations. We will first present the scenario of a single edge insertion/deletion, and then propose a batch processing approach to further improve the efficiency of core maintenance.
Incremental core maintenance
Algorithm. The detailed algorithm to maintain the core number of each vertex after inserting an edge is given in Algorithm 2. After inserting into , it iterates over all values of from 1 to , since the largest value of that can have in an -core is its own degree (Lines 1 and 2). We set and to preserve the vertices to be checked and the vertices whose core numbers might increase, respectively (Line 3). To prevent repetitive processing of a vertex, we set to mark the vertices that have been visited (Line 4). For each possible , we aim to find the vertices in the graph whose core numbers change after inserting an edge and then increase their core numbers by 1.
According to Lemma 1, we need to compute the pre-, to make sure that changes by at most 1 (Line 5). We will set be the vertex with the smaller core number between and and add it to (Lines 6–8). When a vertex is ejected from , we insert it to and set as true (Lines 9–11). For each vertex in , we are going to check its each neighbor , to see if ’s core number is likely to increase. If or is in , then must support the increase of , because is already in an -core or is supposed to increase from to . Besides that, if and is not visited, then is also likely to help to increase. So when the neighbor of satisfies one of these two conditions, increases by 1 (Lines 12–16). When is larger than the threshold , it means that it is possible for to increase. Based on Lemma 2, each neighbor of with = and not visited is pushed into for further checking (Lines 17 and 18). Once is not enough to support the increase in the core number of , the procedure DfsDelete, which is DFS process, is invoked to delete the vertices from that cannot increase their core numbers (Lines 19 and 20). While becomes empty, we have dealt with all the vertices whose core number might change. Finally, all the vertices in that have not been deleted will increase their core numbers by 1 (Lines 21 and 22).
The procedure DfsDelete removes the vertex from , since it does not satisfy the increasing condition (Line 25). For each neighbor of , if it is in and , we will reduce the by 1, since has lost a neighbor to support its core number increase (Lines 26–28). When does not have enough neighbors to hold its core number increase, DfsDelete will be invoked again to recursively remove (Lines 29 and 30).
Performance analysis. We will analyze the correctness and efficiency of Algorithm 2. Firstly, some notations are defined, which will be used in measuring the time complexity of the algorithm.
When an edge is inserted into the graph , for each , let and be the set of vertices whose core numbers are equal to in and , respectively, where . Let and . is the set of edges in the subgraph induced by to . Let .
Let , where denotes the SN value of when is first processed. Let = . Similarly, we define and .
Theorem 2 When inserting an edge into graph , Algorithm 2 can correctly update the core numbers of all vertices and the time complexity is .
Proof For each , Algorithm 2 first finds the vertices whose core numbers change and then increases their core numbers by 1. In the algorithm, the core numbers of and are compared and the one with smaller core number is labeled as . The algorithm traverses from and pushes all vertices with = that are reachable from through a -path to for further identification. According to Lemma 2, only those vertices may change their core numbers.
If = = and increases after insertion, then must satisfy one of the following conditions: () has a new neighbor whose core numbers are at least . () Some neighbors of have their core numbers increased from to . In the algorithm, records the number of these neighbors that support ’s core number increase. If , it is obvious that ’s core number will not increase. Once a vertex is identified not to change its core number, the SN values of other vertices are updated by invoking DfsDelete. After all the vertices whose core numbers might change have been processed, the subgraph induced by the vertices in that are not deleted constitutes an -core, as the SN value of these vertices is at least . Then by Lemma 1, all these vertices will increase the core number by 1. Hence, the algorithm can correctly update the core numbers of vertices.
Next, we analyze the time complexity. Firstly, there are iterations in the algorithm execution. For each iteration, since only the vertices with are reachable from through a -path, and will be processed, so the number of vertices and edges that the algorithm visits are at most and , respectively. For each vertex, except for the first calculation of the SN value when ejected from , the subsequent visits will decrease its SN value by 1. Therefore, the vertex can be visited at most () times for a certain , since it will be removed when its is smaller than . Then we can see that the vertex will be visited at most () times in any iteration. To sum up, the time complexity of Algorithm 2 can be obtained.
Example 2 Here is an example to illustrate the execution of Algorithm 2 in
Fig. 1
at . We first insert into the graph and compute pre- and . Since , without loss of generality, let . We push into , and then set to avoid repeated visits. We can easily get that , because the core numbers of its neighbors and are both 3 and not visited. Since and , we will next push and to . As for , it can be obtained that , so will not be deleted, and we are going to process ’s neighbors. When all the vertices in are processed, we find that the vertices that are still in are . As a result, we increase their core numbers by 1, to give the value 4 and Algorithm 2 terminates.
Decremental core maintenance
Similar to Algorithm 2 of core maintainance after single-edge insertion, we here give the core maintenance algorithm after deleting an edge.
Algorithm. The detailed algorithm to update the core number of each vertex after removing an edge is given in Algorithm 3. After deleting from , it iterates over all possible values of from 1 to , since the largest value of can affect the -core while deleting edge is the degree of in (Lines 1 and 2).
In each iteration of , the algorithm process is similar to Algorithm 2. The difference is that it compares and to determine instead of computing pre- (Lines 5–7). When dealing with each vertex in , it computes value by counting ’s neighbors whose core numbers are not less than and are not in (Lines 8–12). Once is not enough to keep the current core number of , the procedure DfsUpdate is invoked to record the vertices whose core numbers are going to decrease (Lines 13 and 14). Finally, all the vertices in that have not been deleted will decrease their core numbers by 1 (Lines 15 and 16). Specifically, if after deletion is greater than pre-, we will set the as pre-, since the is at most pre-core() according to Definition 3 (Lines 17 and 18).
The DfsUpdate first adds to and then checks its each neighbor (Lines 20 and 21). For those vertices with , we will discuss them in two cases. If is not visited, then it will be pushed into for further detection (Line 22). Otherwise, if it has been visited but not in , we decrease by 1. Once is less than or , the DfsUpdate will be recursively called (Lines 23–26).
Performance analysis. We will analyze the correctness and efficiency of Algorithm 3. Similar to Algorithm 2, we first give some useful notations.
Given a graph , we delete an edge from it. For each , let and be the set of vertices whose core numbers are equal to in and , respectively, where . Let and . Let be the set of edges in the subgraph induced by to . Let .
Let , where denotes the SN value of when is first processed. Let . Similarly, we define and .
Using the notations above and an analysis similar to the single-edge insertion case, we can easily obtain the following theorem.
Theorem 3 When deleting an edge from graph , Algorithm 3 can update the core numbers of vertices in time.
10.26599/TST.2021.9010091.F002
An example for V-independent edge set.
Incremental core maintenance
Algorithm. The detailed algorithm to maintain each vertex’s core number with the insertion of multiple edges is given in Algorithm 4. It is executed until all edges in have been processed (Line 1). In each iteration, the algorithm calls the subroutine FindInsertEdges to find a -independent edge set and deletes it from (Lines 2–4). For each possible , the InsertChangedSet algorithm is invoked to find the vertices whose core numbers will increase by 1 after insertion and change their core numbers (Lines 5–9).
Algorithm 5 aims to find a -independent edge set from all unprocessed edges in . Basically, it sets two parameters, marking the connection between a node in and the selected edges, and flag indicating whether an edge is selected or not (Lines 2–4). For each endpoint of edge in , there are two cases for the algorithm execution. When is not in , then it checks if ’s neighbors and are already connected to other vertices in (Lines 7 and 8). Once any vertex is found whose is not 0, the algorithm will set flag to false indicating this edge cannot be selected (Lines 9–11). Similarly, when has been in , the algorithm will check if is already connected to other vertices in (Lines 12 and 13). When flag is true, the edge will be added into (Lines 14 and 15). If is not in , it will add to and update for each neighbor of and (Lines 16–18).
After the VES is found, Algorithm 6 is invoked to find all vertices whose core numbers will change after insertion. Similar to Algorithm 2, we first do the initialization (Lines 1–6). For each edge to be inserted, it computes the pre-core of and then adds the endpoint with the smaller core number to (Lines 7–12). When a vertex is ejected from , the algorithm inserts it into and sets as true (Lines 14–16). For each vertex in , each of its neighbors is checked to see if ’s core number is likely to increase. The calculation of SN value is the same as the case of single-edge insertion (Lines 17–22). If reaches the threshold, i.e., is greater than when in or is not less than when in , then each neighbor of which satisfies = and is not visited is pushed into for further checking (Lines 23 and 24). Otherwise, the procedure DfsDelete is invoked to remove from , since it is impossible for to increase (Line 25).
Performance analysis. To analyze the time complexity of the algorithm, we first provide some useful notations. Let be the maximum number of edges inserted for each vertex in . By the definition of VES, it is easy to see that the inserted edges can be divided into -independent edge sets. When a VES is inserted into graph , the maximum degree of all vertices in is denoted by and . For each , let and be the set of vertices whose core numbers are equal to in and , respectively, where and . Let and . Let be the set of edges in the subgraph induced by to , and let .
Let = , where denotes the SN value of when is processed for the first time. Let = . Similarly, we define = and = .
The following theorem gives the time complexity of Algorithm 4 and proves its correctness.
Theorem 4 Algorithm 4 can correctly update the core numbers of all vertices after inserting an edge set in time.
Proof Algorithm 4 is executed in iterations and each iteration includes two parts. The first part finds a -independent edge set from all unprocessed edges in by executing Algorithm 5. According to Lemma 3, each vertex can change its core number by 1 after inserting a -independent edge set into the graph.
As for the second part, we invoke Algorithm 6 to identify the vertices whose core number can increase after insertion for each . It starts from the inserted edges and adds each endpoint with the smaller core number to for further identification. Similar to Algorithm 2, the algorithm uses to record the number of neighbors that support ’s core number increase. If , it is obvious that ’s core number will not increase. Once a vertex is identified not to change its core number, the SN values of other vertices are updated by invoking DfsDelete. After all the vertices whose core numbers might change have been processed, the vertices in that are not deleted will increase their core numbers, as these vertices have enough neighbors to support them in an -core with a larger value. When all edges in are handled, the potential vertices are visited and the ones that cannot increase core numbers are removed. All of the above ensure the correctness of Algorithm 4.
As for the time complexity, it is similar to the case of single-edge insertion. The difference is that Algorithm 4 has to deal with multiple edges in batches. For each batch, the algorithm needs to iterate at most times. Based on Lemma 2, when inserting an edge (assuming ), only the vertices with = are reachable from through a -path, and can change their core numbers. Therefore, when inserting a VES into the graph, the number of vertices and edges visited by Algorithm 4 is at most and , respectively. For each vertex, except for the first SN value calculation when ejected from , all the subsequent visits will decrease its SN value by 1. So for the vertex , it can be visited by at most times, since it will be removed from when its SN is smaller than . Then we can know that the vertex will be visited at most () times in any iteration. Therefore, it can be concluded that the time complexity of the Algorithm 4 is the same as stated in Theorem 4.
Discussion. A point we would like to emphasize is that the batch processing algorithm can greatly reduce redundant computations. This is because when processing the insertion of a , once a vertex is determined to increase its core number, it is unnecessary to visit this vertex again, as it will not increase the core number anymore. We illustrate this observation with the graph in
Fig. 1
. Assuming that is the edge set that needs to deal with. If we insert these edges one by one using Algorithm 2, then we need to visit vertex three times, vertex and two times, and the rest vertices once. However, since the satisfies the conditions of -independent edge set, so we can process these edges using one iteration and visit all vertices in and only once. Thus, the time consumed is significantly reduced.
Incremental core maintenance
Algorithm. The detailed algorithm to maintain each vertex’s core number with the insertion of multiple edges is given in Algorithm 4. It is executed until all edges in have been processed (Line 1). In each iteration, the algorithm calls the subroutine FindInsertEdges to find a -independent edge set and deletes it from (Lines 2–4). For each possible , the InsertChangedSet algorithm is invoked to find the vertices whose core numbers will increase by 1 after insertion and change their core numbers (Lines 5–9).
Algorithm 5 aims to find a -independent edge set from all unprocessed edges in . Basically, it sets two parameters, marking the connection between a node in and the selected edges, and flag indicating whether an edge is selected or not (Lines 2–4). For each endpoint of edge in , there are two cases for the algorithm execution. When is not in , then it checks if ’s neighbors and are already connected to other vertices in (Lines 7 and 8). Once any vertex is found whose is not 0, the algorithm will set flag to false indicating this edge cannot be selected (Lines 9–11). Similarly, when has been in , the algorithm will check if is already connected to other vertices in (Lines 12 and 13). When flag is true, the edge will be added into (Lines 14 and 15). If is not in , it will add to and update for each neighbor of and (Lines 16–18).
After the VES is found, Algorithm 6 is invoked to find all vertices whose core numbers will change after insertion. Similar to Algorithm 2, we first do the initialization (Lines 1–6). For each edge to be inserted, it computes the pre-core of and then adds the endpoint with the smaller core number to (Lines 7–12). When a vertex is ejected from , the algorithm inserts it into and sets as true (Lines 14–16). For each vertex in , each of its neighbors is checked to see if ’s core number is likely to increase. The calculation of SN value is the same as the case of single-edge insertion (Lines 17–22). If reaches the threshold, i.e., is greater than when in or is not less than when in , then each neighbor of which satisfies = and is not visited is pushed into for further checking (Lines 23 and 24). Otherwise, the procedure DfsDelete is invoked to remove from , since it is impossible for to increase (Line 25).
Performance analysis. To analyze the time complexity of the algorithm, we first provide some useful notations. Let be the maximum number of edges inserted for each vertex in . By the definition of VES, it is easy to see that the inserted edges can be divided into -independent edge sets. When a VES is inserted into graph , the maximum degree of all vertices in is denoted by and . For each , let and be the set of vertices whose core numbers are equal to in and , respectively, where and . Let and . Let be the set of edges in the subgraph induced by to , and let .
Let = , where denotes the SN value of when is processed for the first time. Let = . Similarly, we define = and = .
The following theorem gives the time complexity of Algorithm 4 and proves its correctness.
Theorem 4 Algorithm 4 can correctly update the core numbers of all vertices after inserting an edge set in time.
Proof Algorithm 4 is executed in iterations and each iteration includes two parts. The first part finds a -independent edge set from all unprocessed edges in by executing Algorithm 5. According to Lemma 3, each vertex can change its core number by 1 after inserting a -independent edge set into the graph.
As for the second part, we invoke Algorithm 6 to identify the vertices whose core number can increase after insertion for each . It starts from the inserted edges and adds each endpoint with the smaller core number to for further identification. Similar to Algorithm 2, the algorithm uses to record the number of neighbors that support ’s core number increase. If , it is obvious that ’s core number will not increase. Once a vertex is identified not to change its core number, the SN values of other vertices are updated by invoking DfsDelete. After all the vertices whose core numbers might change have been processed, the vertices in that are not deleted will increase their core numbers, as these vertices have enough neighbors to support them in an -core with a larger value. When all edges in are handled, the potential vertices are visited and the ones that cannot increase core numbers are removed. All of the above ensure the correctness of Algorithm 4.
As for the time complexity, it is similar to the case of single-edge insertion. The difference is that Algorithm 4 has to deal with multiple edges in batches. For each batch, the algorithm needs to iterate at most times. Based on Lemma 2, when inserting an edge (assuming ), only the vertices with = are reachable from through a -path, and can change their core numbers. Therefore, when inserting a VES into the graph, the number of vertices and edges visited by Algorithm 4 is at most and , respectively. For each vertex, except for the first SN value calculation when ejected from , all the subsequent visits will decrease its SN value by 1. So for the vertex , it can be visited by at most times, since it will be removed from when its SN is smaller than . Then we can know that the vertex will be visited at most () times in any iteration. Therefore, it can be concluded that the time complexity of the Algorithm 4 is the same as stated in Theorem 4.
Discussion. A point we would like to emphasize is that the batch processing algorithm can greatly reduce redundant computations. This is because when processing the insertion of a , once a vertex is determined to increase its core number, it is unnecessary to visit this vertex again, as it will not increase the core number anymore. We illustrate this observation with the graph in
Fig. 1
. Assuming that is the edge set that needs to deal with. If we insert these edges one by one using Algorithm 2, then we need to visit vertex three times, vertex and two times, and the rest vertices once. However, since the satisfies the conditions of -independent edge set, so we can process these edges using one iteration and visit all vertices in and only once. Thus, the time consumed is significantly reduced.
Incremental core maintenance
Algorithm. The detailed algorithm to maintain each vertex’s core number with the insertion of multiple edges is given in Algorithm 4. It is executed until all edges in have been processed (Line 1). In each iteration, the algorithm calls the subroutine FindInsertEdges to find a -independent edge set and deletes it from (Lines 2–4). For each possible , the InsertChangedSet algorithm is invoked to find the vertices whose core numbers will increase by 1 after insertion and change their core numbers (Lines 5–9).
Algorithm 5 aims to find a -independent edge set from all unprocessed edges in . Basically, it sets two parameters, marking the connection between a node in and the selected edges, and flag indicating whether an edge is selected or not (Lines 2–4). For each endpoint of edge in , there are two cases for the algorithm execution. When is not in , then it checks if ’s neighbors and are already connected to other vertices in (Lines 7 and 8). Once any vertex is found whose is not 0, the algorithm will set flag to false indicating this edge cannot be selected (Lines 9–11). Similarly, when has been in , the algorithm will check if is already connected to other vertices in (Lines 12 and 13). When flag is true, the edge will be added into (Lines 14 and 15). If is not in , it will add to and update for each neighbor of and (Lines 16–18).
After the VES is found, Algorithm 6 is invoked to find all vertices whose core numbers will change after insertion. Similar to Algorithm 2, we first do the initialization (Lines 1–6). For each edge to be inserted, it computes the pre-core of and then adds the endpoint with the smaller core number to (Lines 7–12). When a vertex is ejected from , the algorithm inserts it into and sets as true (Lines 14–16). For each vertex in , each of its neighbors is checked to see if ’s core number is likely to increase. The calculation of SN value is the same as the case of single-edge insertion (Lines 17–22). If reaches the threshold, i.e., is greater than when in or is not less than when in , then each neighbor of which satisfies = and is not visited is pushed into for further checking (Lines 23 and 24). Otherwise, the procedure DfsDelete is invoked to remove from , since it is impossible for to increase (Line 25).
Performance analysis. To analyze the time complexity of the algorithm, we first provide some useful notations. Let be the maximum number of edges inserted for each vertex in . By the definition of VES, it is easy to see that the inserted edges can be divided into -independent edge sets. When a VES is inserted into graph , the maximum degree of all vertices in is denoted by and . For each , let and be the set of vertices whose core numbers are equal to in and , respectively, where and . Let and . Let be the set of edges in the subgraph induced by to , and let .
Let = , where denotes the SN value of when is processed for the first time. Let = . Similarly, we define = and = .
The following theorem gives the time complexity of Algorithm 4 and proves its correctness.
Theorem 4 Algorithm 4 can correctly update the core numbers of all vertices after inserting an edge set in time.
Proof Algorithm 4 is executed in iterations and each iteration includes two parts. The first part finds a -independent edge set from all unprocessed edges in by executing Algorithm 5. According to Lemma 3, each vertex can change its core number by 1 after inserting a -independent edge set into the graph.
As for the second part, we invoke Algorithm 6 to identify the vertices whose core number can increase after insertion for each . It starts from the inserted edges and adds each endpoint with the smaller core number to for further identification. Similar to Algorithm 2, the algorithm uses to record the number of neighbors that support ’s core number increase. If , it is obvious that ’s core number will not increase. Once a vertex is identified not to change its core number, the SN values of other vertices are updated by invoking DfsDelete. After all the vertices whose core numbers might change have been processed, the vertices in that are not deleted will increase their core numbers, as these vertices have enough neighbors to support them in an -core with a larger value. When all edges in are handled, the potential vertices are visited and the ones that cannot increase core numbers are removed. All of the above ensure the correctness of Algorithm 4.
As for the time complexity, it is similar to the case of single-edge insertion. The difference is that Algorithm 4 has to deal with multiple edges in batches. For each batch, the algorithm needs to iterate at most times. Based on Lemma 2, when inserting an edge (assuming ), only the vertices with = are reachable from through a -path, and can change their core numbers. Therefore, when inserting a VES into the graph, the number of vertices and edges visited by Algorithm 4 is at most and , respectively. For each vertex, except for the first SN value calculation when ejected from , all the subsequent visits will decrease its SN value by 1. So for the vertex , it can be visited by at most times, since it will be removed from when its SN is smaller than . Then we can know that the vertex will be visited at most () times in any iteration. Therefore, it can be concluded that the time complexity of the Algorithm 4 is the same as stated in Theorem 4.
Discussion. A point we would like to emphasize is that the batch processing algorithm can greatly reduce redundant computations. This is because when processing the insertion of a , once a vertex is determined to increase its core number, it is unnecessary to visit this vertex again, as it will not increase the core number anymore. We illustrate this observation with the graph in
Fig. 1
. Assuming that is the edge set that needs to deal with. If we insert these edges one by one using Algorithm 2, then we need to visit vertex three times, vertex and two times, and the rest vertices once. However, since the satisfies the conditions of -independent edge set, so we can process these edges using one iteration and visit all vertices in and only once. Thus, the time consumed is significantly reduced.
Decremental core maintenance
Algorithm. The detailed algorithm to maintain each vertex’s core number when multiple edges are deleted is given in Algorithm 7. It is executed until all edges in have been processed (Line 1). In each iteration, the algorithm calls the subroutine FindDeleteEdges to find a -independent edge set of (Line 2). Then for each , the DeleteChangedSet algorithm is invoked to find the vertices whose core numbers will decrease by 1 after deleting and set their core numbers (Lines 5– 9). Specifically, for those vertices in connected to , their current core numbers are compared with pre-core, because deleting may cause their core numbers to change by more than 1 (Lines 10–12).
Algorithm 8 finds a VES from the unprocessed edges in . The algorithm is similar to Algorithm 5 except that for each edge , we need not to deal with , because is no longer a neighbor of after deletion. When finding the vertices whose core number changes in Algorithm 9, the difference is that it compares with instead of computing pre- for each edge (Lines 3–6). When dealing with each vertex in , the process of computing value is similar to Algorithm 3 (Lines 11–17). Once the is not enough to keep the current core number of , the procedure DfsUpdate is invoked to record that is going to decrease (Lines 18 and 19).
Performance analysis. Firstly, we will give some notations that are similar to the case of inserting multiple edges and then the correctness and time complexity of Algorithm 7 can be easily obtained.
Let be the maximum number of edges deleted from each vertex in . Similarly, we can get that the number of s is . When a is deleted from graph , the maximum degree of all vertices in is denoted by and . For each , let and be the set of vertices whose core numbers equal to in and , respectively, where and . Let and . Let be the set of edges in the subgraph induced by to and let .
Let , where denotes the SN value of when is processed for the first time. Let . Similarly, we define and = .
Theorem 5 Algorithm 7 can correctly update the core numbers of all vertices after deleting an edge set in time.
Decremental core maintenance
Algorithm. The detailed algorithm to maintain each vertex’s core number when multiple edges are deleted is given in Algorithm 7. It is executed until all edges in have been processed (Line 1). In each iteration, the algorithm calls the subroutine FindDeleteEdges to find a -independent edge set of (Line 2). Then for each , the DeleteChangedSet algorithm is invoked to find the vertices whose core numbers will decrease by 1 after deleting and set their core numbers (Lines 5– 9). Specifically, for those vertices in connected to , their current core numbers are compared with pre-core, because deleting may cause their core numbers to change by more than 1 (Lines 10–12).
Algorithm 8 finds a VES from the unprocessed edges in . The algorithm is similar to Algorithm 5 except that for each edge , we need not to deal with , because is no longer a neighbor of after deletion. When finding the vertices whose core number changes in Algorithm 9, the difference is that it compares with instead of computing pre- for each edge (Lines 3–6). When dealing with each vertex in , the process of computing value is similar to Algorithm 3 (Lines 11–17). Once the is not enough to keep the current core number of , the procedure DfsUpdate is invoked to record that is going to decrease (Lines 18 and 19).
Performance analysis. Firstly, we will give some notations that are similar to the case of inserting multiple edges and then the correctness and time complexity of Algorithm 7 can be easily obtained.
Let be the maximum number of edges deleted from each vertex in . Similarly, we can get that the number of s is . When a is deleted from graph , the maximum degree of all vertices in is denoted by and . For each , let and be the set of vertices whose core numbers equal to in and , respectively, where and . Let and . Let be the set of edges in the subgraph induced by to and let .
Let , where denotes the SN value of when is processed for the first time. Let . Similarly, we define and = .
Theorem 5 Algorithm 7 can correctly update the core numbers of all vertices after deleting an edge set in time.
Decremental core maintenance
Algorithm. The detailed algorithm to maintain each vertex’s core number when multiple edges are deleted is given in Algorithm 7. It is executed until all edges in have been processed (Line 1). In each iteration, the algorithm calls the subroutine FindDeleteEdges to find a -independent edge set of (Line 2). Then for each , the DeleteChangedSet algorithm is invoked to find the vertices whose core numbers will decrease by 1 after deleting and set their core numbers (Lines 5– 9). Specifically, for those vertices in connected to , their current core numbers are compared with pre-core, because deleting may cause their core numbers to change by more than 1 (Lines 10–12).
Algorithm 8 finds a VES from the unprocessed edges in . The algorithm is similar to Algorithm 5 except that for each edge , we need not to deal with , because is no longer a neighbor of after deletion. When finding the vertices whose core number changes in Algorithm 9, the difference is that it compares with instead of computing pre- for each edge (Lines 3–6). When dealing with each vertex in , the process of computing value is similar to Algorithm 3 (Lines 11–17). Once the is not enough to keep the current core number of , the procedure DfsUpdate is invoked to record that is going to decrease (Lines 18 and 19).
Performance analysis. Firstly, we will give some notations that are similar to the case of inserting multiple edges and then the correctness and time complexity of Algorithm 7 can be easily obtained.
Let be the maximum number of edges deleted from each vertex in . Similarly, we can get that the number of s is . When a is deleted from graph , the maximum degree of all vertices in is denoted by and . For each , let and be the set of vertices whose core numbers equal to in and , respectively, where and . Let and . Let be the set of edges in the subgraph induced by to and let .
Let , where denotes the SN value of when is processed for the first time. Let . Similarly, we define and = .
Theorem 5 Algorithm 7 can correctly update the core numbers of all vertices after deleting an edge set in time.
10.26599/TST.2021.9010091.F003
Influence of the number of deleting/inserting edges on single-edge core maintenance algorithms.
10.26599/TST.2021.9010091.F004
Influence of the number of deleting/inserting edges on multiple-edge core maintenance algorithms.
10.26599/TST.2021.9010091.F005
Comparison of the efficiency of two core maintenance algorithms and the core decomposition algorithm.
10.26599/TST.2021.9010091.F006
Impact of graph size; ) and refer to the multiple-edge (single-edge) core maintenance algorithms.