Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400060, China.
Abstract
There are many algorithms for solving complex problems in a supervised manner. However, unsupervised tasks are more common in real scenarios. Inspired by the idea of granular computing and the characteristics of the human cognitive process, this paper proposes a complex task decomposition mechanism based on Density Peaks Clustering (DPC) to address complex tasks in an unsupervised way, simulating the multi-granular observation and analysis of human beings. Firstly, the DPC algorithm is modified to remedy its essential defects, such as the difficulty of locating the correct clustering centers and of classifying points around them accurately. Then, the improved DPC algorithm is used to construct the initial decomposition solving space with multi-granularity theory. We also define the subtask centers set and the granulation rules that guide the multi-granularity decomposition procedure. These rules are further used to decompose the solving space from coarse granules to the optimal fine granules through a convergent and automated process. Furthermore, comprehensive experiments verify the applicability and accuracy of the proposed method on community-detection tasks over several benchmark complex social networks. The results show that our method outperforms four other state-of-the-art approaches.
References
L. Y. Zhu, G. W. Yuan, and Q. Du, An efficient explicit/implicit domain decomposition method for convection-diffusion equations, Numer. Methods Partial Differ. Equ., vol. 26, no. 4, pp. 852-873, 2010.
R. E. Jenkins and B. P. Yuhas, A simplified neural network solution through problem decomposition: The case of the truck backer-upper, IEEE Trans. Neural Netw., vol. 4, no. 4, pp. 718-720, 1993.
S. Thiria, C. Mejia, F. Badran, and M. Crépon, Multimodular architecture for remote sensing operations, in Proc. 4th Int. Conf. Neural Information Processing Systems, Denver, CO, USA, 1991, pp. 675-682.
[8] R. Rifkin and A. Klautau, In defense of one-vs-all classification, J. Mach. Learn. Res., vol. 5, pp. 101-141, 2004.
R. Perdisci, G. F. Gu, and W. Lee, Using an ensemble of one-class SVM classifiers to harden payload-based anomaly detection systems, in Proc. 6th Int. Conf. Data Mining, Hong Kong, China, 2006, pp. 488-498.
L. C. Cobo, C. L. Isbell Jr., and A. L. Thomaz, Automatic task decomposition and state abstraction from demonstration, in Proc. 11th Int. Conf. Autonomous Agents and Multiagent Systems, Valencia, Spain, 2012, pp. 483-490.
[11] J. Xu and G. Y. Wang, Leading tree in DPCLUS and its impact on building hierarchies, arXiv preprint arXiv:1506.03879, 2015.
K. Sun, X. R. Geng, and L. Y. Ji, Exemplar component analysis: A fast band selection method for hyperspectral imagery, IEEE Geosci. Remote Sens. Lett., vol. 12, no. 5, pp. 998-1002, 2015.
Y. W. Chen, D. H. Lai, H. Qi, J. L. Wang, and J. X. Du, A new method to estimate ages of facial image for large database, Multimed. Tools Appl., vol. 75, no. 5, pp. 2877-2895, 2016.
K. M. Dean, L. M. Davis, J. L. Lubbeck, P. Manna, P. Friis, A. E. Palmer, and R. Jimenez, High-speed multiparameter photophysical analyses of fluorophore libraries, Anal. Chem., vol. 87, no. 10, pp. 5026-5030, 2015.
A. P. Zeng and Y. P. Huang, A text classification algorithm based on Rocchio and hierarchical clustering, in Proc. 7th Int. Conf. Advanced Intelligent Computing, Zhengzhou, China, 2011, pp. 432-439.
M. M. Wang, W. L. Zuo, and Y. Wang, An improved density peaks-based clustering method for social circle discovery in social networks, Neurocomputing, vol. 179, pp. 219-227, 2016.
W. He and M. D. Xing, Classification method for POLSAR images based on find of density peak, (in Chinese), Syst. Eng. Electron., vol. 38, no. 1, pp. 60-63, 2016.
G. Y. Wang, J. Xu, Q. H. Zhang, and Y. C. Liu, Multi-granularity intelligent information processing, in Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing, Y. Y. Yao, Q. H. Hu, H. Yu, and J. W. Grzymala-Busse, eds. Springer, 2015.
[22] S. L. Wang, D. K. Wang, C. Y. Li, Y. Li, and G. Y. Ding, Clustering by fast search and find of density peaks with data field, Chin. J. Electron., vol. 25, no. 3, pp. 397-402, 2016.
M. L. Du, S. F. Ding, and H. J. Jia, Study on density peaks clustering based on k-nearest neighbors and principal component analysis, Knowl.-Based Syst., vol. 99, pp. 135-145, 2016.
X. H. Xu, Y. S. Ju, Y. L. Liang, and P. He, Manifold density peaks clustering algorithm, in Proc. 3rd Int. Conf. Advanced Cloud and Big Data, Yangzhou, China, 2015, pp. 311-318.
J. Fagnan, O. Zaïane, and D. Barbosa, Using triads to identify local community structure in social networks, in Proc. 2014 IEEE/ACM Int. Conf. Advances in Social Networks Analysis and Mining, Beijing, China, 2014, pp. 108-112.
K. H. Lim and A. Datta, A seed-centric community detection algorithm based on an expanding ring search, in Proc. 1st Australasian Web Conf., Adelaide, South Australia, 2013.
[31] Z. Yakoubi and R. Kanawati, LICOD: A leader-driven algorithm for community detection in complex networks, Vietnam J. Comput. Sci., vol. 1, no. 4, pp. 241-256, 2014.
D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations, Behav. Ecol. Sociobiol., vol. 54, no. 4, pp. 396-405, 2003.
N. X. Vinh, J. Epps, and J. Bailey, Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance, J. Mach. Learn. Res., vol. 11, pp. 2837-2854, 2010.
X. W. Xu, N. Yuruk, Z. D. Feng, and T. A. J. Schweiger, SCAN: A structural clustering algorithm for networks, in Proc. 13th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, San Jose, CA, USA, 2007, pp. 824-833.
Fig. 1  (a) Dataset DS, which contains the latitude and longitude of 13 cities; (b) leading tree of DS; and (c) clustering result of DS.
Fig. 2  The real social structure of the Dolphin network.
Fig. 3  The global task leading tree of the Dolphin network.
3.2 Multi-granularity task solving space
In this stage, to find the optimal task solving space, we refine the initial, coarsest-grained solving space using the theory of multi-granularity. Solving a problem in a finer grain layer is generally more efficient and less time-consuming. Building on the first stage, we use the similarity between subtasks to partition the initial coarse grain layer into several fine grain layers, and then estimate the optimal task solving layer through the rules for optimizing solving layers. The steps of our method, MrGDM (Multi-granularity Decomposition Mechanism of complex tasks based on density peaks), are described in Algorithm 3.2. The procedure consists of three parts, arranged in the following order.
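To make the overall flow easier to follow, here is a minimal Python sketch of how the three stages could be composed. It is not the paper's Algorithm 3.2: every helper name (run_dpc, build_leading_tree, select_redundant_centers, pop_final_centers) is an illustrative placeholder, and concrete sketches of some of these pieces are given in the following subsections.

```python
# Hypothetical top-level driver for an MrGDM-style workflow; the helper
# functions are placeholders sketched later in this section, not the
# authors' implementation.

def mrgdm(graph, distance_matrix, k_centers, lam):
    # Stage 1: run (improved) DPC and build the global task leading tree,
    # i.e., the initial, coarsest-grained solving space.
    rho, delta, nneigh = run_dpc(distance_matrix)
    parent = build_leading_tree(nneigh)

    # Stage 2: rank nodes by gamma = rho * delta and keep the top-k as the
    # redundant (initial) subtask centers CT.
    ct = select_redundant_centers(rho, delta, k_centers)

    # Stage 3: iteratively pop final centers and split the leading tree until
    # no remaining candidate is weakly connected (similarity below lam).
    gamma = {i: r * d for i, (r, d) in enumerate(zip(rho, delta))}
    fc, subtask_sets = pop_final_centers(graph, parent, gamma, ct, lam)
    return fc, subtask_sets
```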
3.2.1 Initial subtask centers set
The initial subtask centers set consists of the latent core nodes that can organize the surrounding non-core nodes into small subtask entities. Note that the initial subtask centers set is not the final subtask centers set. In the decision graph of the DPC algorithm there is useful information for choosing subtask centers: for each node $i$ we compute $\gamma_i = \rho_i \times \delta_i$ and sort the nodes in descending order of $\gamma$, as shown in Table 3. Nodes at the top of this ranking are more likely to be centers, because they satisfy the DPC criterion of a higher local density and a relatively large distance. To avoid omitting any true center, we select a sufficient (deliberately redundant) number of top-ranked nodes as the initial subtask centers set CT (a short sketch of this selection step is given after Table 3).
Table 3  The redundant centers CT of the Dolphin network.
i     ρ_i     δ_i     Nneigh_i     CT (nodes in descending order of γ)
1     2.21    0.75    48           46
2     2.42    0.79    55           15
3     1.08    0.81    11           14
4     0.92    0.79    60           38
…     …       …       …            34
31    1.76    0.73    48           52
32    0.20    1       18           …
…     …       …       …            …
62    0.61    1       38           61
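As a minimal illustration of this selection step, the sketch below computes γ_i = ρ_i·δ_i from precomputed DPC quantities and keeps the top-k nodes as redundant centers; it fills in the select_redundant_centers placeholder from the driver sketch above. The variable names and the choice of k are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def select_redundant_centers(rho, delta, k=6):
    """Return the indices of the k nodes with the largest gamma = rho * delta.

    rho, delta : 1-D arrays of DPC local densities and distances.
    k          : number of (deliberately redundant) candidate centers to keep;
                 k = 6 mirrors the Dolphin example, but is a user choice.
    """
    gamma = np.asarray(rho) * np.asarray(delta)
    # argsort ascending, take the last k entries, reverse for descending gamma
    return [int(i) for i in np.argsort(gamma)[-k:][::-1]]

# Toy usage (values are illustrative, not the Dolphin data):
rho = [1.2, 3.1, 0.4, 2.0]
delta = [0.5, 0.9, 1.0, 0.2]
print(select_redundant_centers(rho, delta, k=2))   # -> [1, 0]
```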
3.2.2 Similarity measure of subtasks
After choosing the advisable and redundant center tasks, each center task and its oriented subtask set can be observed directly in the global task leading tree: the oriented subtask node set of a center is obtained by traversing the nodes it leads in the global task leading tree.
Suppose a subtask node set T_c (the subtree led by a candidate center c) is split from the global task leading tree. We compute the similarity between T_c and the remaining task set; if it is less than a user-specified threshold λ (set according to the actual situation), then T_c is neither close to nor strongly interconnected with the remaining tasks, which indicates that T_c qualifies as an independent subtask set. By splitting the global task leading tree in this way, several small, isolated subtrees are generated. Through this splitting process, the solving space of the coarse granular layer is segregated into a solving space of a fine granular layer composed of a series of independent subtask sets. Once all of these small subtask entities have been processed, the original complex task is answered by synthesizing the answers of the subtask entities. A sketch of such a split on a parent-pointer representation of the leading tree is given below.
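As an illustration (not the paper's code), the following sketch represents the leading tree with a parent map (each node points to its leading node, the root pointing to None, matching the build_leading_tree placeholder above) and extracts the subtree led by a candidate center together with the remaining nodes. The representation and the function name subtree_of are assumptions; the popping sketch later in this section reuses it.

```python
def subtree_of(center, parent):
    """Nodes led (directly or transitively) by `center` in a leading tree.

    parent : dict mapping node -> its leading node (root maps to None).
    Returns (subtree, rest): the node set led by `center` (including itself)
    and the remaining node set after the split.
    """
    # Build a child adjacency list from the parent pointers.
    children = {}
    for node, p in parent.items():
        if p is not None:
            children.setdefault(p, []).append(node)

    # Depth-first collection of the subtree rooted at `center`.
    subtree, stack = set(), [center]
    while stack:
        node = stack.pop()
        subtree.add(node)
        stack.extend(children.get(node, []))

    rest = set(parent) - subtree
    return subtree, rest

# Toy leading tree: 0 is the root; splitting at node 3 separates {3, 4, 5}.
parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 3, 5: 3}
print(subtree_of(3, parent))   # ({3, 4, 5}, {0, 1, 2})
```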
There are many methods for measuring the similarity between objects, such as the Euclidean metric, cosine similarity, and the Jaccard index in the vector space model. We adopt the similarity measure of Ref. [27], which is based on graph partitioning theory and fully considers both relative interconnectivity and relative closeness, and which has shown good results in clustering.
Thus, the relative interconnectivity between $T_i$ and $T_j$ is
$$RI(T_i, T_j) = \frac{\left|EC(T_i, T_j)\right|}{\frac{1}{2}\left(\left|EC(T_i)\right| + \left|EC(T_j)\right|\right)},$$
where $EC(T_i, T_j)$ denotes the absolute interconnectivity between $T_i$ and $T_j$, i.e., the total weight of the edges that straddle the nodes in $T_i$ and $T_j$, and $|EC(T_i)|$ and $|EC(T_j)|$ represent the sum of the weights of the edges in $T_i$ and $T_j$, respectively.
The relative closeness between $T_i$ and $T_j$ is
$$RC(T_i, T_j) = \frac{\bar{S}_{EC(T_i, T_j)}}{\frac{|T_i|}{|T_i|+|T_j|}\bar{S}_{EC(T_i)} + \frac{|T_j|}{|T_i|+|T_j|}\bar{S}_{EC(T_j)}},$$
where $\bar{S}_{EC(T_i, T_j)}$, the normalization of $EC(T_i, T_j)$, is the average weight of the edges that connect nodes between $T_i$ and $T_j$. $\bar{S}_{EC(T_i)}$ and $\bar{S}_{EC(T_j)}$ denote the average weights of the edges that pertain to the min-cut bisector of $T_i$ and $T_j$, respectively. $|T_i|$ and $|T_j|$ are the numbers of nodes in $T_i$ and $T_j$, respectively.
The similarity of $T_i$ and $T_j$ is then measured by
$$Sim(T_i, T_j) = RI(T_i, T_j)\cdot RC(T_i, T_j)^{\alpha},$$
where the user-specified parameter $\alpha$ controls the relative importance of $RI$ and $RC$. If $\alpha = 1$, relative interconnectivity and relative closeness have the same importance.
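A rough Python sketch of this similarity computation on a weighted graph is shown below. It follows the formulas above but, for simplicity, approximates the bisector terms $\bar{S}_{EC(T_i)}$ and $\bar{S}_{EC(T_j)}$ by the average internal edge weight of each set rather than computing a true min-cut bisector, so it is an approximation under that assumption, not the exact measure of Ref. [27]. The popping sketch later in this section reuses it.

```python
def cut_edges(graph, a, b):
    """Weights of edges with one endpoint in set a and the other in set b.
    graph: dict {(u, v): weight} for an undirected weighted graph."""
    return [w for (u, v), w in graph.items()
            if (u in a and v in b) or (u in b and v in a)]

def internal_edges(graph, a):
    """Weights of edges whose two endpoints both lie in set a."""
    return [w for (u, v), w in graph.items() if u in a and v in a]

def similarity(graph, ti, tj, alpha=1.0):
    cross = cut_edges(graph, ti, tj)
    ec_ij = sum(cross)
    ec_i, ec_j = sum(internal_edges(graph, ti)), sum(internal_edges(graph, tj))

    # Relative interconnectivity RI.
    ri = ec_ij / (0.5 * (ec_i + ec_j) + 1e-12)

    # Relative closeness RC, with the bisector averages approximated by the
    # average internal edge weight of each set (simplifying assumption).
    s_ij = ec_ij / (len(cross) + 1e-12)
    s_i = ec_i / (len(internal_edges(graph, ti)) + 1e-12)
    s_j = ec_j / (len(internal_edges(graph, tj)) + 1e-12)
    ni, nj = len(ti), len(tj)
    rc = s_ij / ((ni / (ni + nj)) * s_i + (nj / (ni + nj)) * s_j + 1e-12)

    return ri * rc ** alpha

# Toy usage: two triangles joined by a single weak edge have low similarity.
g = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
     (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0,
     (2, 3): 0.2}
print(similarity(g, {0, 1, 2}, {3, 4, 5}))
```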
3.2.3 Optimizing grain layer of solving spaces
By calculating the similarities of the task subsets led by the candidate centers in CT, and by setting a user-specified threshold λ, we control the process of separating the initial coarse granular layer into fine granular layers. Here, we formalize the notions and rules for optimizing the grain layer of the solving spaces.
Definition 1 (Final task center set) Given a task T and a redundant center task set CT, which is a subset of the subtasks of T, compute for each node c of CT the similarity Sim between the subtask set led by c and the rest of the current solving space. Select the nodes for which Sim < λ and, among them, pop up the node with the maximum γ value into the final task center set. When no node in CT satisfies the condition Sim < λ, the process of popping up center nodes terminates.
Definition 2 (Granulating rule) Following Definition 1, for each granulating operation, whenever a node is popped up, granulate the coarse grain layer on top into the fine grain layer underneath; granulation terminates when the popping of centers terminates.
In Definition 1, the key idea is that each granulation pops up the node that is currently most likely to be a subtask center, which conforms to the idea of the priority of the majority. Every popping operation is one step of task decomposition, so each step yields one accurate subtask center. With the rule of Definition 2, we obtain a precise fine grain layer of the solving space once the final optimal decomposition of the task has been captured. Our method thus pops up the latent subtask centers and automatically generates the optimal fine-granularity task solving space, in which the complex task is decomposed into undemanding subtask sets.
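The following sketch illustrates how Definitions 1 and 2 could be iterated, reusing the hypothetical subtree_of and similarity helpers sketched above and filling in the pop_final_centers placeholder from the driver sketch. The popping criterion (maximum γ among candidates whose similarity falls below λ) is our reading of Definition 1, and all names are illustrative.

```python
def pop_final_centers(graph, parent, gamma, ct, lam):
    """Iteratively pop final centers from the redundant set CT (Definition 1)
    and split the leading tree at each popped center (Definition 2).

    graph  : dict {(u, v): weight} of the task/social network.
    parent : dict node -> leading node (the global task leading tree).
    gamma  : dict node -> gamma value from DPC.
    ct     : list of redundant center candidates.
    lam    : similarity threshold (lambda).
    Returns the final center set and the list of split-off subtask sets.
    """
    fc, subtask_sets = [], []
    remaining = set(parent)            # nodes still in the main tree
    candidates = list(ct)

    while True:
        # Candidates whose subtree is weakly connected to the rest (Sim < lambda).
        qualified = []
        for c in candidates:
            subtree, _ = subtree_of(c, parent)
            subtree &= remaining
            rest = remaining - subtree
            if rest and similarity(graph, subtree, rest) < lam:
                qualified.append(c)
        if not qualified:
            break                      # terminal condition of Definition 1

        # Pop the qualified candidate with the largest gamma (our reading).
        popped = max(qualified, key=lambda c: gamma[c])
        subtree, _ = subtree_of(popped, parent)
        subtree &= remaining
        fc.append(popped)
        subtask_sets.append(subtree)
        remaining -= subtree           # granulate: split off this subtask set
        candidates.remove(popped)

    subtask_sets.append(remaining)     # the residual set around the root
    return fc, subtask_sets
```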
In Table 3, the CT column lists the nodes in descending order of γ. As mentioned in Algorithm 3.2, we select the first to sixth nodes of this ordering as the advisable and redundant centers, so CT = {46, 15, 14, 38, 34, 52}. According to Definitions 1 and 2, we then explore the final optimal task solving space of the Dolphin social network. We set the threshold λ such that, if the similarity between two communities is less than λ, they are considered only weakly connected to each other, so segregating the two communities is meaningful.
According to Definition 1, in the first round shown in Table 4, node 14 is the most likely center of a community (a toy check of this selection is given after Table 4). Node 14 and its child nodes are split from the global leading tree, and the popped node is moved from the redundant set CT to the final center task set. Next, following Definition 2, we granulate the initial coarse granular layer into a finer granular layer, as shown in Figs. 4a and 4b. After the first popping, the terminal condition has not been triggered, so we continue the popping process until it takes effect.
Table 4  Popping process for the first member of the final center task set.
Node     Similarity
46       —
15       0.436
14       0.129
38       0.291
34       0.290
52       0.290
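To make the first round concrete, the toy snippet below applies the popping rule to the similarity values of Table 4, using the γ ordering of CT and a purely illustrative threshold λ = 0.35 (the paper's exact λ is not recoverable from this text). Under these assumptions node 14 is selected, matching the description above.

```python
# Similarities from Table 4 (round 1); node 46 is the root and cannot be split.
sim = {15: 0.436, 14: 0.129, 38: 0.291, 34: 0.290, 52: 0.290}

# CT in descending gamma order (Table 3); earlier position = larger gamma.
ct_order = [46, 15, 14, 38, 34, 52]

lam = 0.35   # illustrative threshold only; not the value used in the paper

qualified = [n for n in ct_order if n in sim and sim[n] < lam]
popped = qualified[0]        # first in gamma order = maximum gamma
print(popped)                # -> 14
```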
Fig. 4  (a) Global leading tree structure of the Dolphin network; (b) progress of the first granulation, where node 14 is one of the community centers, so the tree is split into two subtrees, which together constitute a fine-granularity task solving layer; (c) granulating process triggered by center node 52, forming a fine granular solving layer with three subtask sets.
In the second round, shown in Table 5, node 52 is popped up as the second latent community center, and the same operation as in the first popping round is repeated until the terminal condition occurs. This granulating progress is presented in Figs. 4b and 4c.
Table 5  Popping process for the second member of the final center task set.
Node     Similarity
46       —
15       0.559
38       0.500
34       0.529
52       0.317
In the third round, shown in Table 6, the terminal condition is triggered automatically, because there is no node whose similarity is less than the threshold λ.
Table 6  Popping process for the third member of the final center task set.
Node     Similarity
46       —
15       0.559
38       0.500
34       0.529
For the Dolphin network, we select the second solving space to show the communities detected by our method. Compared with the real structure of the Dolphin network, the result obtained by MrGDM conforms closely to the real communities, which indicates that MrGDM is practicable and meaningful while providing decent validity. Figure 5 shows the MrGDM result.
Fig. 5  The structure of the Dolphin network detected by MrGDM.
Table 1  The intermediate results of DS.
i     Nneigh_i     Cluster
1     12           1
2     13           1
3     12           1
4     12           1
5     6            2
6     13           2
7     8            2
8     6            2
9     11           3
10    11           3
11    12           3
12    13           1
13    0            1
Table 2  The intermediate results of DPC in the Dolphin network.
i     ρ_i     δ_i      Nneigh_i
1     2.21    0.75     48
2     2.42    0.79     55
3     1.08    0.81     11
4     0.92    0.79     60
…     …       …        …
31    1.76    0.73     48
32    0.20    1.00     18
…     …       …        …
62    0.60    11.00    38
Table 7  Compared methods and their descriptions.
ENBC [28]: Using personal views and a redefined community structure, this algorithm introduces the notion of common interest in social-network relationships. It proposes two key measures, reachability and isolability: reachability evaluates the ability of each node to reach the members of a community, and isolability accounts for the ability of a community to isolate itself from the rest of the network.
Local-T [29]: The main idea of this method is the T metric, which computes the relative quality of a community by considering the numbers of internal and external triads. The authors propose an intuitive statistical method based on the T metric that can identify outlier and hub nodes within each discovered community.
CDERS [30]: The authors use an expanding ring search that starts from the individual of interest, treated as the seed node; according to their definition of a community, they iteratively include nodes at an increasing number of hops from the seed user. If no further nodes can be appended, the iterative process terminates, and the social communities are formed from the lists of added nodes.
LICOD [31]: This method adopts a leader-follower approach, whereby leader nodes create social communities in which local communities can be computed. Nodes whose degree is a certain percentage higher than that of their neighbors are selected as leaders, and leaders sharing a certain percentage of common neighbors are considered one community. By computing the shortest distance from every node to each leader and considering the decisions of its neighbors, each node is assigned to a suitable community.
Table 8  Experimental applications of the compared methods for the selected datasets.
Dataset    Method          NMI         F-Score     Modularity (rank)    C
Karate     Real structure  —           —           0.358                2
           ENBC            0.837       0.939       0.358                2
           Local-T         0.837       0.939       0.372                2
           LICOD           0.677       0.882       0.372                2
           CDERS           0.649       0.832       0.312                2
           MrGDM           1 (1)       1 (1)       0.371 (2)            2
Dolphin    Real structure  —           —           0.360                2
           ENBC            0.530       0.671       0.491                3
           Local-T         0.434       0.605       0.510                4
           LICOD           0.442       0.471       0.494                7
           CDERS           0.627       0.890       0.374                2
           MrGDM           0.889 (1)   0.970 (1)   0.379 (2)            2
Football   Real structure  —           —           0.551                12
           ENBC            0.906       0.810       0.592                10
           Local-T         0.864       0.722       0.545                10
           LICOD           0.900       0.850       0.603                11
           CDERS           0.762       0.644       0.634                12
           MrGDM           0.890 (3)   0.829 (2)   0.555 (1)            12
LFR1       Real structure  —           —           0.784                15
           ENBC            0.992       0.992       0.781                15
           Local-T         0.992       0.992       0.781                15
           LICOD           0.965       0.922       0.777                13
           CDERS           0.922       0.867       0.634                15
           MrGDM           0.970 (3)   1 (1)       0.857 (4)            15
LFR2       Real structure  —           —           0.859                30
           ENBC            0.998       0.997       0.858                30
           Local-T         0.777       0.504       0.706                19
           LICOD           0.997       0.995       0.853                31
           CDERS           0.942       0.913       0.780                30
           MrGDM           0.959 (3)   0.957 (3)   0.837 (3)            31
Table 9  Summary of complexity of community detection algorithms.