Open Access

Greedy Optimization for K-Means-Based Consensus Clustering

School of Economics and Management, Tsinghua University, Beijing 100084, China.
Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115, USA.

Abstract

Consensus clustering aims to fuse several existing basic partitions into an integrated one and has been widely recognized as a promising tool for multi-source and heterogeneous data clustering. Owing to its robust and high-quality performance compared with traditional clustering methods, consensus clustering has attracted much attention, and much effort has been devoted to developing this field. In the literature, K-means-based Consensus Clustering (KCC) transforms the consensus clustering problem into a classical K-means clustering problem with theoretical support and shows advantages over state-of-the-art methods. Although KCC inherits the merits of K-means, it also suffers from sensitivity to initialization. Moreover, the current consensus clustering framework separates basic partition generation and fusion into two disconnected parts. To address these two challenges, a novel clustering algorithm, named Greedy optimization of K-means-based Consensus Clustering (GKCC), is proposed. Inspired by the well-known greedy K-means, which aims to resolve the initialization sensitivity of K-means, GKCC seamlessly combines greedy K-means and KCC, inheriting the merits of both while overcoming the drawbacks of its precursors. Moreover, a 59-sampling strategy is adopted to provide high-quality basic partitions and accelerate the algorithm. Extensive experiments on 36 benchmark datasets demonstrate the significant advantages of GKCC over KCC and KCC++ in terms of objective function values, standard deviations, and external cluster validity.
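To make the two ingredients concrete, the sketch below combines them in Python. Per Mirkin's category-utility result, KCC with the classical utility function reduces to running K-means on the column-wise concatenation of one-hot-encoded basic partitions; on top of that reduction, centers are seeded incrementally in the spirit of greedy/global K-means by trying a random sample of points as each new center. This is a minimal illustration under those assumptions, not the authors' implementation: the helper names (binary_representation, greedy_kcc), the grand-mean starting center, and the candidate-sample size of 59 are illustrative choices (the paper applies its 59-sampling strategy to basic-partition generation, which this sketch omits).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import OneHotEncoder


def binary_representation(basic_partitions):
    """Stack one-hot encodings of r basic partitions into an n x sum(K_i) matrix."""
    labels = np.asarray(basic_partitions).T          # shape (n, r): one column per partition
    return OneHotEncoder(sparse_output=False).fit_transform(labels)


def greedy_kcc(basic_partitions, k, n_candidates=59, seed=None):
    """Consensus clustering via K-means on the binary matrix, with centers
    added one at a time; each new center is chosen greedily from a random
    sample of points, in the spirit of greedy/global K-means."""
    rng = np.random.default_rng(seed)
    B = binary_representation(basic_partitions)
    centers = B.mean(axis=0, keepdims=True)          # start from the grand mean
    for _ in range(1, k):
        best_inertia, best_centers = np.inf, None
        sample = rng.choice(len(B), size=min(n_candidates, len(B)), replace=False)
        for idx in sample:                           # try each sampled point as the new center
            init = np.vstack([centers, B[idx]])
            km = KMeans(n_clusters=len(init), init=init, n_init=1).fit(B)
            if km.inertia_ < best_inertia:
                best_inertia, best_centers = km.inertia_, km.cluster_centers_
        centers = best_centers
    return KMeans(n_clusters=k, init=centers, n_init=1).fit(B).labels_


# Example: fuse three basic partitions of six points into k=2 consensus clusters.
p1 = [0, 0, 0, 1, 1, 1]
p2 = [0, 0, 1, 1, 2, 2]
p3 = [1, 1, 1, 0, 0, 0]
labels = greedy_kcc([p1, p2, p3], k=2, seed=0)
```

Because each new center is chosen by its effect on the K-means objective on the binary matrix, the seeding is deterministic given the candidate sample, which is what removes the run-to-run variance that plain random initialization causes in KCC.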

Cite this article:
Li X, Liu H. Greedy Optimization for K-Means-Based Consensus Clustering. Tsinghua Science and Technology, 2018, 23(2): 184-194. https://doi.org/10.26599/TST.2018.9010063

Received: 06 February 2017
Revised: 25 April 2017
Accepted: 02 May 2017
Published: 02 April 2018
© The author(s) 2018