Open Access

Nonnegative Matrix Tri-Factorization Based Clustering in a Heterogeneous Information Network with Star Network Schema

College of Computer Science and Technology, Jilin University, Changchun 130012, China
School of Intelligent Systems Science and Engineering, Jinan University, Zhuhai 519070, China

Abstract

Heterogeneous Information Networks (HINs) contain multiple types of nodes and edges and can therefore preserve both semantic and structural information. Performing cluster analysis directly on an HIN has clear advantages over first transforming it into a homogeneous information network, and it can improve the clustering results for the different types of nodes. In our study, we applied Nonnegative Matrix Tri-Factorization (NMTF) to the cluster analysis of multiple metapaths in an HIN. Unlike the probability-distribution parameter estimation methods used in previous studies, NMTF can obtain several dependent latent variables simultaneously, and each latent variable in NMTF is associated with the cluster of the corresponding nodes in the HIN. Because NMTF is employed for multiple nonnegative matrix factorizations simultaneously in our study, the method is suited to co-clustering that leverages multiple metapaths in an HIN. Experimental results on a real dataset demonstrate the validity and correctness of our method, and its clustering results are better than those of the existing similar clustering algorithm.
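
To make the idea of several simultaneous tri-factorizations concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a star schema in which one center node type is linked to each attribute type by a nonnegative relation matrix X_k, and it approximates every X_k as F S_k G_k^T with a shared center factor F, using standard multiplicative updates for the unconstrained Frobenius-norm objective. The function names `star_nmtf` and `reconstruction_loss`, the toy matrices, and all parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def star_nmtf(relations, k_center, k_attrs, n_iter=200, eps=1e-9, seed=0):
    """Jointly factorize each X_k ~= F @ S_k @ G_k.T with a shared F.

    relations: list of nonnegative arrays, all with the same number of rows
               (one row per center object; columns are attribute objects).
    k_center:  number of clusters for the center type (assumed known).
    k_attrs:   number of clusters for each attribute type (assumed known).
    """
    rng = np.random.default_rng(seed)
    n_center = relations[0].shape[0]
    # Random nonnegative initialization keeps all factors nonnegative
    # under multiplicative updates.
    F = rng.random((n_center, k_center))
    S = [rng.random((k_center, k)) for k in k_attrs]
    G = [rng.random((X.shape[1], k)) for X, k in zip(relations, k_attrs)]

    for _ in range(n_iter):
        # Shared center factor: accumulate evidence from every relation.
        num = sum(X @ Gk @ Sk.T for X, Sk, Gk in zip(relations, S, G))
        den = sum(F @ Sk @ Gk.T @ Gk @ Sk.T for Sk, Gk in zip(S, G)) + eps
        F *= num / den
        # Attribute factors and association matrices, one pair per relation.
        for i, X in enumerate(relations):
            G[i] *= (X.T @ F @ S[i]) / (G[i] @ S[i].T @ F.T @ F @ S[i] + eps)
            S[i] *= (F.T @ X @ G[i]) / (F.T @ F @ S[i] @ G[i].T @ G[i] + eps)
    return F, S, G


def reconstruction_loss(relations, F, S, G):
    """Sum of squared Frobenius errors over all relation matrices."""
    return sum(
        np.linalg.norm(X - F @ Sk @ Gk.T) ** 2
        for X, Sk, Gk in zip(relations, S, G)
    )


# Toy star network (invented data): 6 center objects, two attribute types.
X1 = np.array([[1, 1, 0, 0]] * 3 + [[0, 0, 1, 1]] * 3, dtype=float)
X2 = np.array([[1, 0, 0]] * 3 + [[0, 1, 1]] * 3, dtype=float)
relations = [X1, X2]

F0, S0, G0 = star_nmtf(relations, 2, [2, 2], n_iter=0)   # initial factors only
F, S, G = star_nmtf(relations, 2, [2, 2], n_iter=200)    # same seed, 200 updates

loss_before = reconstruction_loss(relations, F0, S0, G0)
loss_after = reconstruction_loss(relations, F, S, G)

# Read a hard cluster label off each factor by taking the largest entry per row.
center_labels = F.argmax(axis=1)
attr_labels = [Gk.argmax(axis=1) for Gk in G]
```

In this sketch the rows of F act as the latent cluster indicator for the center type and the rows of each G_k as the indicator for one attribute type, so all node types are clustered in the same run. The orthogonality constraints and any metapath-specific weighting used in published NMTF variants are deliberately left out.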

Tsinghua Science and Technology
Pages 386-395
Cite this article:
Hu J, Xing Y, Han M, et al. Nonnegative Matrix Tri-Factorization Based Clustering in a Heterogeneous Information Network with Star Network Schema. Tsinghua Science and Technology, 2022, 27(2): 386-395. https://doi.org/10.26599/TST.2020.9010049

Received: 20 September 2020
Accepted: 09 October 2020
Published: 29 September 2021
© The author(s) 2022

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
