Regular Paper

Semi-Supervised Classification of Data Streams by BIRCH Ensemble and Local Structure Mapping

Guangxi Key Laboratory of Image and Graphic Intelligent Processing, Guilin University of Electronic Technology, Guilin 541004, China
Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China
School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China

Abstract

Many researchers have applied clustering to semi-supervised classification of data streams with concept drift. However, existing approaches do not steadily improve the generalization ability for each specific concept, and drift detection methods that ignore the local structural information of the data cannot detect concept drift accurately. This paper addresses both problems with a BIRCH (Balanced Iterative Reducing and Clustering Using Hierarchies) ensemble and local structure mapping. The local structure mapping strategy computes the local similarity around each sample and is combined with a semi-supervised Bayesian method to perform concept detection. If a recurrent concept is detected, a historical BIRCH ensemble classifier is selected and incrementally updated; otherwise, a new BIRCH ensemble classifier is constructed and added to the classifier pool. Extensive experiments on several synthetic and real datasets demonstrate the advantage of the proposed algorithm.
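
The control flow described in the abstract (score each stored classifier against the incoming chunk, then either update the best match or open a new classifier) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: `CentroidModel` is a nearest-centroid stand-in for a BIRCH ensemble classifier, `local_similarity` is a simplified neighbourhood-overlap score rather than the paper's local structure mapping combined with a semi-supervised Bayesian method, and the names `process_chunk`, `threshold`, and `k` are hypothetical. The sketch also assumes every chunk contains at least one labeled sample and more than one sample overall.

```python
import math


class CentroidModel:
    """Hypothetical stand-in for one BIRCH ensemble classifier:
    it keeps a running centroid per class label."""

    def __init__(self):
        self.sums = {}    # label -> per-dimension sum of labeled samples
        self.counts = {}  # label -> number of labeled samples seen

    def update(self, chunk, labels):
        # Incremental update that uses only labeled samples (label is not None).
        for x, y in zip(chunk, labels):
            if y is None:
                continue
            s = self.sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                s[i] += v
            self.counts[y] = self.counts.get(y, 0) + 1

    def predict(self, x):
        # Return the label whose centroid is closest to x.
        def dist_to(label):
            centroid = [s / self.counts[label] for s in self.sums[label]]
            return math.dist(x, centroid)
        return min(self.sums, key=dist_to)


def local_similarity(model, chunk, k=3):
    """Assumed simplification: for each sample, the Jaccard overlap between
    its k nearest neighbours in the chunk and the samples to which the model
    assigns the same label, averaged over the chunk."""
    preds = [model.predict(x) for x in chunk]
    n = len(chunk)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        ranked = sorted(others, key=lambda j: math.dist(chunk[i], chunk[j]))
        knn = set(ranked[:k])
        same = {j for j in others if preds[j] == preds[i]}
        union = knn | same
        total += len(knn & same) / len(union) if union else 1.0
    return total / n


def process_chunk(pool, chunk, labels, threshold=0.8, k=3):
    """Update the best-matching stored model if its score reaches the
    threshold (recurrent concept); otherwise add a new model to the pool.
    Returns the index of the model that was trained on this chunk."""
    scores = [local_similarity(m, chunk, k) for m in pool]
    if scores and max(scores) >= threshold:
        idx = scores.index(max(scores))
    else:
        pool.append(CentroidModel())
        idx = len(pool) - 1
    pool[idx].update(chunk, labels)
    return idx
```

Calling `process_chunk(pool, chunk, labels)` on successive chunks, with `None` marking unlabeled samples, grows `pool` only when no stored model matches the local structure of the incoming chunk. The threshold value of 0.8 is arbitrary and would need tuning on real streams.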



Journal of Computer Science and Technology
Pages 295-304
Cite this article:
Wen Y-M, Liu S. Semi-Supervised Classification of Data Streams by BIRCH Ensemble and Local Structure Mapping. Journal of Computer Science and Technology, 2020, 35(2): 295-304. https://doi.org/10.1007/s11390-020-9999-y

Received: 28 August 2019
Revised: 20 January 2020
Published: 27 March 2020
©Institute of Computing Technology, Chinese Academy of Sciences 2020