Research paper | Open Access

Application of biclustering algorithm to extract rules from labeled data

Zhang Yanjie, Sun Hongbo

School of Computer and Control Engineering, Yantai University, Yantai, China

Abstract

Purpose

For many pattern recognition problems, the relation between the sample vectors and the class labels is known during the data acquisition procedure. However, finding the useful rules or knowledge hidden in the data is both important and challenging. Rule extraction methods are very useful for mining the important and heuristic knowledge hidden in the original high-dimensional data. They can help construct predictive models from only a few attributes of the data, which provides valuable model interpretability and shorter training times.

Design/methodology/approach

In this paper, a novel rule extraction method based on a biclustering algorithm is proposed.

Findings

To choose the most significant biclusters from the huge number of detected biclusters, a specially modified information entropy calculation method is also provided. It will be shown that all of the important knowledge is in practice hidden in these biclusters.
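As a rough illustration of entropy-based bicluster selection, the sketch below ranks biclusters by the plain Shannon entropy of the class labels of their rows, so that purer biclusters come first. The abstract does not specify the authors' specially modified entropy, so this is a stand-in under that assumption; the function names and the `(row_indices, column_indices)` representation are hypothetical.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_biclusters(biclusters, labels):
    """Sort biclusters from purest (lowest label entropy) to most mixed.

    Each bicluster is assumed to be a (row_indices, column_indices) pair;
    only the rows matter for the class-label entropy used here.
    """
    def score(bicluster):
        rows, _cols = bicluster
        return label_entropy([labels[r] for r in rows])

    return sorted(biclusters, key=score)


# Toy data: six samples with two classes (illustrative only).
labels = ["A", "A", "A", "B", "B", "B"]
biclusters = [
    ([0, 1, 3, 4], [0, 2]),  # two A and two B: maximally mixed
    ([0, 1, 2], [1, 2]),     # all A: pure
    ([2, 3, 4, 5], [0]),     # one A and three B: mostly B
]
ranked = rank_biclusters(biclusters, labels)
```

A lower entropy means the rows of a bicluster mostly share one class, which makes it a better candidate for a classification rule.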

Originality/value

The novelty of the new method lies in the fact that the detected biclusters can be conveniently translated into if-then rules. The method provides an intuitively explainable and comprehensive approach to extracting rules from high-dimensional data while maintaining high classification accuracy.
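To make the bicluster-to-rule idea concrete, here is a minimal sketch that assumes the data have already been discretized and that a bicluster is a set of rows sharing one value in each selected column. The abstract does not give the authors' exact translation procedure, so the function `bicluster_to_rule`, the majority-class consequent, and the toy data are all illustrative assumptions.

```python
from collections import Counter


def bicluster_to_rule(data, labels, rows, cols, names):
    """Translate a constant-column bicluster into an if-then rule string.

    Assumes every selected column holds a single discretized value across
    the selected rows; the rule's consequent is the majority class label.
    """
    conditions = []
    for c in cols:
        values = {data[r][c] for r in rows}
        if len(values) != 1:
            raise ValueError(f"column {names[c]} is not constant in the bicluster")
        conditions.append((names[c], values.pop()))

    majority, _count = Counter(labels[r] for r in rows).most_common(1)[0]
    antecedent = " AND ".join(f"{name} = {value}" for name, value in conditions)
    return f"IF {antecedent} THEN class = {majority}"


# Toy discretized expression matrix (hypothetical attribute names and labels).
data = [
    ["high", "low", "high"],
    ["high", "mid", "high"],
    ["low", "low", "high"],
    ["high", "low", "low"],
]
labels = ["tumor", "tumor", "normal", "normal"]
names = ["g1", "g2", "g3"]

rule = bicluster_to_rule(data, labels, rows=[0, 1], cols=[0, 2], names=names)
print(rule)  # IF g1 = high AND g3 = high THEN class = tumor
```

Each rule uses only the attributes in the bicluster's columns, which is how a small set of biclusters can yield a classifier over few attributes.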

Keywords

Rule extraction; Biclustering algorithm; Crowdsourced big data and analytics

International Journal of Crowd Science

Volume 2, Issue 2, November 2018

Pages 86-98

DOI: 10.1108/IJCS-01-2018-0002

Cite this article:

Yanjie Z, Hongbo S. Application of biclustering algorithm to extract rules from labeled data. International Journal of Crowd Science, 2018, 2(2): 86-98. https://doi.org/10.1108/IJCS-01-2018-0002

Received: 25 January 2018

Revised: 10 April 2018

Accepted: 12 April 2018

Published: 07 June 2018

Zhang Yanjie and Sun Hongbo. Published in International Journal of Crowd Science. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode