Research paper | Open Access

Application of biclustering algorithm to extract rules from labeled data

Zhang Yanjie, Sun Hongbo
School of Computer and Control Engineering, Yantai University, Yantai, China

Abstract

Purpose

For many pattern recognition problems, the relation between the sample vectors and the class labels is known during the data acquisition procedure. Finding the useful rules or knowledge hidden in the data, however, is both important and challenging. Rule extraction methods are very useful for mining the important, heuristic knowledge hidden in the original high-dimensional data. They help to construct predictive models from only a few attributes of the data, which provides valuable model interpretability and shorter training times.

Design/methodology/approach

In this paper, a novel rule extraction method based on a biclustering algorithm is proposed.

Findings

To choose the most significant biclusters from the huge number of detected biclusters, a specially modified information entropy calculation method is also provided. It is shown that, in practice, all of the important knowledge is hidden in these biclusters.
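
The abstract does not describe the modified entropy calculation, so the following is only a minimal sketch of the general idea, assuming plain Shannon entropy over the class labels of the samples in each bicluster. The names `label_entropy` and `rank_biclusters` and the `(rows, cols)` representation of a bicluster are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels.

    This is the standard definition, NOT the paper's modified measure.
    """
    total = len(labels)
    counts = Counter(labels)
    # p * log2(1 / p) is the same as -p * log2(p).
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def rank_biclusters(biclusters, y):
    """Order biclusters from purest (lowest label entropy) to most mixed.

    biclusters: list of (row_indices, column_indices) pairs (assumed format).
    y: class label of every sample (row) in the data matrix.
    """
    scored = [
        (label_entropy([y[i] for i in rows]), rows, cols)
        for rows, cols in biclusters
    ]
    return sorted(scored, key=lambda item: item[0])
```

Under this assumption, a bicluster whose samples all share one class label scores zero and is ranked first, which matches the intuition that such a bicluster carries a clean class-specific pattern.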

Originality/value

The novelty of the new method lies in the fact that the detected biclusters can be conveniently translated into if-then rules. It provides an intuitively explainable and comprehensive approach to extracting rules from high-dimensional data while keeping high classification accuracy.
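
As an illustration of how a bicluster might be read as an if-then rule, here is a minimal sketch. It assumes the data matrix has already been discretized, that a bicluster is a pair of row and column index lists, and that the rule consequent is the majority class of the covered samples. The function `bicluster_to_rule` and these conventions are hypothetical and may differ from the translation used in the paper.

```python
from collections import Counter


def bicluster_to_rule(X, y, rows, cols, names):
    """Turn one bicluster into a readable if-then rule.

    X: discretized data matrix as a list of rows (assumed format).
    y: class label of each row of X.
    rows, cols: sample and attribute indices that form the bicluster.
    names: attribute names, indexed like the columns of X.
    Returns the rule text and the fraction of covered samples that
    carry the predicted class.
    """
    conditions = []
    for j in cols:
        # Use the most frequent value of attribute j inside the bicluster.
        value = Counter(X[i][j] for i in rows).most_common(1)[0][0]
        conditions.append(f"{names[j]} = {value}")
    label, hits = Counter(y[i] for i in rows).most_common(1)[0]
    rule = "IF " + " AND ".join(conditions) + f" THEN class = {label}"
    return rule, hits / len(rows)


X = [[1, 0, 2], [1, 1, 2], [0, 1, 0]]
y = ["A", "A", "B"]
rule, confidence = bicluster_to_rule(X, y, [0, 1], [0, 2], ["g1", "g2", "g3"])
print(rule)  # IF g1 = 1 AND g3 = 2 THEN class = A
```

Only the attributes inside the bicluster appear in the rule antecedent, which is why such rules can stay short even when the original data are high-dimensional.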

International Journal of Crowd Science
Pages 86-98
Cite this article:
Yanjie Z, Hongbo S. Application of biclustering algorithm to extract rules from labeled data. International Journal of Crowd Science, 2018, 2(2): 86-98. https://doi.org/10.1108/IJCS-01-2018-0002


Received: 25 January 2018
Revised: 10 April 2018
Accepted: 12 April 2018
Published: 07 June 2018
© The author(s)

Zhang Yanjie and Sun Hongbo. Published in International Journal of Crowd Science. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode
