Open Access

Efficient Leave-One-Out Strategy for Supervised Feature Selection

National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China

Abstract

Feature selection is a key task in statistical pattern recognition. Most feature selection algorithms are based on specific objective functions that are usually intuitively reasonable but can sometimes be far removed from the more basic objectives of feature selection. This paper describes how to select features such that the basic objectives, e.g., classification or clustering accuracy, can be optimized in a more direct way. The analysis requires that the contribution of each feature to the evaluation metric can be quantitatively described by some score function. Motivated by the conditional independence structure in probabilistic distributions, the analysis uses a leave-one-out feature selection algorithm that provides an approximate solution. The leave-one-out algorithm improves the conventional greedy backward elimination algorithm by preserving more interactions among features during the selection process, so that the various feature selection objectives can be optimized in a unified way. Experiments on six real-world datasets with different feature evaluation metrics show that this algorithm outperforms popular feature selection algorithms in most situations.
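
For readers who want a concrete picture of the leave-one-out idea summarized above, the following Python sketch ranks features by how much a chosen evaluation metric degrades when each feature is left out of the full feature set, judging every feature in the context of all the others. This is only an illustration under simple assumptions (a hypothetical score_fn based on the cross-validated accuracy of a logistic-regression classifier), not the exact procedure published in this paper.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def loo_feature_ranking(X, y, n_keep, score_fn=None):
    """Keep the n_keep features whose removal hurts the evaluation metric most.

    Generic single-pass illustration of leave-one-out feature scoring; the
    score_fn argument is a hypothetical stand-in for any metric that scores
    a feature subset, not the score function defined in the paper.
    """
    if score_fn is None:
        # Default assumption: 5-fold cross-validated accuracy of a linear classifier.
        def score_fn(cols):
            return cross_val_score(
                LogisticRegression(max_iter=1000), X[:, cols], y, cv=5
            ).mean()

    all_feats = list(range(X.shape[1]))
    base = score_fn(all_feats)

    # Contribution of feature j = metric with j minus metric without j.
    contrib = {}
    for j in all_feats:
        rest = [k for k in all_feats if k != j]
        contrib[j] = base - score_fn(rest)

    # Rank by contribution and keep the strongest n_keep features.
    ranked = sorted(all_feats, key=lambda j: contrib[j], reverse=True)
    return sorted(ranked[:n_keep])


if __name__ == "__main__":
    # Toy usage: 200 samples, 20 features, only the first two are informative.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 20))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    print(loo_feature_ranking(X, y, n_keep=5))

Because each feature's score is computed while all other candidate features remain in the set, this kind of leave-one-out evaluation preserves more feature interactions than discarding features one at a time, which is the contrast the abstract draws with greedy backward elimination.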

Tsinghua Science and Technology
Pages 629-635
Cite this article:
Feng D, Chen F, Xu W. Efficient Leave-One-Out Strategy for Supervised Feature Selection. Tsinghua Science and Technology, 2013, 18(6): 629-635. https://doi.org/10.1109/TST.2013.6678908

Received: 15 October 2012
Revised: 05 June 2013
Accepted: 07 June 2013
Published: 06 December 2013
© The author(s) 2013