Many real-world datasets unavoidably contain missing values, so classification with missing data must be handled carefully: inadequate treatment of missing values causes large errors. In this paper, we propose a random subspace sampling method, RSS, which fills missing items by sampling from the corresponding feature histogram distributions in random subspaces and is effective and efficient at different levels of missing data. Unlike most established approaches, RSS does not train on a fixed imputed dataset. Instead, we design a dynamic training strategy in which the filled values are resampled during training. Moreover, thanks to the sampling strategy, we design an ensemble testing strategy that combines the results of multiple runs of a single model, which is more efficient and resource-saving than previous ensemble methods. Finally, we combine these two strategies with the random subspace method, which makes our estimations more robust and accurate. Experimental studies validate the effectiveness of the proposed RSS method.
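The three ideas in the abstract (histogram-based sampling of missing items, resampling during training, and averaging several runs of a single model at test time) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`fit_histograms`, `sample_fill`, `train_dynamic`, `predict_ensemble`), the bin count, and the logistic-regression classifier are all assumptions chosen for illustration, and the random subspace component is omitted for brevity. The sketch also assumes missing entries are encoded as NaN and that every feature has at least one observed value.

```python
import numpy as np

def fit_histograms(X, bins=10):
    """Build one normalized histogram per feature from its observed (non-NaN) entries."""
    hists = []
    for j in range(X.shape[1]):
        observed = X[~np.isnan(X[:, j]), j]
        counts, edges = np.histogram(observed, bins=bins)
        hists.append((counts / counts.sum(), edges))
    return hists

def sample_fill(X, hists, rng):
    """Return a copy of X with each NaN replaced by a draw from its feature histogram."""
    X_filled = X.copy()
    for j, (probs, edges) in enumerate(hists):
        missing = np.isnan(X[:, j])
        n_missing = int(missing.sum())
        if n_missing == 0:
            continue
        # Pick a bin according to its frequency, then a uniform value inside that bin.
        idx = rng.choice(len(probs), size=n_missing, p=probs)
        X_filled[missing, j] = rng.uniform(edges[idx], edges[idx + 1])
    return X_filled

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dynamic(X, y, hists, rng, epochs=200, lr=0.1):
    """Illustrative logistic regression; missing values are resampled before every epoch."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        X_filled = sample_fill(X, hists, rng)  # filled values change each epoch
        residual = sigmoid(X_filled @ w + b) - y
        w -= lr * (X_filled.T @ residual) / len(y)
        b -= lr * residual.mean()
    return w, b

def predict_ensemble(X, hists, w, b, rng, runs=10):
    """Average the single model's probabilities over several independently sampled fills."""
    runs_probs = [sigmoid(sample_fill(X, hists, rng) @ w + b) for _ in range(runs)]
    return np.mean(runs_probs, axis=0)
```

In this sketch the ensemble comes from repeated sampling at test time, not from training several models, which is the property the abstract describes as resource-saving.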