Research paper | Open Access

RegBoost: a gradient boosted multivariate regression algorithm

Wen Li, Wei Wang, Wenjun Huo
College of Electronics and Information Engineering, Tongji University, Shanghai, China

Abstract

Purpose

Inspired by the basic idea of gradient boosting, this study aims to design a novel multivariate regression ensemble algorithm, RegBoost, that uses multivariate linear regression as its weak predictor.

Design/methodology/approach

To achieve nonlinearity after combining all linear regression predictors, the training data is divided into two branches according to the prediction results of the current weak predictor. Linear regression modeling is then executed recursively in each of the two branches. In the test phase, each test sample is routed to a specific branch to continue with the next weak predictor. The final result is the sum of the predictions of all weak predictors along the entire path.
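The procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the branching rule, so the code below assumes that samples are split at the median of the current weak predictor's predictions and that each child is fitted to the residual left by its parent. The names `fit_node`, `predict`, `depth`, and `min_samples` are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_linear(X, y):
    # Ordinary least squares with an intercept (weights first, bias last).
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def predict_linear(w, X):
    return X @ w[:-1] + w[-1]


def fit_node(X, target, depth, min_samples=20):
    # Fit one weak linear predictor to the current target (the residual).
    w = fit_linear(X, target)
    pred = predict_linear(w, X)
    node = {"w": w, "threshold": None, "low": None, "high": None}
    if depth > 1:
        # ASSUMPTION: branch on the median of this node's predictions.
        threshold = float(np.median(pred))
        low = pred <= threshold
        if low.sum() >= min_samples and (~low).sum() >= min_samples:
            residual = target - pred  # what the next weak predictor must explain
            node["threshold"] = threshold
            node["low"] = fit_node(X[low], residual[low], depth - 1, min_samples)
            node["high"] = fit_node(X[~low], residual[~low], depth - 1, min_samples)
    return node


def predict_one(node, x):
    # Follow one path through the branches, summing every weak prediction.
    total = 0.0
    while node is not None:
        p = float(x @ node["w"][:-1] + node["w"][-1])
        total += p
        if node["threshold"] is None:
            break
        node = node["low"] if p <= node["threshold"] else node["high"]
    return total


def predict(node, X):
    return np.array([predict_one(node, x) for x in X])
```

With `depth=1` the sketch reduces to plain linear regression; larger depths produce a piecewise-linear model, which is where the nonlinearity comes from.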

Findings

Comparison experiments show that RegBoost achieves performance similar to that of the gradient boosted decision tree (GBDT) and is markedly more effective than linear regression alone.

Originality/value

This paper attempts to design a novel regression algorithm, RegBoost, with reference to GBDT. To the best of the authors' knowledge, RegBoost is the first algorithm to use linear regression as a weak predictor and combine it with gradient boosting to build an ensemble.

International Journal of Crowd Science
Pages 60-72
Cite this article:
Li W, Wang W, Huo W. RegBoost: a gradient boosted multivariate regression algorithm. International Journal of Crowd Science, 2020, 4(1): 60-72. https://doi.org/10.1108/IJCS-10-2019-0029

Received: 14 October 2019
Accepted: 25 December 2019
Published: 03 February 2020
© The author(s)

Wen Li, Wei Wang and Wenjun Huo. Published in International Journal of Crowd Science. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode
