Research paper | Open Access

RegBoost: a gradient boosted multivariate regression algorithm

Wen Li, Wei Wang, Wenjun Huo

College of Electronics and Information Engineering, Tongji University, Shanghai, China


Abstract

Purpose

Inspired by the basic idea of gradient boosting, this study designs RegBoost, a novel multivariate regression ensemble algorithm that uses multivariate linear regression as its weak predictor.

Design/methodology/approach

To achieve nonlinearity after combining all linear regression predictors, the training data is divided into two branches according to the prediction results of the current weak predictor. Linear regression modeling is then executed recursively in each of the two branches. In the test phase, each test sample is routed to a specific branch, where it continues with the next weak predictor. The final result is the sum of the outputs of all weak predictors along the sample's entire path.
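The procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the two branches are chosen, so the sketch assumes a split at the median of the current predictor's outputs, and it assumes each weak predictor is fitted to the residual left by its ancestors (squared-error boosting). The names `fit_node`, `predict`, `max_depth`, and `min_samples` are hypothetical and introduced only for illustration.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Evaluate a linear model fitted by fit_linear."""
    return np.column_stack([X, np.ones(len(X))]) @ coef


def fit_node(X, target, depth=0, max_depth=3, min_samples=20):
    """Fit one linear weak predictor, then recurse into two branches."""
    coef = fit_linear(X, target)
    pred = predict_linear(coef, X)
    node = {"coef": coef, "threshold": None, "low": None, "high": None}
    if depth >= max_depth:
        return node
    # ASSUMPTION: branch on the median of this predictor's outputs.
    # The abstract only says the split follows the prediction results.
    threshold = np.median(pred)
    low = pred <= threshold
    if low.sum() < min_samples or (~low).sum() < min_samples:
        return node
    # ASSUMPTION: children are fitted to the residual of the current predictor.
    residual = target - pred
    node["threshold"] = threshold
    node["low"] = fit_node(X[low], residual[low], depth + 1, max_depth, min_samples)
    node["high"] = fit_node(X[~low], residual[~low], depth + 1, max_depth, min_samples)
    return node


def predict(node, X):
    """Route each sample down one path and sum the weak predictors on it."""
    pred = predict_linear(node["coef"], X)
    if node["threshold"] is None:
        return pred
    low = pred <= node["threshold"]
    total = pred.copy()
    if low.any():
        total[low] += predict(node["low"], X[low])
    if (~low).any():
        total[~low] += predict(node["high"], X[~low])
    return total


# Synthetic piecewise-linear target that a single linear model cannot fit.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(400, 1))
y = np.abs(X[:, 0]) + 0.5 * X[:, 0]

model = fit_node(X, y)
rmse_linear = np.sqrt(np.mean((y - predict_linear(fit_linear(X, y), X)) ** 2))
rmse_boost = np.sqrt(np.mean((y - predict(model, X)) ** 2))
print(rmse_linear, rmse_boost)
```

Because the routing rule depends only on each node's own prediction, the same rule can be applied to unseen samples at test time, which is why the sketch splits on predictions rather than on residuals. The synthetic data is purely illustrative and is unrelated to the datasets used in the paper.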

Findings

Comparison experiments show that RegBoost achieves performance similar to that of the gradient boosted decision tree (GBDT). The algorithm is also very effective relative to plain linear regression.

Originality/value

This paper designs a novel regression algorithm, RegBoost, with reference to GBDT. To the best of the authors' knowledge, RegBoost is the first algorithm to use linear regression as a weak predictor and combine it with gradient boosting to build an ensemble.

Keywords

Ensemble learning; Linear regression; Gradient boosting; RMSE


International Journal of Crowd Science

Volume 4, Issue 1, March 2020

Pages 60-72

DOI: 10.1108/IJCS-10-2019-0029

Cite this article:

Li W, Wang W, Huo W. RegBoost: a gradient boosted multivariate regression algorithm. International Journal of Crowd Science, 2020, 4(1): 60-72. https://doi.org/10.1108/IJCS-10-2019-0029


Received: 14 October 2019

Accepted: 25 December 2019

Published: 03 February 2020

Wen Li, Wei Wang and Wenjun Huo. Published in International Journal of Crowd Science. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode