In the classification of big data streams subject to concept drift, ensemble learning combines multiple weak learners and can achieve better generalization performance than a single learner. In this paper, we present EoBag, an efficient classifier based on the online bagging ensemble method for big data stream learning. The classifier introduces an efficient online resampling mechanism on the training instances and uses a robust coding method based on error-correcting output codes, which reduces the correlation between the base classifiers and increases the diversity of the ensemble. A dynamic updating model based on classification performance is adopted to avoid unnecessary update operations and improve learning efficiency. We also implement a parallel version of EoBag, which runs faster than the serial version while achieving almost the same classification performance. Finally, we compare the classification performance and resource usage of EoBag with those of other state-of-the-art algorithms on both artificial and real data sets. The results show that the proposed algorithm obtains better accuracy and more feasible resource usage for the classification of big data streams.
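For readers unfamiliar with online bagging, the following is a minimal sketch of the standard scheme attributed to Oza and Russell, in which each base learner trains on every incoming instance k times, with k drawn from a Poisson(1) distribution. It is not the EoBag implementation: the abstract does not detail its resampling mechanism, its error-correcting output codes, or its dynamic updating model, and none of them is reproduced here. The `NearestCentroid` base learner and all class and method names are hypothetical, chosen only to keep the example self-contained.

```python
import math
import random
from collections import defaultdict


def poisson(lam, rng):
    """Draw k ~ Poisson(lam) with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class NearestCentroid:
    """Hypothetical incremental base learner: one running mean per class."""

    def __init__(self):
        self.sums = {}
        self.counts = defaultdict(int)

    def learn_one(self, x, y):
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
        for i, value in enumerate(x):
            self.sums[y][i] += value
        self.counts[y] += 1

    def predict_one(self, x):
        if not self.sums:
            return None  # nothing learned yet

        def squared_distance(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))

        return min(self.sums, key=squared_distance)


class OnlineBagging:
    """Standard online bagging (assumed here, not the paper's variant)."""

    def __init__(self, n_models=10, seed=0):
        self.models = [NearestCentroid() for _ in range(n_models)]
        self.rng = random.Random(seed)

    def learn_one(self, x, y):
        # Each model sees the instance k ~ Poisson(1) times, which
        # approximates bootstrap resampling without storing the stream.
        for model in self.models:
            for _ in range(poisson(1.0, self.rng)):
                model.learn_one(x, y)

    def predict_one(self, x):
        # Majority vote over the models that have learned at least one class.
        votes = defaultdict(int)
        for model in self.models:
            label = model.predict_one(x)
            if label is not None:
                votes[label] += 1
        return max(votes, key=votes.get) if votes else None
```

Because every instance is processed once and then discarded, memory use depends on the number of models rather than on the length of the stream, which is what makes the scheme suitable for stream learning.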
As data collection is one of the key operations in Wireless Sensor Networks (WSNs), energy-efficient data collection schemes have been actively explored in the literature. However, the transform basis for sparsifying the sensed data is usually chosen empirically, and the transformed results are not always the sparsest. In this paper, we propose a Data Collection scheme based on Denoising Autoencoder (DCDA) to solve this problem. In the data training phase, a Denoising AutoEncoder (DAE) is trained on historical sensed data to compute the data measurement matrix and the data reconstruction matrix. Then, in the data collection phase, the sensed data of the whole network are collected along a data collection tree. The data measurement matrix is used to compress the sensed data at each sensor node, and the data reconstruction matrix is used to reconstruct the original data at the sink. Finally, the data communication performance and data reconstruction performance of the proposed scheme are evaluated and compared with those of existing schemes using real-world sensed data. The experimental results show that, compared with its counterparts, the proposed scheme achieves a higher data compression rate, lower energy consumption, more accurate data reconstruction, and faster data reconstruction.
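The data collection phase can be illustrated with a small sketch. It rests on assumptions the abstract does not confirm: that compression is the linear map y = Φx with an M×N measurement matrix Φ, that each node adds its own contribution to the partial sums received from its children (the additive aggregation common in compressive data gathering), and that the sink reconstructs with a linear map x̂ = Ψy. The function names and the hand-picked matrices are hypothetical. In DCDA both matrices would come from the trained DAE, and that training step is not shown.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def collect(node, children, readings, phi):
    """Return the M-length partial measurement forwarded by `node`.

    The node scales its own column of phi by its reading, then adds the
    partial sums from its children, so every link carries only M values.
    """
    partial = [row[node] * readings[node] for row in phi]
    for child in children.get(node, []):
        child_partial = collect(child, children, readings, phi)
        partial = [p + c for p, c in zip(partial, child_partial)]
    return partial


# Toy network: N = 4 sensor nodes compressed to M = 2 measurements.
# Node 0 is adjacent to the sink; the tree is 0 -> {1, 2}, 2 -> {3}.
children = {0: [1, 2], 2: [3]}
readings = [2.0, 3.0, 2.0, 3.0]

# Hand-picked stand-ins for the matrices a trained DAE would provide.
phi = [[1, 0, 1, 0],
       [0, 1, 0, 1]]          # measurement matrix, M x N
psi = [[0.5, 0.0],
       [0.0, 0.5],
       [0.5, 0.0],
       [0.0, 0.5]]            # reconstruction matrix, N x M

measurements = collect(0, children, readings, phi)   # what the sink receives
reconstructed = mat_vec(psi, measurements)           # estimate of the readings
```

In this toy case the readings lie in the subspace that `psi` can represent, so the reconstruction is exact. With real sensed data the reconstruction is approximate, and its accuracy depends on how well the learned matrices capture the structure of the data.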