Open Access Issue
Online Real-Time Trajectory Analysis Based on Adaptive Time Interval Clustering Algorithm
Big Data Mining and Analytics 2020, 3(2): 131-142
Published: 27 February 2020

As China's international trade has grown, real-time processing systems based on ship trajectories have been used to cluster trajectories in real time, so that hot-zone information for seagoing ships can be discovered as it emerges. This technology has great research value for the future planning of maritime traffic. However, ship navigation characteristics cannot be found in real time with the ship Automatic Identification System (AIS) positioning system, and density-grid clustering with a fixed time interval does not overcome the shortcomings of real-time clustering. This study proposes an adaptive time-interval clustering algorithm based on a density grid (called DAC-Stream). The algorithm adapts its clustering time interval to the size of the real-time ship trajectory data stream, so that a ship’s hot-zone information can be found efficiently and in real time. Experimental results show that DAC-Stream improves the clustering effect and accelerates data processing compared with the fixed-time-interval clustering algorithm based on a density grid (called DC-Stream).
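The general idea of density-grid clustering with an adaptive time interval can be illustrated with a minimal sketch. This is not the DAC-Stream algorithm itself, whose details the abstract does not give; the function names, the `target` points-per-window parameter, and the interval bounds are all assumptions chosen for illustration.

```python
from collections import Counter

def grid_cell(lon, lat, cell_size):
    """Map a position to the integer index of its grid cell."""
    return (int(lon // cell_size), int(lat // cell_size))

def dense_cells(points, cell_size, min_points):
    """Return the grid cells that hold at least min_points positions."""
    counts = Counter(grid_cell(lon, lat, cell_size) for lon, lat in points)
    return {cell for cell, n in counts.items() if n >= min_points}

def next_interval(n_points, interval, target, lo=1.0, hi=60.0):
    """Hypothetical adaptation rule: rescale the window length so that
    roughly `target` points arrive per window, clamped to [lo, hi] seconds."""
    if n_points == 0:
        # Nothing arrived: widen the window, up to the upper bound.
        return min(hi, interval * 2)
    return max(lo, min(hi, interval * target / n_points))
```

Under this rule a heavier stream shortens the window and a lighter stream lengthens it, whereas a fixed-interval scheme would process every window at the same length regardless of load.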

Open Access Issue
An Improved Algorithm for Optimizing MapReduce Based on Locality and Overlapping
Tsinghua Science and Technology 2018, 23(6): 744-753
Published: 15 October 2018

MapReduce is currently the most popular programming model for big data processing, and Hadoop is a well-known MapReduce implementation platform. However, Hadoop jobs suffer from imbalanced workloads during the reduce phase and make inefficient use of the available computing and network resources. In some cases, these problems lead to serious performance degradation in MapReduce jobs. To resolve these problems, in this paper we propose two algorithms, the Locality-Based Balanced Schedule (LBBS) and Overlapping-Based Resource Utilization (OBRU), that optimize the Locality-Enhanced Load Balance (LELB) and the Map, Local reduce, Shuffle, and final Reduce (MLSR) phases. LBBS collects partition information from the input data during the map phase and generates balanced schedule plans for the reduce phase. OBRU uses computing and network resources efficiently by overlapping the local reduce, shuffle, and final reduce phases. Experimental results show that the LBBS and OBRU algorithms yield significant improvements in load balancing. When LBBS and OBRU are applied, job performance improves by 15% over models using LELB and MLSR.
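As a rough illustration of what a balanced reduce-phase schedule built from map-phase partition sizes might look like, the sketch below uses a standard greedy largest-first assignment. It is not the LBBS algorithm from the paper: it ignores data locality entirely, and the function name and input format are assumptions.

```python
import heapq

def balanced_schedule(partition_sizes, n_reducers):
    """Assign partitions to reducers, largest first, always choosing
    the reducer with the smallest load so far (greedy heuristic)."""
    # Min-heap of (current load, reducer id).
    heap = [(0, r) for r in range(n_reducers)]
    heapq.heapify(heap)
    plan = {r: [] for r in range(n_reducers)}
    for pid, size in sorted(partition_sizes.items(), key=lambda kv: -kv[1]):
        load, r = heapq.heappop(heap)
        plan[r].append(pid)
        heapq.heappush(heap, (load + size, r))
    return plan
```

For example, with hypothetical partition sizes `{"p0": 8, "p1": 7, "p2": 3, "p3": 2}` and two reducers, each reducer ends up with a total load of 10.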
