With the development of China's international trade, real-time processing systems based on ship trajectories have been used to cluster trajectories as they arrive, so that hot zone information for ships at sea can be discovered in real time. This technology has great research value for the future planning of maritime traffic. However, ship navigation characteristics cannot be found in real time with a ship Automatic Identification System (AIS) positioning system alone, and clustering with the density-grid fixed-time-interval algorithm does not overcome the shortcomings of real-time clustering. This study proposes an adaptive time-interval clustering algorithm based on a density grid (called DAC-Stream). The algorithm adapts its clustering time interval to the size of the real-time ship trajectory data stream, so that a ship's hot zone information can be found efficiently and in real time. Experimental results show that DAC-Stream improves the clustering effect and accelerates data processing compared with the fixed-time-interval clustering algorithm based on a density grid (called DC-Stream).
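The idea of density-grid clustering with an adaptive time interval can be illustrated with a minimal Python sketch. It is not the DAC-Stream implementation: the abstract does not give the adaptation rule, so `next_interval` is a hypothetical rule that scales the window so each one holds roughly `target` points, and the function names, `cell_size`, `min_points`, and the interval bounds are all assumptions made for illustration.

```python
import math
from collections import Counter


def grid_cell(lon, lat, cell_size):
    """Map a position to the integer index of its grid cell."""
    return (math.floor(lon / cell_size), math.floor(lat / cell_size))


def dense_cells(points, cell_size, min_points):
    """Return the cells that hold at least `min_points` positions."""
    counts = Counter(grid_cell(lon, lat, cell_size) for lon, lat in points)
    return {cell for cell, n in counts.items() if n >= min_points}


def connected_clusters(cells):
    """Group dense cells that touch (8-neighbourhood) into clusters."""
    remaining = set(cells)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            cx, cy = frontier.pop()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbour = (cx + dx, cy + dy)
                    if neighbour in remaining:
                        remaining.remove(neighbour)
                        cluster.add(neighbour)
                        frontier.append(neighbour)
        clusters.append(cluster)
    return clusters


def next_interval(current, n_points, target, lo=1.0, hi=60.0):
    """Hypothetical adaptation rule (an assumption, not from the paper):
    rescale the interval so the next window holds about `target` points,
    clamped to the range [lo, hi] seconds."""
    if n_points == 0:
        return min(hi, current * 2)
    return max(lo, min(hi, current * target / n_points))
```

With this rule a burst of AIS messages shortens the next window and a quiet period lengthens it, whereas a fixed-interval scheme would keep the window constant regardless of the stream size.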
MapReduce is currently the most popular programming model for big data processing, and Hadoop is a well-known MapReduce implementation platform. However, Hadoop jobs suffer from imbalanced workloads during the reduce phase and use the available computing and network resources inefficiently. In some cases, these problems lead to serious performance degradation in MapReduce jobs. To resolve these problems, we propose two algorithms, the Locality-Based Balanced Schedule (LBBS) and Overlapping-Based Resource Utilization (OBRU), that optimize the Locality-Enhanced Load Balance (LELB) and the Map, Local reduce, Shuffle, and final Reduce (MLSR) phases. LBBS collects partition information from the input data during the map phase and generates balanced schedule plans for the reduce phase. OBRU uses computing and network resources efficiently by overlapping the local reduce, shuffle, and final reduce phases. Experimental results show that LBBS and OBRU yield significant improvements in load balancing. When LBBS and OBRU are applied, job performance improves by 15% over that of models using LELB and MLSR.
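The step of turning partition information into a balanced reduce-phase plan can be sketched as follows. This is not the LBBS algorithm itself: it swaps in the standard greedy longest-processing-time heuristic, ignores data locality, and assumes that partition sizes gathered during the map phase are available as a plain dictionary. The function name `balanced_schedule` and its parameters are hypothetical.

```python
import heapq


def balanced_schedule(partition_sizes, n_reducers):
    """Assign partitions to reducers with a greedy heuristic.

    Partitions are taken from largest to smallest, and each one goes to
    the reducer with the smallest load so far. Locality is ignored here,
    which is a simplification relative to the abstract's description.
    """
    # Min-heap of (current load, reducer id).
    heap = [(0, r) for r in range(n_reducers)]
    heapq.heapify(heap)
    plan = {r: [] for r in range(n_reducers)}

    by_size = sorted(partition_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for partition_id, size in by_size:
        load, reducer = heapq.heappop(heap)
        plan[reducer].append(partition_id)
        heapq.heappush(heap, (load + size, reducer))

    loads = {r: sum(partition_sizes[p] for p in plan[r]) for r in plan}
    return plan, loads


if __name__ == "__main__":
    sizes = {"p0": 8, "p1": 7, "p2": 6, "p3": 5, "p4": 4}
    plan, loads = balanced_schedule(sizes, n_reducers=2)
    print(plan)
    print(loads)
```

A locality-aware variant would additionally weigh where each partition's data already resides before choosing a reducer, trading some balance for less network transfer.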