AI Chat Paper
Note: Please note that the following content is generated by AMiner AI. SciOpen does not take any responsibility related to this content.
{{lang === 'zh_CN' ? '文章概述' : 'Summary'}}
{{lang === 'en_US' ? '中' : 'Eng'}}
Chat more with AI
PDF (15.3 MB)
Collect
Submit Manuscript AI Chat Paper
Show Outline
Outline
Show full outline
Hide outline
Outline
Show full outline
Hide outline
Open Access

Trajectory Big Data Processing Based on Frequent Activity

School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
Show Author Information

Abstract

With the rapid development and wide use of Global Positioning System in technology tools, such as smart phones and touch pads, many people share their personal experience through their trajectories while visiting places of interest. Therefore, trajectory query processing has emerged in recent years to help users find their best trajectories. However, with the huge amount of trajectory points and text descriptions, such as the activities practiced by users at these points, organizing these data in the index becomes tedious. Therefore, the parallel method becomes indispensable. In this paper, we have investigated the problem of distributed trajectory query processing based on the distance and frequent activities. The query is specified by start and final points in the trajectory, the distance threshold, and a set of frequent activities involved in the point of interest of the trajectory. As a result, the query returns the shortest trajectory including the most frequent activities with high support and high confidence. To simplify the query processing, we have implemented the Distributed Mining Trajectory R-Tree index (DMTR-Tree). For this method, we initially managed the large trajectory dataset in distributed R-Tree indexes. Then, for each index, we applied the frequent itemset Apriori algorithm for each point to select the frequent activity set. For the faster computation of the above algorithms, we utilized the cluster computing framework of Apache Spark with MapReduce as the programing model. The experimental results show that the DMTR-Tree index and the query-processing algorithm are efficient and can achieve the scalability.

References

[1]
Aung H. H., Guo L., and Tan K. L., Mining sub-trajectory cliques to find frequent routes, in Proc. 13th Int. Symp. Advances in Spatial and Temporal Databases, Munich, Germany, 2013, pp. 92-109.
[2]
Zheng K., Shang S., Yuan N. J., and Yang Y., Towards efficient search for activity trajectories, in Proc. 29th Int. Conf. Data Engineering, Brisbane, Australia, 2013, pp. 230-241.
[3]
Zhang C., Han J. W., Shou L. D., Lu J. J., and La Porta T., Splitter: Mining fine-grained sequential patterns in semantic trajectories, Proc. VLDB Endow., vol. 7, no. 9, pp. 769-780, 2014.
[4]
Chen W., Zhao L., Xu J. J., Liu G. F., Zheng K., and Zhou X. F., Trip oriented search on activity trajectory, J. Comput. Sci. Technol., vol. 30, no. 4, pp. 745-761, 2015.
[5]
Agrawal R. and Srikant R., Fast algorithms for mining association rules in large databases, in Proc. 20th Int. Conf. Very Large Data Bases, Santiago de Chile, Chile, 1994, pp. 487-499.
[6]
Bayardo R. J. Jr., Efficiently mining long patterns from databases, ACM SIGMOD Rec., vol. 27, no. 2, pp. 85-93, 1998.
[7]
Zaki M. J., Parthasarathy S., Ogihara M., and Li W., Parallel algorithms for discovery of association rules, Data Min. Knowl. Disc., vol. 1, no. 4, pp. 343-373, 1997.
[8]
Qiu H. J., Gu R., Yuan C. F., and Huang Y. H., YAFIM: A parallel frequent itemset mining algorithm with spark, in Proc. 28th Int. Parallel & Distributed Processing Symp. Workshops, Phoenix, AZ, USA, 2014.
[9]
Guttman A., R-trees: A dynamic index structure for spatial searching, in Proc. 1984 ACM SIGMOD Int. Conf. Management of Data, Boston, MA, USA, 1984, pp. 47-57.
[10]
Dean J. and Ghemawat S., MapReduce: Simplified data processing on large clusters, in Proc. 6th Conf. Symp. Operating Systems Design & Implementation, San Francisco, CA, USA, 2004.
[11]
Eldawy A., Alarabi L., and Mokbel M. F., Spatial partitioning techniques in Spatial Hadoop, in Proc. 41st Int. Conf. Very Large Data Bases, Kohala Coast, HI, USA, 2015, pp. 1602-1605.
[12]
Yang H. C., Dasdan A., Hsiao R. L., and Parker D. S., Map-reduce-merge: Simplified relational data processing on large clusters, in Proc. ACM SIGMOD Int. Conf. Management of Data, Beijing, China, 2007, pp. 1029-1040.
[13]
Wang H. Z. and Belhassena A., Parallel trajectory search based on distributed index, Inf. Sci., vols. 388&389, pp. 62-83, 2017.
[14]
Agrawal R. and Srikant R., Mining sequential patterns, in Proc. 11th Int. Conf. Data Engineering, Taipei, China, 1995.
[15]
Morzy M., Prediction of moving object location based on frequent trajectories, in Proc. 21st Int. Conf. Computer and Information Sciences, Istanbul, Turkey, 2006, pp. 583-592.
[16]
Morzy M., Mining frequent trajectories of moving objects for location prediction, in Proc. 5th Int. Conf. Machine Learning and Data Mining in Pattern Recognition, Leipzig, Germany, 2007, pp. 18-20.
[17]
Masciari E., Shi G., and Zaniolo C., Sequential pattern mining from trajectory data, in Proc. 17th Int. Database Engineering Applications Symp., Barcelona, Spain, 2013, pp. 162-167.
[18]
Monreale A., Pinelli F., Trasarti R., and Giannotti F., WhereNext: A location predictor on trajectory pattern mining, in Proc. 15th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, Paris, France, 2009.
[19]
Li N., Zeng L., He Q., and Shi Z. Z., Parallel implementation of apriori algorithm based on MapReduce, in Proc. 13th ACIS Int. Conf. Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, Kyoto, Japan, 2012, pp. 236-241.
[20]
Lin M. Y., Lee P. Y., and Hsueh S. C., Apriori-based frequent itemset mining algorithms on MapReduce, in Proc. 16th Int. Conf. Ubiquitous Information Management and Communication, Kuala Lumpur, Malaysia, 2012.
[21]
Guo J. and Ren Y. G., Research on improved a priori algorithm based on coding and MapReduce, in Proc. 10th Web Information System and Application Conf., Yangzhou, China, 2013, pp. 294-299.
[22]
Zaki M. J., Parthasarathy S., Ogihara M., and Li W., New algorithms for fast discovery of association rules, in Proc. 3rd Int. Conf. Knowledge Discovery and Data Mining, Newport Beach, CA, USA, 1997.
[23]
Li N., Zang L., He Q., and Shi Z. Z., Parallel implementation of apriori algorithm based on MapReduce, Int. J. Netw. Distrib. Comput., vol. 1, no. 2, pp. 89-96, 2013.
[24]
Tong W., Rudin C., Wagner D., and Sevieri R., Learning to detect patterns of crime, in Proc. European Conf. Machine Learning and Knowledge Discovery in Databases, Prague, Czech Republic, 2013, pp. 515-530.
[25]
Zaharia M., Chowdhury M., Franklin M. J., Shenker S., and Stoica I., Spark: Cluster computing with working sets, in Proc. 2nd USENIX Conf. Hot Topics in Cloud Computing, Boston, MA, USA, 2010.
[26]
Rathee S., Kaul M., and Kashyap A., R-Apriori: An efficient apriori based algorithm on spark, in Proc. 8th Workshop on Ph.D. Workshop in Information and Knowledge Management, Melbourne, Australia, 2015, pp. 27-34.
[27]
Beckmann N., Kriegel H. P., Schneider R., and Seeger B., The R-tree: An efficient and robust access method for points and rectangles, in Proc. 1990 ACM SIGMOD Int. Conf. Management of Data, Atlantic City, NJ, USA, 1990, pp. 322-331.
[28]
Hariharan R., Hore B., Li C., and Mehrotra S., Processing spatial-keyword (SK) queries in geographic information retrieval (GIR) systems, in Proc. 19th Int. Conf. Scientific and Statistical Database Management, Banff, Canada, 2007.
[29]
Zhang D. X., Chee Y. M., Mondal A., Tung A. K. H., and Kitsuregawa M., Keyword search in spatial databases: Towards searching by document, in Proc. 25th Int. Conf. Data Engineering, Shanghai, China, 2009, pp. 688-699.
[30]
Chen W., Zhao L., Xu J. J., Zheng K., and Zhou X. F., Ranking based activity trajectory search, in Proc. 15th Int. Conf. Web Information Systems Engineering, Thessaloniki, Greece, 2014, pp. 170-185.
[31]
Du Mouza C., Litwin W., and Rigaux P., SD-Rtree: A scalable distributed R-tree, in Proc. 23rd Int. Conf. Data Engineering, Istanbul, Turkey, 2007, pp. 296-305.
[32]
Eldawy A. and Mokbel M. F., A demonstration of Spatial Hadoop: An efficient MapReduce framework for spatial data, Proc. VLDB Endow, vol. 6, no. 12, pp. 1230-1233, 2013.
[33]
Wang L., Chen B., and Liu Y. H., Distributed storage and index of vector spatial data based on HBase, in Proc. 21st Int. Conf. Geoinformatics, Kaifeng, China, 2013, pp. 1-5.
[34]
Yu J., Wu J. X., and Sarwat M., GeoSpark: A cluster computing framework for processing large-scale spatial data, in Proc. 23rd SIGSPATIAL Int. Conf. Advances in Geographic Information Systems, Seattle, WA, USA, 2015.
[35]
Lee J. G., Han J. W., and Whang K. Y., Trajectory clustering: A partition-and-group framework, in Proc. 2007 ACM SIGMOD Int. Conf. Management of Data, Beijing, China, 2007, pp. 593-604.
[36]
Han J. W., Pei J., and Yin Y. W., Mining frequent patterns without candidate generation, in Proc. 2000 ACM SIGMOD Int. Conf. Management of Data, Dallas, TX, USA, 2000, pp. 1-12.
[37]
Ester M., Kriegel H. P., Sander J., and Xu X. W., A density-based algorithm for discovering clusters in large spatial databases with noise, in Proc. 2nd Int. Conf. Knowledge Discovery and Data Mining, Portland, OR, USA, 1996, pp. 226-231.
Tsinghua Science and Technology
Pages 317-332
Cite this article:
Belhassena A, Wang H. Trajectory Big Data Processing Based on Frequent Activity. Tsinghua Science and Technology, 2019, 24(3): 317-332. https://doi.org/10.26599/TST.2018.9010087

661

Views

13

Downloads

12

Crossref

N/A

Web of Science

16

Scopus

5

CSCD

Altmetrics

Received: 22 November 2017
Accepted: 22 February 2018
Published: 24 January 2019
© The author(s) 2019
Return