Advanced Persistent Threat (APT) attacks, a prominent form of attack in recent years, pose serious threats to the security of government and enterprise data because of their advanced and persistent nature. To address this issue, a security policy based on big data analysis has been proposed, which analyzes the log data of servers and terminals in Spark. However, in practical applications, Spark cannot efficiently analyze very large amounts of log data. To address this problem, we propose a scheduling optimization technique based on the reuse of datasets to improve Spark performance. In this technique, we define and formulate the reuse degree of Directed Acyclic Graphs (DAGs) in Spark based on Resilient Distributed Datasets (RDDs). Then, we define a global optimization function to obtain the optimal DAG sequence, that is, the sequence with the least execution time. To implement the global optimization function, we further propose a novel cost optimization algorithm based on the traditional Genetic Algorithm (GA). Our experiments demonstrate that this scheduling optimization technique in Spark can greatly decrease the time overhead of analyzing log data for detecting APT attacks.
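The idea of ordering DAGs by dataset reuse can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the reuse degree is taken to be the Jaccard overlap of the RDD sets of two DAGs, the cost model discounts a DAG's cost by its reuse with the immediately preceding DAG, and the names `reuse_degree`, `sequence_cost`, `order_crossover`, and `genetic_schedule` are hypothetical. It is plain Python and does not call any Spark API.

```python
import random

def reuse_degree(a, b):
    """Assumed definition: fraction of RDDs shared by two DAGs (Jaccard overlap)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def sequence_cost(order, dags, base_cost):
    """Assumed cost model: each DAG's base cost is discounted by its
    reuse degree with the DAG scheduled immediately before it."""
    total = 0.0
    for i, name in enumerate(order):
        discount = reuse_degree(dags[order[i - 1]], dags[name]) if i > 0 else 0.0
        total += base_cost[name] * (1.0 - discount)
    return total

def order_crossover(p1, p2, rng):
    """Order crossover: keep a slice of p1 in place, fill the rest in p2's order."""
    i, j = sorted(rng.sample(range(len(p1) + 1), 2))
    kept = p1[i:j]
    rest = [g for g in p2 if g not in kept]
    return rest[:i] + kept + rest[i:]

def genetic_schedule(dags, base_cost, generations=200, pop_size=30,
                     mutation_rate=0.2, seed=0):
    """Search for a low-cost DAG ordering with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    names = sorted(dags)
    # Initial population: random permutations of the DAG names.
    population = [rng.sample(names, len(names)) for _ in range(pop_size)]

    def fitness(order):
        return sequence_cost(order, dags, base_cost)

    for _ in range(generations):
        population.sort(key=fitness)
        elite = population[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            child = order_crossover(p1, p2, rng)
            if rng.random() < mutation_rate:
                # Swap mutation: exchange two positions in the ordering.
                a, b = rng.sample(range(len(child)), 2)
                child[a], child[b] = child[b], child[a]
            children.append(child)
        population = elite + children
    return min(population, key=fitness)
```

Under these assumptions, calling `genetic_schedule` with a dictionary that maps each DAG name to its set of RDD identifiers, plus a dictionary of base costs, returns an ordering that tends to place DAGs sharing RDDs next to each other. A heuristic search of this kind is not guaranteed to find the global optimum.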
Mobile crowd sensing is an innovative paradigm that leverages the crowd, i.e., a large group of people with their mobile devices, to sense various kinds of information in the physical world. With the help of the sensed information, many tasks can be fulfilled efficiently, such as environment monitoring, traffic prediction, and indoor localization. Task and participant matching is an important issue in mobile crowd sensing because it determines the quality and efficiency of a sensing task. Hence, numerous matching strategies have been proposed in recent research. This survey aims to provide an up-to-date view of this topic. We propose a research framework for the matching problem, comprising a participant model, a task model, and solution design. The participant model is made up of three kinds of participant characteristics, i.e., attributes, requirements, and supplements. The task models are categorized according to application backgrounds and objective functions. Both offline and online solutions in the recent literature are discussed. Some open issues are introduced, including matching strategies for heterogeneous tasks, context-aware matching, online strategies, and leveraging historical data to finish new tasks.