Open Access

A Scheduling Optimization Technique Based on Reuse in Spark to Defend Against APT Attack

Jianchao Tang, Ming Xu, Shaojing Fu, Kai Huang
College of Computer, National University of Defense Technology, Changsha 410073, China.
State Key Laboratory of Cryptology, Beijing 100878, China.

Abstract

Advanced Persistent Threat (APT) attacks, which have become a prominent attack vector in recent years, pose serious threats to the security of government and enterprise data because of their advanced and persistent nature. To address this issue, a big data analysis security policy has been proposed that analyzes the log data of servers and terminals in Spark. In practical applications, however, Spark cannot efficiently analyze very large volumes of log data. To address this problem, we propose a scheduling optimization technique based on the reuse of datasets to improve Spark performance. In this technique, we define and formulate the reuse degree of Directed Acyclic Graphs (DAGs) in Spark based on Resilient Distributed Datasets (RDDs). We then define a global optimization function to obtain the optimal DAG sequence, that is, the sequence with the least execution time. To implement the global optimization function, we further propose a novel cost optimization algorithm based on the traditional Genetic Algorithm (GA). Our experiments demonstrate that this scheduling optimization technique can greatly decrease the time overhead of analyzing log data in Spark for detecting APT attacks.
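
The idea of ordering DAGs by reuse can be illustrated with a minimal Python sketch. It is not the paper's formulation: the abstract does not give the reuse-degree formula, the global optimization function, or the GA operators, so everything below is an assumption made for illustration. Each DAG is modelled as a set of hypothetical RDD names, the reuse degree is taken to be the fraction of a DAG's RDDs already produced by the DAG that ran just before it, the cost charges one unit per RDD that has to be recomputed, and the GA uses a permutation encoding with order crossover, swap mutation, and elitism.

```python
import random

# Hypothetical workload: each DAG (job) is modelled as the set of RDDs it computes.
# The names are illustrative and do not come from the paper.
DAGS = {
    "A": {"logs", "parsed", "by_ip"},
    "B": {"logs", "parsed", "by_user"},
    "C": {"dns", "dns_parsed"},
    "D": {"dns", "dns_parsed", "rare_domains"},
    "E": {"logs", "parsed", "by_ip", "ip_bursts"},
}


def reuse_degree(prev, cur):
    """Assumed measure: fraction of cur's RDDs already produced by prev."""
    return len(DAGS[prev] & DAGS[cur]) / len(DAGS[cur])


def cost(seq):
    """Assumed cost: one unit per RDD not cached by the immediately preceding DAG."""
    total = len(DAGS[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        total += len(DAGS[cur] - DAGS[prev])
    return total


def order_crossover(p1, p2, rng):
    """Keep a slice of p1 in place and fill the rest in p2's relative order."""
    i, j = sorted(rng.sample(range(len(p1)), 2))
    middle = p1[i:j + 1]
    rest = [g for g in p2 if g not in middle]
    return rest[:i] + middle + rest[i:]


def mutate(seq, rng, rate=0.2):
    """With probability `rate`, swap two positions in the sequence."""
    seq = list(seq)
    if rng.random() < rate:
        i, j = rng.sample(range(len(seq)), 2)
        seq[i], seq[j] = seq[j], seq[i]
    return seq


def genetic_schedule(names, generations=60, pop_size=30, seed=0):
    """Search for a low-cost DAG order; parameters are arbitrary illustrative values."""
    rng = random.Random(seed)
    names = list(names)
    # Seed the population with the submission order so the result is never worse.
    population = [names] + [rng.sample(names, len(names)) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=cost)
        elite = population[: pop_size // 2]  # elitism: the best half survives
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(mutate(order_crossover(p1, p2, rng), rng))
        population = elite + children
    return min(population, key=cost)


if __name__ == "__main__":
    best = genetic_schedule(DAGS)
    print("submission order cost:", cost(list(DAGS)))
    print("GA order:", best, "cost:", cost(best))
```

In this toy workload, an order such as B, A, E, C, D costs 8 units under the assumed cost model, against 11 for the submission order A, B, C, D, E, because DAGs that share log-derived RDDs run back to back. A real Spark deployment would need a cost model that reflects actual caching behaviour and stage execution times.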

Tsinghua Science and Technology
Pages 550-560
Cite this article:
Tang J, Xu M, Fu S, et al. A Scheduling Optimization Technique Based on Reuse in Spark to Defend Against APT Attack. Tsinghua Science and Technology, 2018, 23(5): 550-560. https://doi.org/10.26599/TST.2018.9010022


Received: 22 September 2017
Accepted: 29 September 2017
Published: 17 September 2018
© The author(s) 2018