Efficient Feature Extraction Using Apache Spark for Network Behavior Anomaly Detection

Xiaoming Ye; Xingshu Chen; Dunhu Liu; Wenxian Wang; Li Yang; Gang Liang; Guolin Shao

doi:10.26599/TST.2018.9010021

Open Access

School of Cybersecurity, Chengdu University of Information Technology, Chengdu 610225, China.

College of Cybersecurity, Sichuan University, Chengdu 610065, China.

School of Management, Chengdu University of Information Technology, Chengdu 610103, China.

College of Computer Science, Sichuan University, Chengdu 610065, China.

Abstract

Extracting and analyzing network traffic features is fundamental to the design and implementation of network behavior anomaly detection methods. Traditional traffic feature extraction focuses on the statistical features of traffic volume, which is not sufficient to reflect communication pattern features. A different approach is required to detect anomalous behaviors that do not exhibit traffic volume changes, such as low-intensity anomalous behaviors caused by Denial of Service/Distributed Denial of Service (DoS/DDoS) attacks, Internet worms and scanning, and botnets. We propose an efficient traffic feature extraction architecture that combines the benefits of traffic volume features and network communication pattern features. This method can detect both low-intensity anomalous network behaviors and conventional traffic volume anomalies. We implemented our approach on Spark Streaming and validated our feature set using a labelled real-world dataset collected from the Sichuan University campus network. Our results demonstrate that the traffic feature extraction approach is efficient in detecting both traffic variations and communication structure changes. In our evaluation on the MIT-DARPA dataset, the same detection approach achieves a detection precision of 82.3% using traffic volume features and 89.9% using communication pattern features. Our proposed feature set improves precision to 94%.
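
To illustrate the distinction the abstract draws between traffic volume features and communication pattern features, the following is a minimal sketch in plain Python. It is not the authors' Spark Streaming implementation. The `Flow` record, its field names, the 60-second window, and the specific features (byte and packet totals, node and edge counts, maximum out-degree) are all assumptions chosen for illustration; the abstract does not list the paper's actual feature set.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Flow(NamedTuple):
    """One flow record. Field names are hypothetical, not from the paper."""
    ts: float      # flow start time in seconds
    src: str       # source host
    dst: str       # destination host
    packets: int   # packets in the flow
    n_bytes: int   # bytes in the flow


def window_features(flows: Iterable[Flow], window: float = 60.0) -> dict:
    """Group flows into fixed time windows and compute both feature kinds."""
    buckets = defaultdict(list)
    for f in flows:
        buckets[int(f.ts // window)].append(f)

    features = {}
    for w, batch in sorted(buckets.items()):
        # Communication pattern: hosts are nodes, distinct (src, dst)
        # pairs are directed edges of the window's communication graph.
        edges = {(f.src, f.dst) for f in batch}
        nodes = {host for edge in edges for host in edge}
        out_degree = defaultdict(int)
        for src, _ in edges:
            out_degree[src] += 1

        features[w] = {
            # Traffic volume features: simple totals over the window.
            "flows": len(batch),
            "packets": sum(f.packets for f in batch),
            "bytes": sum(f.n_bytes for f in batch),
            # Communication pattern features: graph structure.
            "nodes": len(nodes),
            "edges": len(edges),
            "max_out_degree": max(out_degree.values()),
        }
    return features
```

Under these assumptions, a low-intensity scan in which one host sends a few small flows to many distinct destinations barely changes the byte total for its window but raises `max_out_degree`, which is the kind of structural change that volume statistics alone would miss. In a streaming engine, the same per-window computation would presumably run on each micro-batch rather than on an in-memory list.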

Keywords

feature extraction; graph theory; network behavior anomaly detection; Apache Spark

Tsinghua Science and Technology, Volume 23, Issue 5, October 2018, Pages 561-573

DOI: 10.26599/TST.2018.9010021

Cite this article:

Ye X, Chen X, Liu D, et al. Efficient Feature Extraction Using Apache Spark for Network Behavior Anomaly Detection. Tsinghua Science and Technology, 2018, 23(5): 561-573. https://doi.org/10.26599/TST.2018.9010021

Received: 24 September 2017

Accepted: 29 September 2017

Published: 17 September 2018