Regular Paper

A New Approach to Multivariate Network Traffic Analysis

Department of Computer Science, Texas A&M University, Commerce 75428, U.S.A.
Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, U.S.A.

A preliminary version of this paper was published in the Proceedings of ICCCN 2017.

Abstract

Network traffic analysis is one of the core functions of network monitoring for effective network operations and management. Although online traffic analysis has been widely studied, it remains challenging for several reasons. One of the primary challenges is the sheer volume of traffic that must be analyzed within a limited amount of time as network bandwidth keeps increasing. Another important challenge is to support multivariate functions of traffic variables so that administrators can intuitively identify unexpected network events. To address these challenges, we propose a new multivariate analysis approach that offers a high-level summary of online network traffic. With this approach, the current state of the network is displayed as patterns compiled from a set of traffic variables, and detection problems in network monitoring (e.g., change detection and anomaly detection) can be reduced to a pattern identification and classification problem. In this paper, we introduce our preliminary work on clustered patterns for online, multivariate network traffic analysis, together with the challenges and limitations we observed. We then present a grid-based model designed to overcome the limitations of the clustered pattern-based technique. We discuss the potential of the new model with respect to the technical challenges, including streaming-based computation and robustness to outliers.
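The abstract describes summarizing a window of traffic as a pattern over several traffic variables and reducing change detection to comparing patterns. The sketch below illustrates that general idea with a simple fixed-grid histogram; it is not the grid-based model from the paper. The choice of variables (normalized packet size and inter-arrival time), the grid resolution `bins`, the names `GridSummary` and `summarize`, and the use of total variation distance to compare windows are all hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

Cell = Tuple[int, ...]


class GridSummary:
    """Hypothetical grid summary of multivariate traffic samples.

    Each sample is a vector of traffic variables already scaled to [0, 1].
    Every dimension is split into `bins` equal-width intervals, so a sample
    falls into exactly one grid cell; only per-cell counts are kept.
    """

    def __init__(self, dims: int, bins: int) -> None:
        self.dims = dims
        self.bins = bins
        self.counts: Counter = Counter()  # sparse: only occupied cells are stored
        self.total = 0

    def cell_of(self, sample: Sequence[float]) -> Cell:
        if len(sample) != self.dims:
            raise ValueError("sample has the wrong number of variables")
        # Clamp each value to [0, 1], then map it to a bin index in [0, bins - 1].
        return tuple(
            min(int(max(0.0, min(1.0, x)) * self.bins), self.bins - 1)
            for x in sample
        )

    def update(self, sample: Sequence[float]) -> None:
        # Streaming update: O(dims) work per sample, no pass over past data.
        self.counts[self.cell_of(sample)] += 1
        self.total += 1

    def pattern(self) -> Dict[Cell, float]:
        # Normalize counts so windows with different traffic volumes compare fairly.
        if self.total == 0:
            return {}
        return {c: n / self.total for c, n in self.counts.items()}


def total_variation(p: Dict[Cell, float], q: Dict[Cell, float]) -> float:
    """Half the L1 distance between two grid patterns (0 = identical, 1 = disjoint)."""
    cells = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in cells)


def summarize(window: Iterable[Sequence[float]], dims: int, bins: int) -> Dict[Cell, float]:
    """Build the normalized grid pattern for one time window of samples."""
    grid = GridSummary(dims, bins)
    for sample in window:
        grid.update(sample)
    return grid.pattern()


if __name__ == "__main__":
    # Hypothetical windows of (normalized packet size, normalized inter-arrival time).
    clean = [(0.1, 0.2)] * 100
    with_outliers = [(0.1, 0.2)] * 98 + [(0.95, 0.95)] * 2  # two stray samples
    shifted = [(0.1, 0.2)] * 50 + [(0.8, 0.1)] * 50        # half the traffic moved

    base = summarize(clean, dims=2, bins=4)
    print(round(total_variation(base, summarize(with_outliers, 2, 4)), 2))  # 0.02
    print(round(total_variation(base, summarize(shifted, 2, 4)), 2))        # 0.5
```

In this toy run, two outlying samples move the distance by only 0.02, while a shift of half the traffic to a new region moves it by 0.5, so a threshold between the two would flag the change but not the outliers. A fixed grid also means two windows can be compared cell by cell without first matching cluster centroids between them, which is one plausible reason a grid can be easier to maintain over a stream than clustered patterns. The number of possible cells grows as `bins ** dims`, which is why the sketch stores counts sparsely.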

Electronic Supplementary Material

jcst-34-2-388-Highlights.pdf (769.4 KB)

Journal of Computer Science and Technology
Pages 388-402
Cite this article:
Kim J, Sim A. A New Approach to Multivariate Network Traffic Analysis. Journal of Computer Science and Technology, 2019, 34(2): 388-402. https://doi.org/10.1007/s11390-019-1915-y