Network traffic anomaly detection is limited by the lack of annotation information in the traffic. This paper presents an unsupervised anomaly detection method based on score iterations that overcomes this limitation. An autoencoder based anomaly score iteration process was designed to learn generic anomaly features to determine an initial anomaly score. A deep ordinal regression model based anomaly score iteration process was then designed to learn discriminative anomaly features to further improve the anomaly score accuracy. Deep models, multi-view features and ensemble learning are also used to improve the detection accuracy. Tests on several datasets show that this method has significant advantages over other methods in the absence of annotation information and can be effectively applied to network traffic anomaly detection.
- Article type
- Year
- Co-author
Recently, local differential privacy (LDP) has been used as the de facto standard for data sharing and analyzing with high-level privacy guarantees. Existing LDP-based mechanisms mainly focus on learning statistical information about the entire population from sensitive data. For the first time in the literature, we use LDP for distance estimation between distributed data to support more complicated data analysis. Specifically, we propose PrivBV—a locally differentially private bit vector mechanism with a distance-aware property in the anonymized space. We also present an optimization strategy for reducing privacy leakage in the high-dimensional space. The distance-aware property of PrivBV brings new insights into complicated data analysis in distributed environments. As study cases, we show the feasibility of applying PrivBV to privacy-preserving record linkage and non-interactive clustering. Theoretical analysis and experimental results demonstrate the effectiveness of the proposed scheme.
Information diffusion is one of the most important issues in social network analysis. Unlike most existing works, which either rely on network topology or node profiles, this study focuses on the diffusion itself, i.e., the recorded propagation histories. These histories are the evidence of diffusion and can be used to explain to users what happened in their networks. However, these histories can quickly grow in size and complexity, limiting their capacity to be intuitively understood. To reduce this information overload, in this paper we present the problem of propagation history ranking. The goal is to rank participant edges/nodes by their contribution to the diffusion. We first discuss and adapt a causal measure, Difference of Causal Effects (DCE), as the ranking criterion. Then, to avoid the complex calculation of DCE, we propose two integrated ranking strategies by adopting two indicators. One is responsibility, which captures the necessity aspect of causal effects. We further give an approximate algorithm, which could guarantee a feasible solution, for this indicator. The other is capability, which captures the sufficiency aspect of causal effects. Finally, promising experimental results are presented to verify the feasibility of the proposed ranking strategies.
Performance predictions for database queries allow service providers to determine what resources are needed to ensure their performance. Cost-based or rule-based approaches have been proposed to optimize database query execution plans. However, Virtual Machine (VM)-based database services have little or no sharing of resources or interactions between applications hosted on shared infrastructures. Neither providers nor users have the right combination of visibility/access/expertise to perform proper tuning and provisioning. This paper presents a performance prediction model for query execution time estimates based on the query complexity for various data sizes. The user query execution time is a combination of five basic operator complexities: