Scholar - SciOpen

Network traffic anomaly detection is limited by the lack of annotation information in the traffic. This paper presents an unsupervised anomaly detection method based on score iterations that overcomes this limitation. An autoencoder-based anomaly score iteration process was designed to learn generic anomaly features and determine an initial anomaly score. A second anomaly score iteration process, based on a deep ordinal regression model, was then designed to learn discriminative anomaly features and further improve the anomaly score accuracy. Deep models, multi-view features, and ensemble learning are also used to improve the detection accuracy. Tests on several datasets show that this method has significant advantages over other methods in the absence of annotation information and can be effectively applied to network traffic anomaly detection.

Open Access Issue

PrivBV: Distance-Aware Encoding for Distributed Data with Local Differential Privacy

Lin Sun, Guolou Ping, Xiaojun Ye

Tsinghua Science and Technology 2022, 27(2): 412-421

Published: 29 September 2021

Recently, local differential privacy (LDP) has been used as the de facto standard for data sharing and analysis with high-level privacy guarantees. Existing LDP-based mechanisms mainly focus on learning statistical information about the entire population from sensitive data. For the first time in the literature, we use LDP for distance estimation between distributed data to support more complicated data analysis. Specifically, we propose PrivBV, a locally differentially private bit vector mechanism with a distance-aware property in the anonymized space. We also present an optimization strategy for reducing privacy leakage in the high-dimensional space. The distance-aware property of PrivBV brings new insights into complicated data analysis in distributed environments. As case studies, we show the feasibility of applying PrivBV to privacy-preserving record linkage and non-interactive clustering. Theoretical analysis and experimental results demonstrate the effectiveness of the proposed scheme.
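The abstract does not describe PrivBV's encoding, so the following is only a rough sketch of the general idea of distance estimation over locally perturbed bit vectors. It uses plain binary randomized response per bit, which is a stand-in technique and not the PrivBV mechanism itself. The function names and the per-bit budget `epsilon` are assumptions made for illustration. Each bit is flipped with probability f = 1 / (1 + e^epsilon); for two vectors of length n at true Hamming distance d, the expected distance between the perturbed vectors is d(1 - 2f)^2 + 2nf(1 - f), which can be inverted to give an unbiased estimate of d.

```python
import math
import random


def flip_probability(epsilon: float) -> float:
    """Per-bit flip probability for binary randomized response."""
    return 1.0 / (1.0 + math.exp(epsilon))


def perturb(bits, epsilon, rng):
    """Flip each bit independently with probability f (assumed per-bit budget)."""
    f = flip_probability(epsilon)
    return [b ^ 1 if rng.random() < f else b for b in bits]


def hamming(a, b):
    """Number of positions where the two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def estimate_distance(noisy_a, noisy_b, epsilon):
    """Unbiased estimate of the true Hamming distance from two perturbed vectors.

    Inverts E[observed] = d * (1 - 2f)^2 + 2 * n * f * (1 - f).
    Requires epsilon > 0, otherwise f = 0.5 and the distance is not recoverable.
    """
    n = len(noisy_a)
    f = flip_probability(epsilon)
    observed = hamming(noisy_a, noisy_b)
    return (observed - 2.0 * n * f * (1.0 - f)) / (1.0 - 2.0 * f) ** 2
```

Note that `epsilon` here is spent on every bit, so under basic composition the budget for a whole vector grows with its length; the optimization strategy for high-dimensional data mentioned in the abstract is not modeled in this sketch.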

Open Access Issue

Propagation History Ranking in Social Networks: A Causality-Based Approach

Zheng Wang, Chaokun Wang, Xiaojun Ye, Jisheng Pei, Bin Li

Tsinghua Science and Technology 2020, 25(2): 161-179

Published: 02 September 2019

Information diffusion is one of the most important issues in social network analysis. Unlike most existing works, which rely on either network topology or node profiles, this study focuses on the diffusion itself, i.e., the recorded propagation histories. These histories are the evidence of diffusion and can be used to explain to users what happened in their networks. However, these histories can quickly grow in size and complexity, limiting their capacity to be intuitively understood. To reduce this information overload, in this paper we present the problem of propagation history ranking. The goal is to rank participant edges/nodes by their contribution to the diffusion. We first discuss and adapt a causal measure, Difference of Causal Effects (DCE), as the ranking criterion. Then, to avoid the complex calculation of DCE, we propose two integrated ranking strategies by adopting two indicators. One is responsibility, which captures the necessity aspect of causal effects. For this indicator, we further give an approximate algorithm that can guarantee a feasible solution. The other is capability, which captures the sufficiency aspect of causal effects. Finally, promising experimental results are presented to verify the feasibility of the proposed ranking strategies.
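To make the idea of ranking edges by their contribution to a diffusion more concrete, here is a crude counterfactual sketch. It is not DCE, nor the paper's responsibility or capability indicators, whose definitions are not given in the abstract. It assumes a propagation history is a list of directed edges plus a set of source nodes, and scores each edge by how many nodes the diffusion would no longer reach if that single edge were removed (a simple necessity-style proxy). All names are hypothetical.

```python
from collections import defaultdict, deque


def reachable(edges, sources):
    """Nodes reachable from the sources along the recorded propagation edges."""
    adjacency = defaultdict(list)
    for u, v in edges:
        adjacency[u].append(v)
    seen = set(sources)
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def rank_edges_by_necessity(edges, sources):
    """Rank edges by the number of nodes lost when each edge is removed alone.

    Returns (edge, lost_nodes) pairs, highest loss first; ties keep input order.
    """
    baseline = len(reachable(edges, sources))
    scored = []
    for i, edge in enumerate(edges):
        remaining = edges[:i] + edges[i + 1:]
        lost = baseline - len(reachable(remaining, sources))
        scored.append((edge, lost))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Single-edge removal gives a score of zero to every edge that has a redundant alternative path, which is one reason richer causal measures such as those in the paper are of interest.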

Open Access Issue

Performance Prediction for Performance-Sensitive Queries Based on Algorithmic Complexity

Chihung Chi, Ye Zhou, Xiaojun Ye

Tsinghua Science and Technology 2013, 18(6): 618-628

Published: 06 December 2013

Performance predictions for database queries allow service providers to determine what resources are needed to ensure their performance. Cost-based or rule-based approaches have been proposed to optimize database query execution plans. However, Virtual Machine (VM)-based database services have little or no sharing of resources or interactions between applications hosted on shared infrastructures. Neither providers nor users have the right combination of visibility, access, and expertise to perform proper tuning and provisioning. This paper presents a performance prediction model that estimates query execution time from the query complexity for various data sizes. The user query execution time is a combination of five basic operator complexities: $O(1)$, $O(\log n)$, $O(n)$, $O(n \log n)$, and $O(n^{2})$. Moreover, tests indicate that not all queries are equally important for performance prediction. As such, this paper illustrates a performance-sensitive query locating process on three benchmarks: RUBiS, RUBBoS, and TPC-W. A key observation is that performance-sensitive queries are only a small proportion ($20\%$) of the application query set. Evaluation of the performance model on the TPC-W benchmark shows that the query complexity in a real-life scenario has an average prediction error rate of less than $10\%$, which demonstrates the effectiveness of this predictive model.
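As a much-simplified illustration of complexity-based prediction, the sketch below fits a single scale factor for each of the five complexity classes to measured (data size, execution time) pairs, keeps the class with the smallest squared error, and uses it to extrapolate. This is an assumption-laden stand-in: the paper's model combines the five terms, and its fitting procedure is not described in the abstract. The function names and the sample measurements are hypothetical.

```python
import math

# Basis function for each of the five operator complexity classes.
COMPLEXITY_CLASSES = {
    "O(1)": lambda n: 1.0,
    "O(log n)": lambda n: math.log(n),
    "O(n)": lambda n: float(n),
    "O(n log n)": lambda n: n * math.log(n),
    "O(n^2)": lambda n: float(n) ** 2,
}


def fit_scale(sizes, times, basis):
    """Least-squares fit of time = c * basis(n); returns (c, squared residual)."""
    features = [basis(n) for n in sizes]
    c = sum(t * f for t, f in zip(times, features)) / sum(f * f for f in features)
    residual = sum((t - c * f) ** 2 for t, f in zip(times, features))
    return c, residual


def fit_dominant_class(sizes, times):
    """Pick the single complexity class that best explains the measurements.

    Assumes data sizes of at least 2 so that log(n) is positive.
    """
    best = None
    for name, basis in COMPLEXITY_CLASSES.items():
        c, residual = fit_scale(sizes, times, basis)
        if best is None or residual < best[2]:
            best = (name, c, residual)
    return best[0], best[1]


def predict(name, scale, n):
    """Predicted execution time at data size n for the fitted class."""
    return scale * COMPLEXITY_CLASSES[name](n)
```

For example, `fit_dominant_class(sizes, times)` followed by `predict(name, scale, 2000)` extrapolates the fitted curve to a data size that was not measured. A fuller version would fit all five terms jointly, as the abstract describes.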
