
Internet users rely heavily on web search engines to find the information they need. Advertisements (ads) are the major source of revenue for search engines. However, search advertising suffers from fraud: fraudsters generate fake traffic that does not reach the intended audience and increases advertisers' costs. It is therefore critical to detect fraud in web search. Previous studies address this problem by detecting fraudsters (especially bots) through their distinctive behaviors. However, such methods may fail to detect newer forms of fraud, such as crowdsourcing fraud, because crowd workers behave in part like normal users. To this end, this paper proposes an approach to detecting fraud in web search from the perspective of fraudulent keywords. We begin by using a unique dataset of 150 million web search logs to examine the discriminating features of fraudulent keywords. Specifically, we model the temporal correlation of fraudulent keywords as a graph, which reveals a very well-connected community structure. Next, we design DFW (detection of fraudulent keywords), which mines the temporal correlations between candidate fraudulent keywords and a given list of seeds. In particular, DFW applies several refinements to filter out non-fraudulent keywords that only occasionally co-occur with seeds. An evaluation using the search logs shows that DFW achieves high fraud detection precision (99%) and accuracy (93%). Further analysis reveals several typical temporal evolution patterns of fraudulent keywords, as well as the co-existence of bots and crowd workers as fraudsters in web search fraud.
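The idea of expanding a seed list through temporal correlation can be illustrated with a small sketch. This is not the DFW algorithm itself, whose details the abstract does not give. It assumes that logs are `(keyword, timestamp)` pairs, that temporal correlation is measured as the Jaccard similarity of the hourly buckets in which two keywords appear, and that a minimum number of shared buckets stands in for the refinements that filter out occasional co-occurrence. The function names, thresholds, and sample data are all hypothetical.

```python
from collections import defaultdict


def active_buckets(logs, bucket_seconds=3600):
    """Map each keyword to the set of time buckets in which it was searched."""
    buckets = defaultdict(set)
    for keyword, timestamp in logs:
        buckets[keyword].add(timestamp // bucket_seconds)
    return buckets


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expand_from_seeds(logs, seeds, min_similarity=0.5, min_shared=2):
    """Flag keywords whose activity pattern closely tracks some seed keyword.

    Assumption: min_shared is a stand-in for a refinement step; it drops
    keywords that share too few time buckets with a seed.
    """
    buckets = active_buckets(logs)
    flagged = {}
    for keyword, times in buckets.items():
        if keyword in seeds:
            continue
        best = 0.0
        for seed in seeds:
            seed_times = buckets.get(seed, set())
            if len(times & seed_times) < min_shared:
                continue  # occasional co-occurrence only
            best = max(best, jaccard(times, seed_times))
        if best >= min_similarity:
            flagged[keyword] = best
    return flagged


# Hypothetical toy log: timestamps are seconds since an arbitrary origin.
HOUR = 3600
logs = (
    [("cheap loans", h * HOUR) for h in (0, 1, 2, 3)]   # seed keyword
    + [("fast cash", h * HOUR) for h in (0, 1, 2)]      # tracks the seed
    + [("weather", h * HOUR) for h in (1, 50)]          # overlaps once only
)
flagged = expand_from_seeds(logs, seeds={"cheap loans"})
```

In this toy log, "fast cash" shares three of four hourly buckets with the seed and is flagged, while "weather" overlaps in a single bucket and is filtered out.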

Regular Paper Issue

Optimizing Multi-Dimensional Packet Classification for Multi-Core Systems

Tong Shen, Da-Fang Zhang, Gao-Gang Xie, Xin-Yi Zhang

Journal of Computer Science and Technology 2018, 33 (5): 1056-1071

Published: 12 September 2018


Packet classification, which assigns packets to specific flows based on a given rule set, has been studied for decades. With the advent of software-defined networking, a recent trend in packet classification is to extend the five-tuple model to multiple tuples. In general, packet classification on multiple fields is a complex problem. Although most existing software-based algorithms have proved highly effective in practice, they are suitable only for the classic five-tuple model and are difficult to scale up. Meanwhile, hardware-specific solutions are inflexible and expensive, and some of them consume considerable power. In this paper, we propose a universal multi-dimensional packet classification approach for multi-core systems. In our approach, novel data structures and four decomposition-based algorithms are designed to optimize the classification and updating of rules. For multi-field rules, a rule set is cut into several parts according to the number of fields, and each part works independently. In this way, the fields are searched in parallel and the partial results are merged at the end. To demonstrate the feasibility of our approach, we implement a prototype and evaluate its throughput and latency. Experimental results show that, on average, our approach achieves 40% higher throughput than other decomposition-based algorithms and 43% lower latency for incremental rule updates than the other algorithms. Furthermore, our approach reduces memory consumption by 39% on average and scales well.
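The decomposition idea described above (search each field independently, then merge the partial results) can be sketched as follows. This is a minimal illustration, not the paper's data structures or any of its four algorithms. It assumes that each rule is a tuple of inclusive ranges with one range per field, that earlier rules have higher priority, and that each per-field search returns a bitmask of matching rules which is merged with a bitwise AND. The rule set and function names are hypothetical, and the per-field search is a plain linear scan. A thread pool is used only to show the structure; CPython threads do not run this CPU-bound code in parallel across cores.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rule set: one inclusive (low, high) range per field.
# Rules listed first have higher priority.
RULES = [
    ((10, 20), (80, 80), (6, 6)),
    ((0, 255), (0, 1023), (17, 17)),
    ((0, 255), (0, 65535), (0, 255)),
]


def field_bitmask(field_index, value, rules):
    """Return a bitmask with bit i set if rule i matches value on this field."""
    mask = 0
    for rule_id, rule in enumerate(rules):
        low, high = rule[field_index]
        if low <= value <= high:
            mask |= 1 << rule_id
    return mask


def classify(packet, rules, executor):
    """Search every field independently, then AND the partial results.

    Returns the index of the highest-priority matching rule, or None.
    """
    masks = executor.map(
        lambda item: field_bitmask(item[0], item[1], rules),
        enumerate(packet),
    )
    merged = (1 << len(rules)) - 1  # start with all rules as candidates
    for mask in masks:
        merged &= mask
    if merged == 0:
        return None
    # The lowest set bit corresponds to the highest-priority rule.
    return (merged & -merged).bit_length() - 1


packets = [(15, 80, 6), (15, 80, 17), (300, 80, 6)]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = [classify(packet, RULES, pool) for packet in packets]
```

With these sample rules, the first packet matches rule 0, the second falls through to rule 1 because its third field excludes rule 0, and the third matches no rule because its first field lies outside every range.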
