Open Access
COMBAT: A New Bitmap Index Coding Algorithm for Big Data
Tsinghua Science and Technology 2016, 21(2): 136-145
Published: 31 March 2016

Bitmap indexing has been widely used in various applications because of the speed of its bitwise operations. However, it can consume large amounts of memory, and various bitmap coding algorithms have been proposed to address this problem. In this paper, we present COMbining Binary And Ternary encoding (COMBAT), a new bitmap index coding algorithm. Typical algorithms derived from Word Aligned Hybrid (WAH), such as COMPressed Adaptive indeX (COMPAX) and Compressed “n” Composable Integer Set (CONCISE), combine either two or three consecutive words after WAH encoding. COMBAT combines both mechanisms and produces more compact bitmap indexes. Moreover, querying with COMBAT can be faster than with COMPAX and CONCISE, since the smaller indexes take less time to load into memory. To establish the advantages of COMBAT, we extend a theoretical analysis model previously proposed by our group, covering the analysis of the various possible bitmap index codewords. Experimental results based on real data are also provided, along with codeword statistics, demonstrating COMBAT’s superiority in storage and speed.
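The abstract does not spell out COMBAT’s codeword layout, but every WAH-family codec it names (COMPAX, CONCISE, COMBAT) builds on the same word-aligned fill/literal mechanism. The sketch below illustrates only that shared base layer, assuming 32-bit words with 31 payload bits per group; `wah_encode` is an illustrative helper, not the paper’s algorithm.

```python
# Minimal WAH-style word-aligned encoder (a sketch, assuming 32-bit
# words). A literal word keeps 31 raw bits with MSB 0; a fill word has
# MSB 1, then the fill bit, then a 30-bit run length counted in groups.

def wah_encode(bits):
    """Encode a list of 0/1 bits into WAH-style 32-bit codewords."""
    GROUP = 31
    padded = bits + [0] * (-len(bits) % GROUP)      # pad to whole groups
    groups = [padded[i:i + GROUP] for i in range(0, len(padded), GROUP)]

    words = []
    for g in groups:
        value = int("".join(map(str, g)), 2)
        if value in (0, (1 << GROUP) - 1):          # all-0 or all-1 group
            fill_bit = value & 1
            prev = words[-1] if words else 0
            if prev >> 31 == 1 and (prev >> 30) & 1 == fill_bit:
                words[-1] += 1                      # extend previous run
            else:
                words.append((1 << 31) | (fill_bit << 30) | 1)
        else:
            words.append(value)                     # literal word, MSB 0
    return words
```

COMPAX- and CONCISE-style codecs then merge two or three adjacent words of such output into single codewords; combining both merge widths in one codec is, per the abstract, the step COMBAT contributes.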

Open Access
A Survey of Bitmap Index Compression Algorithms for Big Data
Tsinghua Science and Technology 2015, 20(1): 100-115
Published: 12 February 2015

With the growing popularity of Internet applications and the widespread use of the mobile Internet, Internet traffic has maintained rapid growth over the past two decades. Internet Traffic Archival Systems (ITAS) for packets or flow records have become more and more widely used in network monitoring, network troubleshooting, and user behavior and experience analysis. Among the three key technologies in ITAS, we focus on bitmap index compression algorithms and give a detailed survey in this paper. The current state-of-the-art bitmap index encoding schemes include BBC, WAH, PLWAH, EWAH, PWAH, CONCISE, COMPAX, VLC, DF-WAH, and VAL-WAH. Based on differences in segmentation, chunking, merge/compress, and Near Identical (NI) features, we provide a thorough categorization of the state-of-the-art bitmap index compression algorithms. We also propose some new bitmap index encoding algorithms, such as SECOMPAX, ICX, MASC, and PLWAH+, and present the state diagrams of their encoding algorithms. We then evaluate their CPU and GPU implementations with a real Internet trace from CAIDA. Finally, we summarize and discuss future directions for bitmap index compression algorithms. Beyond its applications in network security and network forensics, bitmap index compression with faster bitwise-logical operations and a reduced search space is widely used in genome data analysis, geographical information systems, graph databases, image retrieval, the Internet of Things, and other fields. Bitmap index compression, which first flourished in the 1980s, is expected to thrive again in the Big Data era.
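As background for the survey, the sketch below shows the property all of these schemes exploit: once each attribute value has its own bitmap over the archived records, a conjunctive query reduces to bitwise ANDs, which is what compression must preserve. The field names and bit patterns are illustrative assumptions, not data from the surveyed systems.

```python
# One bitmap per attribute value, one bit per archived record (packed
# into Python ints here). A query "port == 80 AND proto == tcp" becomes
# a single bitwise AND over two bitmaps.

port_bitmaps = {80: 0b10110010, 443: 0b01001100}
proto_bitmaps = {"tcp": 0b11111010, "udp": 0b00000101}

hits = port_bitmaps[80] & proto_bitmaps["tcp"]

# Bit i set in `hits` means record i matches both predicates.
matches = [i for i in range(8) if (hits >> i) & 1]
print(matches)   # positions of matching records
```

Compressed encodings such as WAH and its descendants perform this AND directly on the compressed form, word by word, which is why codeword design dominates both the space and the query-time results reported in the survey.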

Open Access
Electricity Services Based Dependability Model of Power Grid Communication Networking
Tsinghua Science and Technology 2014, 19(2): 121-132
Published: 15 April 2014

The technology of Ultra-High Voltage (UHV) transmission places higher dependability requirements on the electric power grid. The Power Grid Communication Networking (PGCN), the grid’s fundamental information infrastructure, serves data transmission including control signals, protection signals, and common data services. Dependability is a necessary requirement for delivering these services in a timely and accurate manner. Dependability analysis aims to predict the network’s operational status and to provide suitable strategies for eliminating potential dangers. Because the dependability of the PGCN may be affected by the external environment, device quality, implementation strategies, and other factors, its exploding scale and structural complexity make dependability analysis very challenging. In this paper, based on the observed interdependency between the power grid and the PGCN, we propose an electricity-services-based dependability analysis model of the PGCN. The model includes methods for analyzing its dependability and procedures for designing dependable strategies. We discuss, respectively, a deterministic analysis method based on matrix analysis and a stochastic analysis model based on stochastic Petri nets.
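The abstract does not give the matrix-based deterministic method in detail; the sketch below shows one common formulation, assumed here purely for illustration: model the PGCN as an adjacency matrix and test whether a service’s endpoints remain connected when components fail.

```python
# Hedged sketch: deterministic, matrix-style dependability check.
# A service survives a failure set if its two endpoints are still
# reachable in the adjacency matrix with the failed nodes removed.

import numpy as np

def connected(adj, src, dst, failed=()):
    """Reachability via powers of the adjacency matrix."""
    a = adj.copy()
    for f in failed:                    # a failed node loses all links
        a[f, :] = 0
        a[:, f] = 0
    n = len(a)
    reach = np.linalg.matrix_power(a + np.eye(n, dtype=int), n - 1)
    return bool(reach[src, dst])

# A 4-node ring 0-1-2-3-0 carrying a service between nodes 0 and 2.
ring = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
print(connected(ring, 0, 2))                 # True: two disjoint paths
print(connected(ring, 0, 2, failed=(1,)))    # True: survives via node 3
print(connected(ring, 0, 2, failed=(1, 3)))  # False: both paths cut
```

The stochastic Petri net model mentioned in the abstract would instead attach failure and repair rates to such components and evaluate the probability of each service staying connected.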

Open Access
Collaborative Network Security in Multi-Tenant Data Center for Cloud Computing
Tsinghua Science and Technology 2014, 19(1): 82-94
Published: 07 February 2014

A data center is an infrastructure that supports Internet services. Cloud computing is rapidly changing the face of the Internet service infrastructure, enabling even small organizations to quickly build Web and mobile applications for millions of users by taking advantage of the scale and flexibility of the shared physical infrastructure that cloud computing provides. In this scenario, multiple tenants store their data and applications in shared data centers, blurring the network boundaries between tenants in the cloud. In addition, different tenants have different security requirements, so different security policies are necessary for different tenants. Network virtualization is used to meet a diverse set of tenant-specific requirements on the underlying physical network, enabling multi-tenant data centers to automatically address a large and diverse set of tenants’ requirements. In this paper, we propose the system implementation of vCNSMS, a collaborative network security prototype system for use in a multi-tenant data center. We demonstrate vCNSMS with a centralized collaborative scheme and deep packet inspection based on an open-source UTM system. A security-level-based protection policy is proposed to simplify security rule management for vCNSMS: different security levels use different packet inspection schemes and are enforced with different security plugins. A smart packet verdict scheme is also integrated into vCNSMS for intelligent flow processing to protect against possible network attacks inside a data center network.
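The abstract describes the security-level based policy only at a high level; the sketch below is one assumed shape of such a mapping, with invented level numbers and plugin names, to make the idea concrete: each level selects a progressively deeper set of inspection plugins.

```python
# Hedged sketch of a security-level based protection policy. The level
# numbers and plugin names are illustrative placeholders, not the
# actual vCNSMS configuration.

from dataclasses import dataclass

@dataclass
class TenantPolicy:
    tenant: str
    level: int        # higher level => deeper, costlier inspection

LEVEL_PLUGINS = {
    1: ["header_filter"],                                   # basic ACLs
    2: ["header_filter", "signature_ids"],                  # + signature IDS
    3: ["header_filter", "signature_ids", "deep_packet_inspection"],
}

def plugins_for(policy: TenantPolicy) -> list[str]:
    """Resolve which security plugins enforce this tenant's level."""
    return LEVEL_PLUGINS[min(policy.level, max(LEVEL_PLUGINS))]

print(plugins_for(TenantPolicy("tenant-a", level=3)))
```

Managing one level per tenant, rather than one rule set per tenant, is what simplifies rule management: adding a tenant means assigning a level, not authoring rules.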

Open Access
Mobile Internet Big Data Platform in China Unicom
Tsinghua Science and Technology 2014, 19(1): 95-101
Published: 07 February 2014

China Unicom, the largest WCDMA 3G operator in China, is meeting the demands of the historic mobile Internet explosion, i.e., the surge of mobile Internet traffic from mobile terminals. According to China Unicom’s internal statistics, mobile user traffic has increased rapidly with a Compound Annual Growth Rate (CAGR) of 135%. China Unicom currently stores more than 2 trillion new records per month, the data volume exceeds 525 TB, and storage has reached a peak of 5 PB. Since October 2009, China Unicom has been developing a home-brewed big data storage and analysis platform based on the open-source Hadoop Distributed File System (HDFS), in line with its long-term strategy to make full use of this Big Data. All mobile Internet traffic is served by this big data platform. Currently, the write speed has reached 1 390 000 records per second, and retrieving a record from a table containing trillions of records takes less than 100 ms. To seize the opportunity to become a Big Data operator, China Unicom has developed new functions and made multiple innovations to overcome the space and time constraints of data processing. In this paper, we introduce our big data platform in detail. On top of this platform, China Unicom is building an industry ecosystem based on mobile Internet Big Data, and believes that a telecom-operator-centric ecosystem can be formed that is critical to prosperity in the modern communications business.

Open Access
MobSafe: Cloud Computing Based Forensic Analysis for Massive Mobile Applications Using Data Mining
Tsinghua Science and Technology 2013, 18(4): 418-427
Published: 05 August 2013

With the explosive increase in mobile apps, more and more threats are migrating from the traditional PC client to mobile devices. Just as the Win+Intel alliance dominated the PC era, the Android+ARM alliance dominates the mobile Internet, and apps are replacing PC client software as the major target of malicious use. In this paper, to improve the security status of current mobile apps, we propose a methodology for evaluating mobile apps based on a cloud computing platform and data mining. We also present a prototype system named MobSafe to determine whether a mobile app is malicious or benign. Compared with traditional approaches, such as permission-pattern-based methods, MobSafe combines dynamic and static analysis methods to comprehensively evaluate an Android app. In the implementation, we adopt the Android Security Evaluation Framework (ASEF) and the Static Android Analysis Framework (SAAF), two representative dynamic and static analysis methods, to evaluate Android apps and to estimate the total time needed to evaluate all the apps stored in one mobile app market. Based on a real trace from a commercial mobile app market called AppChina, we collect statistics on the number of active Android apps, the average number of apps installed on one Android device, and the growth rate of mobile apps. Because mobile app markets serve as the main line of defense against mobile malware, our evaluation results show that it is practical to use a cloud computing platform and data mining to routinely verify all stored apps and filter malware out of mobile app markets. In future work, MobSafe can make extensive use of machine learning to conduct automatic forensic analysis of mobile apps based on the multifaceted data generated at this stage.
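The abstract states that MobSafe fuses dynamic (ASEF) and static (SAAF) results but does not give the scoring model; the sketch below assumes a simple weighted fusion with invented weights and a 0-1 score scale, purely to illustrate the combination step.

```python
# Hedged sketch: fusing a dynamic-analysis score and a static-analysis
# score into one verdict. Weights, threshold, and the 0-1 scale are
# illustrative assumptions, not MobSafe's actual model.

def classify_app(dynamic_score: float, static_score: float,
                 w_dynamic: float = 0.6, threshold: float = 0.5) -> str:
    """Weighted fusion of dynamic (ASEF-style) and static (SAAF-style) results."""
    risk = w_dynamic * dynamic_score + (1 - w_dynamic) * static_score
    return "malicious" if risk >= threshold else "benign"

# e.g. suspicious runtime network behavior, mildly suspicious bytecode:
print(classify_app(dynamic_score=0.8, static_score=0.3))   # malicious
```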

Open Access
TIFAflow: Enhancing Traffic Archiving System with Flow Granularity for Forensic Analysis in Network Security
Tsinghua Science and Technology 2013, 18(4): 406-417
Published: 05 August 2013

The archiving of Internet traffic is an essential function for retrospective network event analysis and network communication forensics. The state-of-the-art approach to network monitoring and analysis involves the storage and analysis of network flow statistics, but this approach discards much of the valuable information within the Internet traffic. With the advancement of commodity hardware, in particular the capacity of storage devices and the speed of the interconnect technologies used in network adapter cards and multi-core processors, it is now possible to capture 10 Gbps and beyond of real-time network traffic using a commodity computer, for example with n2disk. Likewise, with the advancement of distributed file systems (such as Hadoop and ZFS) and open cloud computing platforms (such as OpenStack, CloudStack, and Eucalyptus), it is practical to store such large volumes of traffic data and to analyze the communication within them in depth at an acceptable latency. In this paper, building on the well-known TimeMachine, we present TIFAflow, the design and implementation of a novel system for archiving and querying network flows. First, we enhance the traffic archiving system named TImemachine+FAstbit (TIFA) with flow granularity, i.e., we equip the system with a flow table and a flow module. Second, based on real network traces, we conduct performance comparison experiments between TIFAflow and other implementations, including a common database solution, TimeMachine, and the TIFA system. Finally, based on the comparison results, we demonstrate that TIFAflow outperforms TimeMachine and TIFA in both storage and query performance, in both time and space metrics.
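To make “flow granularity” concrete: instead of indexing packets individually, the archive groups packets under their 5-tuple flow key, so a query touches whole flows rather than scattered packets. The sketch below is an assumed in-memory schema for illustration, not TIFAflow’s actual on-disk format.

```python
# Hedged sketch of flow-granularity archiving: packets are grouped
# under a (src_ip, dst_ip, src_port, dst_port, proto) flow key, and
# queries return whole flows.

from collections import defaultdict

flow_table = defaultdict(list)    # flow key -> list of packet payloads

def archive_packet(src, dst, sport, dport, proto, payload: bytes):
    """Append a captured packet to its flow's record."""
    flow_table[(src, dst, sport, dport, proto)].append(payload)

def query_flows(dst_port: int):
    """Retrieve whole flows matching one field, e.g. all HTTP flows."""
    return {k: v for k, v in flow_table.items() if k[3] == dst_port}

archive_packet("10.0.0.1", "10.0.0.2", 5555, 80, "tcp", b"GET / ...")
print(list(query_flows(80)))      # [('10.0.0.1', '10.0.0.2', 5555, 80, 'tcp')]
```

In TIFA-style systems the flow table is additionally indexed with FastBit bitmap indexes, which is where the storage and query gains measured in the paper come from.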

Open Access
Performance Evaluation and Dynamic Optimization of Speed Scaling on Web Servers in Cloud Computing
Tsinghua Science and Technology 2013, 18(3): 298-307
Published: 03 June 2013

The energy consumption of large-scale data centers is attracting more and more attention as rising energy costs make enhanced performance increasingly expensive. This is becoming a bottleneck to further development of cloud computing in terms of both scale and performance; thus, reducing the energy consumption of data centers is becoming a key research topic in green IT and green computing. The web servers providing cloud computing services run at various speeds in different scenarios. By shifting among these states using speed scaling, energy consumption can be made proportional to the workload, a property termed energy proportionality. This study uses stochastic service decision nets, which combine stochastic Petri nets with Markov decision process models, to investigate energy-efficient speed scaling on web servers. This combination enables the model both to dynamically optimize the speed-scaling strategy and to evaluate performance. The model is graphical and intuitive enough to characterize complicated system behavior and decisions, and it is service-oriented, using typical service patterns to reduce a complex model to a simple one with a smaller state space. Performance- and reward-equivalent analysis substantially reduces the system behavior sub-net. The model yields the optimal strategy and evaluates performance and energy metrics concisely.
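The abstract gives no formulas; the block below sketches the standard speed-scaling power model often assumed in this line of work, to make energy proportionality concrete. The symbols P_static, c, and alpha are assumptions, not taken from the paper.

```latex
% A server at speed s draws power P(s); a job of W cycles served at
% constant speed s finishes in W/s time, giving energy E(s):
\[
  P(s) = P_{\text{static}} + c\,s^{\alpha}, \qquad
  E(s) = \bigl(P_{\text{static}} + c\,s^{\alpha}\bigr)\,\frac{W}{s},
  \qquad \alpha \in [2,3].
\]
% Setting dE/ds = 0 gives the energy-optimal speed
\[
  s^{*} = \left(\frac{P_{\text{static}}}{(\alpha-1)\,c}\right)^{1/\alpha}.
\]
% Running slower than s* wastes static power; running faster wastes
% dynamic power. Speed scaling aims to track s* subject to meeting
% response-time constraints, which is the decision the stochastic
% service decision net optimizes.
```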

Open Access
TST: Threshold Based Similarity Transitivity Method in Collaborative Filtering with Cloud Computing
Tsinghua Science and Technology 2013, 18(3): 318-327
Published: 03 June 2013

Collaborative filtering addresses the information overload problem by presenting personalized content to individual users based on their interests, and it has been extensively applied in real-world recommender systems. As a class of simple but efficient collaborative filtering methods, similarity-based approaches make predictions by finding users with similar tastes or items that have been similarly chosen. However, as the number of users or items grows rapidly, the traditional approach suffers from the data sparsity problem: inaccurate similarities derived from sparse user-item associations generate inaccurate neighborhoods for each user or item. The resulting poor recommendations drive us to propose a Threshold-based Similarity Transitivity (TST) method in this paper. TST first filters out inaccurate similarities by setting an intersection threshold and then replaces them with transitive similarities. The TST method is also designed to scale on a cloud computing platform with the MapReduce framework. We evaluate our algorithm on the public MovieLens data set and on a real-world data set from AppChina (an Android application market) with several well-known metrics, including precision, recall, coverage, and popularity. The experimental results demonstrate that TST copes well with the tradeoff between the quality and quantity of similarities when an appropriate threshold is set. Moreover, we can experimentally find the optimal threshold, which becomes smaller as the data set becomes sparser. The experimental results also show that TST significantly outperforms the traditional approach even as the data become sparser.
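The abstract describes TST’s two steps (filter by intersection threshold, then fall back to transitivity); the sketch below is one assumed realization of that idea. The cosine-style similarity and the max-over-intermediates rule are illustrative choices, not the paper’s exact definitions.

```python
# Hedged sketch of thresholded similarity transitivity: similarities
# backed by too few co-rated items are dropped and re-derived through
# a shared neighbor.

def tst_similarity(u, v, ratings, threshold=3):
    """ratings: dict mapping each user to the set of items they chose."""
    def direct(a, b):
        inter = ratings[a] & ratings[b]
        if len(inter) < threshold:          # too few co-rated items:
            return None                     # similarity deemed unreliable
        return len(inter) / (len(ratings[a]) * len(ratings[b])) ** 0.5

    s = direct(u, v)
    if s is not None:
        return s
    # Transitivity: route through intermediate users w with reliable
    # similarity to both u and v, keeping the strongest path.
    paths = []
    for w in ratings:
        if w in (u, v):
            continue
        suw, swv = direct(u, w), direct(w, v)
        if suw is not None and swv is not None:
            paths.append(suw * swv)
    return max(paths, default=0.0)
```

The threshold controls the quality/quantity tradeoff the abstract mentions: raising it discards more noisy similarities but forces more pairs onto transitive paths, and the optimal value shrinks as the data grow sparser.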

Open Access
Cloud Computing-Based Forensic Analysis for Collaborative Network Security Management System
Tsinghua Science and Technology 2013, 18(1): 40-50
Published: 07 February 2013

Internet security problems remain a major challenge, with many security concerns such as Internet worms, spam, and phishing attacks. Botnets, well-organized distributed network attacks, consist of a large number of bots that generate huge volumes of spam or launch Distributed Denial of Service (DDoS) attacks on victim hosts, and newly emerging botnet attacks degrade the state of Internet security further. To address these problems, a practical collaborative network security management system is proposed with effective collaborative Unified Threat Management (UTM) and traffic probers. A distributed security overlay network with a centralized security center leverages a peer-to-peer communication protocol in the UTMs’ collaborative module and connects them virtually to exchange network events and security rules; the UTMs’ security functions are retrofitted to share security rules. In this paper, we propose the design and implementation of a cloud-based security center for network security forensic analysis. We propose using cloud storage to keep the collected traffic data and then processing it with cloud computing platforms to find malicious attacks. As a practical example, a forensic analysis of phishing attacks is presented, and the required computing and storage resources are evaluated based on real trace data. The cloud-based security center can instruct each collaborative UTM and prober to collect events and raw traffic, send them back for deep analysis, and generate new security rules. These new security rules are enforced by the collaborative UTMs, and the feedback events from those rules are returned to the security center. Through this type of closed-loop control, the collaborative network security management system can identify and address new distributed attacks more quickly and effectively.
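The closed-loop control described above can be read as a repeating cycle. The sketch below is an assumed outline of that cycle with placeholder object names and methods, not the system’s actual code.

```python
# Hedged sketch of the closed loop: collect -> analyze -> generate
# rules -> enforce -> feed back. All names (utm.collect, cloud.analyze,
# etc.) are illustrative placeholders.

def security_center_loop(utms, cloud):
    while True:
        # 1. Collaborative UTMs and probers report events and raw traffic.
        batches = [utm.collect() for utm in utms]
        # 2. The cloud platform runs deep forensic analysis (e.g., phishing).
        attacks = cloud.analyze(batches)
        # 3. New security rules are derived from the identified attacks.
        rules = cloud.generate_rules(attacks)
        # 4. Rules are pushed to every collaborative UTM for enforcement.
        for utm in utms:
            utm.enforce(rules)
        # 5. Feedback events from rule hits arrive with the next collect(),
        #    closing the loop.
```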
