Scholar - SciOpen

The volume of information that big data clusters must process is growing rapidly, so it is critical to execute data analysis in a time-efficient manner. However, simply adding more computation resources may not speed up the analysis significantly. Data analysis jobs usually consist of multiple stages organized as a directed acyclic graph (DAG), and the precedence relationships between stages create scheduling challenges; general DAG scheduling is a well-known NP-hard problem. Moreover, we observe that in some parallel computing frameworks such as Spark, the execution of a stage in the DAG contains multiple phases that use different resources. Carefully arranging the use of those resources in a pipeline can reduce their idle time and improve the average resource utilization. Therefore, we propose a resource pipeline scheme with the objective of minimizing the job makespan. For perfectly parallel stages, we propose a contention-free scheduler with detailed theoretical analysis. We then extend the contention-free scheduler to three-phase stages, considering that the computation phase of some stages can be partitioned. Because job stages in real-world applications are usually not perfectly parallel, the parallelism levels need to be adjusted frequently during DAG execution. Since reinforcement learning (RL) techniques can adjust the scheduling policy on the fly, we investigate an RL-based scheduler for jobs that arrive online. The RL-based scheduler can adjust the resource contention adaptively. We evaluate both the contention-free and RL-based schedulers on a Spark cluster, using a real-world cluster trace dataset to simulate different DAG styles. Evaluation results show that our pipelined scheme can significantly improve CPU and network utilization.
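To illustrate why pipelining phases that use different resources can shorten the makespan, the following is a minimal sketch and not the scheduler proposed in the paper. It assumes independent stages with exactly two phases each (a network phase followed by a CPU phase), run in a fixed order; the function names and the timing values are hypothetical.

```python
from typing import List, Tuple

# (network_time, cpu_time) per stage; units and values are hypothetical.
Stage = Tuple[float, float]


def serial_makespan(stages: List[Stage]) -> float:
    """Each stage finishes both phases before the next stage starts."""
    return sum(net + cpu for net, cpu in stages)


def pipelined_makespan(stages: List[Stage]) -> float:
    """The network phase of the next stage overlaps the CPU phase of the
    current one. Each resource still serves one stage at a time."""
    net_free = 0.0  # time at which the network becomes idle
    cpu_free = 0.0  # time at which the CPU becomes idle
    for net, cpu in stages:
        net_free += net  # network phases run back to back
        # The CPU phase starts once its input has arrived and the CPU is idle.
        cpu_free = max(cpu_free, net_free) + cpu
    return cpu_free


stages = [(2.0, 3.0), (1.0, 4.0), (3.0, 2.0)]
print(serial_makespan(stages), pipelined_makespan(stages))
```

In this toy instance the CPU waits only for the first network phase, so most of the network time is hidden behind computation. Precedence constraints between stages, a third phase, and partial parallelism, all of which the abstract discusses, are deliberately left out.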

Open Access Issue

Approximating Special Social Influence Maximization Problems

Jie Wu, Ning Wang

Tsinghua Science and Technology 2020, 25(6): 703-711

Published: 07 May 2020

Abstract

Social Influence Maximization Problems (SIMPs) deal with selecting $k$ seeds in a given Online Social Network (OSN) to maximize the number of eventually-influenced users. Influence spreads from these seeds according to a given set of influence probabilities among neighbors in the OSN. Although the SIMP has been proved to be NP-hard, it is both submodular (with a natural diminishing return) and monotone (with an increasing number of influenced users through propagation), two properties that make the problem suitable for approximation solutions. However, several special SIMPs cannot be modeled as submodular or monotone functions. In this paper, we look at several conditions under which non-submodular or non-monotone functions can be handled or approximated. One is a profit-maximization SIMP where seed selection cost is included in the overall utility function, breaking the monotone property. The other is a crowd-influence SIMP where crowd influence exists in addition to individual influence, breaking the submodular property. We then review several new techniques and notions, including double-greedy algorithms and the supermodular degree, that can be used to address special SIMPs. Our main results show that for a specific SIMP model, special network structures of OSNs can help reduce its time complexity.
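As an illustration of the double-greedy idea mentioned above, here is a minimal sketch of the deterministic double-greedy scheme for unconstrained submodular maximization, applied to a toy profit function (users reached minus seed cost). The abstract does not say which variant the paper uses, so this is an assumption. The `reach` and `cost` tables are hypothetical, and the deterministic coverage function stands in for a probabilistic influence model. No approximation guarantee is claimed for this toy instance.

```python
from typing import Callable, Dict, FrozenSet, List, Set


def double_greedy(ground: List[str],
                  f: Callable[[FrozenSet[str]], float]) -> Set[str]:
    """Grow x from the empty set and shrink y from the full set, one element
    at a time; the two sets coincide after the last element is decided."""
    x: Set[str] = set()
    y: Set[str] = set(ground)
    for u in ground:
        gain_add = f(frozenset(x | {u})) - f(frozenset(x))
        gain_remove = f(frozenset(y - {u})) - f(frozenset(y))
        if gain_add >= gain_remove:
            x.add(u)      # keep u in the solution
        else:
            y.discard(u)  # drop u from the solution
    return x


# Hypothetical toy data: users each seed reaches, and the cost of each seed.
reach: Dict[str, Set[int]] = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}}
cost: Dict[str, float] = {"a": 1.0, "b": 1.5, "c": 2.0}


def profit(seeds: FrozenSet[str]) -> float:
    """Users covered minus total seed cost: submodular but not monotone."""
    covered: Set[int] = set()
    for s in seeds:
        covered |= reach[s]
    return len(covered) - sum(cost[s] for s in seeds)


chosen = double_greedy(["a", "b", "c"], profit)
print(chosen, profit(frozenset(chosen)))
```

Because seed costs are subtracted, adding a seed can lower the profit, which is why a plain greedy rule that only ever adds seeds is not a natural fit here.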
