Federated learning has been widely employed in many applications to protect the data privacy of participating clients. Although the dataset is decentralized among training devices in federated learning, the model parameters are usually stored in a centralized manner. Centralized federated learning is easy to implement; however, a centralized scheme causes a communication bottleneck at the central server, which may significantly slow down the training process. To improve training efficiency, we investigate the decentralized federated learning scheme. The decentralized scheme has become feasible with the rapid development of device-to-device communication techniques under 5G. Nevertheless, the convergence rate of learning models in the decentralized scheme depends on the network topology design. We propose optimizing the topology design to improve training efficiency for decentralized federated learning, which is a non-trivial problem, especially when data heterogeneity is considered. In this paper, we first demonstrate the advantage of the hypercube topology and present a hypercube graph construction method that reduces data heterogeneity by carefully selecting the neighbors of each training device, a process that resembles classic graph embedding. In addition, we propose a heuristic method for generating torus graphs. Moreover, we explore the communication patterns in the hypercube topology and propose a sequential synchronization scheme to reduce the communication cost during training. A batch synchronization scheme is further presented to fine-tune the communication pattern for the hypercube topology. Experiments on real-world datasets show that our proposed graph construction methods accelerate the training process, and our sequential synchronization scheme significantly reduces the overall communication traffic during training.
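To make the topology concrete, the sketch below shows the standard d-dimensional hypercube neighbor structure (each vertex is adjacent to the d vertices that differ from it in exactly one bit), together with a hypothetical greedy embedding heuristic. The greedy assignment, the L1 dissimilarity measure, and the function names are illustrative assumptions, not necessarily the construction proposed in the paper; the idea sketched here is simply to place devices with dissimilar label distributions on adjacent vertices so that each neighborhood sees a more balanced mixture of data.

```python
import numpy as np

def hypercube_neighbors(vertex: int, d: int) -> list:
    """In a d-dimensional hypercube, two vertices are adjacent iff they
    differ in exactly one bit, so each vertex has exactly d neighbors."""
    return [vertex ^ (1 << k) for k in range(d)]

def greedy_embedding(label_dists: np.ndarray, d: int) -> list:
    """Hypothetical greedy assignment of devices to hypercube vertices.

    label_dists: array of shape (2**d, num_classes); row i is the label
    distribution of device i. Returns assignment[vertex] = device index.
    """
    n = 2 ** d
    unassigned = set(range(n))
    assignment = [-1] * n
    assignment[0] = 0          # seed vertex 0 with device 0
    unassigned.remove(0)
    for v in range(1, n):
        # Neighbors of v with a smaller index have already been assigned.
        placed = [assignment[u] for u in hypercube_neighbors(v, d) if assignment[u] >= 0]
        target = label_dists[placed].mean(axis=0)
        # Pick the unassigned device whose label distribution is most
        # dissimilar (L1 distance) to the neighborhood average, so that
        # adjacent devices jointly cover a more balanced set of classes.
        best = max(unassigned, key=lambda dev: np.abs(label_dists[dev] - target).sum())
        assignment[v] = best
        unassigned.remove(best)
    return assignment
```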
The volume of information that needs to be processed in big data clusters is increasing rapidly, so it is critical to execute data analysis in a time-efficient manner. However, simply adding more computation resources may not speed up the analysis significantly. Data analysis jobs usually consist of multiple stages organized as a directed acyclic graph (DAG), and the precedence relationships between stages cause scheduling challenges. General DAG scheduling is a well-known NP-hard problem. Moreover, we observe that in some parallel computing frameworks such as Spark, the execution of a stage in the DAG contains multiple phases that use different resources. Carefully arranging those phases in a pipeline can reduce resource idle time and improve average resource utilization. Therefore, we propose a resource pipeline scheme with the objective of minimizing the job makespan. For perfectly parallel stages, we propose a contention-free scheduler with detailed theoretical analysis. We further extend the contention-free scheduler to three-phase stages, considering that the computation phase of some stages can be partitioned. However, job stages in real-world applications are usually not perfectly parallel, and parallelism levels need to be adjusted frequently during DAG execution. Since reinforcement learning (RL) techniques can adjust the scheduling policy on the fly, we investigate an RL-based scheduler for jobs that arrive online; it adapts the resource contention level dynamically. We evaluate both the contention-free and RL-based schedulers on a Spark cluster, using a real-world cluster trace dataset to simulate different DAG styles. Evaluation results show that our pipelined scheme can significantly improve CPU and network utilization.
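As an illustration of why pipelining phases that use different resources shortens the makespan, the sketch below models each task as a network-fetch phase followed by a CPU phase, computes the pipelined makespan for a given order, and orders the tasks with the classic Johnson's rule for two-machine flow shops. The two-phase model and Johnson's rule are a simplified stand-in used only for illustration; they are not the paper's contention-free scheduler.

```python
def pipelined_makespan(tasks):
    """tasks: list of (net_time, cpu_time). The network fetch of task i+1 can
    overlap the CPU phase of task i because they occupy different resources."""
    net_free = cpu_free = 0.0
    for net_t, cpu_t in tasks:
        net_free += net_t                           # fetch occupies the network link
        cpu_free = max(cpu_free, net_free) + cpu_t  # compute waits for its own fetch
    return cpu_free

def serial_makespan(tasks):
    """Baseline with no overlap: each task fetches, then computes."""
    return sum(net_t + cpu_t for net_t, cpu_t in tasks)

def johnson_order(tasks):
    """Johnson's rule for a two-machine flow shop: tasks with short network
    phases go first (ascending), tasks with short CPU phases go last (descending)."""
    first = sorted((t for t in tasks if t[0] <= t[1]), key=lambda t: t[0])
    last = sorted((t for t in tasks if t[0] > t[1]), key=lambda t: t[1], reverse=True)
    return first + last

tasks = [(4.0, 2.0), (1.0, 5.0), (3.0, 3.0)]
print(serial_makespan(tasks))                    # 18.0
print(pipelined_makespan(johnson_order(tasks)))  # 11.0
```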
With the rapid development of the sharing economy, ride-hailing services have become increasingly popular worldwide. Although these services provide convenience for users, one public concern is whether passengers' location privacy is protected. Service providers (SPs) such as Didi and Uber need to acquire passenger and driver locations before they can dispatch passenger orders. To protect passengers' privacy according to their requirements, we propose a cloaking-region-based order dispatch scheme. In our scheme, a passenger sends the SP a cloaking region within which his/her actual location is not distinguishable. The trade-off for the enhanced privacy is a loss of social welfare, i.e., an increase in the overall pick-up distance. To optimize our scheme, we propose to maximize the social welfare under passengers' privacy requirements. We investigate a bipartite-matching-based approach and show a theoretical bound on the matching performance under specific privacy requirements. Besides passengers' privacy, our extended scheme allows drivers to set their maximum pick-up distance; this extension applies when the number of drivers exceeds the number of passengers. Nevertheless, the global-matching-based scheme does not consider the interest of each individual passenger: passengers with low privacy requirements may be matched with drivers far away from them. To this end, we propose a pricing scheme with three strategies that compensates for the individual loss by allocating discounts on riding fares. Extensive experiments on both real-world and synthetic datasets show the efficiency of our scheme.
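A minimal sketch of the matching step is given below, assuming each cloaking region is reported as a center (the exact passenger location inside it stays hidden) and approximating the expected pick-up distance by the driver-to-center distance; the per-driver maximum pick-up distance from the extended scheme is enforced with a large penalty. The function name, the distance proxy, and the use of the Hungarian algorithm via SciPy are illustrative assumptions, not necessarily the constructions of the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

INFEASIBLE = 1e9  # penalty for pairs that violate a driver's maximum pick-up distance

def dispatch(driver_xy, region_centers, max_pickup):
    """driver_xy: (m, 2) driver coordinates; region_centers: (n, 2) centers of
    passengers' cloaking regions; max_pickup: (m,) per-driver limits.
    Returns (driver, passenger) pairs minimizing total approximate pick-up distance."""
    # m x n matrix of driver-to-region-center distances, used as a proxy for the
    # pick-up distance because the passenger's exact location is hidden from the SP.
    dist = np.linalg.norm(driver_xy[:, None, :] - region_centers[None, :, :], axis=2)
    cost = np.where(dist <= max_pickup[:, None], dist, INFEASIBLE)
    rows, cols = linear_sum_assignment(cost)  # handles rectangular matrices (m != n)
    return [(int(d), int(p)) for d, p in zip(rows, cols) if cost[d, p] < INFEASIBLE]

# Example: three drivers, two cloaked passengers.
drivers = np.array([[0.0, 0.0], [5.0, 5.0], [9.0, 0.0]])
centers = np.array([[1.0, 1.0], [8.0, 1.0]])
limits = np.array([3.0, 6.0, 2.0])
print(dispatch(drivers, centers, limits))  # [(0, 0), (2, 1)]
```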