Nowadays, application migration becomes more and more attractive. For example, it can make computation closer to data sources or make service closer to end-users, which may significantly decrease latency in edge computing. Yet, migrating applications among servers that are controlled by different platform owners raises security issues. We leverage hardware-secured trusted execution environment (TEE, aka., enclave) technologies, such as Intel SGX, AMD SEV, and ARM TrustZone, for protecting critical computations on untrusted servers. However, these hardware TEEs propose non-uniform programming abstractions and are based on heterogeneous architectures, which not only forces programmers to develop secure applications targeting some specific abstraction but also hinders the migration of protected applications. Therefore, we propose UniTEE which gives a unified enclave programming abstraction across the above three hardware TEEs by using a microkernel-based design and enables the secure enclave migration by integrating heterogeneous migration techniques. We have implemented the prototype on real machines. The evaluation results show the migration support incurs nearly-zero runtime overhead and the migration procedure is also efficient.
- Article type
- Year
- Co-author
Persistent indexing structures are proposed in response to emerging non-volatile memory (NVM) to provide high performance yet durable indexes. However, due to the lack of real NVM hardware, many prior persistent indexing structures were evaluated via emulation, which varies a lot across different setups and differs from the real deployment. Recently, Intel has released its Optane DC Persistent Memory Module (PMM), which is the first production-ready NVM. In this paper, we revisit popular persistent indexing structures on PMM and conduct comprehensive evaluations to study the performance differences among persistent indexing structures, including persistent hash tables and persistent trees. According to the evaluation results, we find that Cacheline-Conscious Extendible Hashing (CCEH) achieves the best performance among all evaluated persistent hash tables, and Failure-Atomic ShifT B+-Tree (FAST) and Write Optimal Radix Tree (WORT) perform better than other trees. Besides, we find that the insertion performance of hash tables is heavily influenced by data locality, while the insertion latency of trees is dominated by the flush instructions. We also uncover that no existing emulation methods accurately simulate PMM for all the studied data structures. Finally, we provide three suggestions on how to fully utilize PMM for better performance, including using clflushopt/clwb with sfence instead of clflush, flushing continuous data in a batch, and avoiding data access immediately after it is flushed to PMM.
Deterministic databases can improve the performance of distributed workload by eliminating the distributed commit protocol and reducing the contention cost. Unfortunately, the current deterministic scheme does not consider the performance scalability within a single machine. In this paper, we describe a scalable deterministic concurrency control, Deterministic and Optimistic Concurrency Control (DOCC), which is able to scale the performance both within a single node and across multiple nodes. The performance improvement comes from enforcing the determinism lazily and avoiding read-only transaction blocking the execution. The evaluation shows that DOCC achieves 8x performance improvement than the popular deterministic database system, Calvin.
The multicore evolution has stimulated renewed interests in scaling up applications on shared-memory multiprocessors, significantly improving the scalability of many applications. But the scalability is limited within a single node; therefore programmers still have to redesign applications to scale out over multiple nodes. This paper revisits the design and implementation of distributed shared memory (DSM) as a way to scale out applications optimized for non-uniform memory access (NUMA) architecture over a well-connected cluster. This paper presents MAGI, an efficient DSM system that provides a transparent shared address space with scalable performance on a cluster with fast network interfaces. MAGI is unique in that it presents a NUMA abstraction to fully harness the multicore resources in each node through hierarchical synchronization and memory management. MAGI also exploits the memory access patterns of big-data applications and leverages a set of optimizations for remote direct memory access (RDMA) to reduce the number of page faults and the cost of the coherence protocol. MAGI has been implemented as a user-space library with pthread-compatible interfaces and can run existing multithreaded applications with minimized modifications. We deployed MAGI over an 8-node RDMAenabled cluster. Experimental evaluation shows that MAGI achieves up to 9.25x speedup compared with an unoptimized implementation, leading to a scalable performance for large-scale data-intensive applications.
Blockchain is becoming popular as a distributed and reliable ledger which allows distrustful parties to transact safely without trusting third parties. Emerging blockchain systems like Ethereum support smart contracts where miners can run arbitrary user-defined programs. However, one of the biggest concerns about the blockchain and the smart contract is privacy, since all the transactions on the chain are exposed to the public. In this paper, we present ShadowEth, a system that leverages hardware enclave to ensure the confidentiality of smart contracts while keeping the integrity and availability based on existing public blockchains like Ethereum. ShadowEth establishes a confidential and secure platform protected by trusted execution environment (TEE) off the public blockchain for the execution and storage of private contracts. It only puts the process of verification on the blockchain. We provide a design of our system including a protocol of the cryptographic communication and verification and show the applicability and feasibility of ShadowEth by various case studies. We implement a prototype using the Intel SGX on the Ethereum network and analyze the security and availability of the system.
Using a password manager is known to be more convenient and secure than not using one, on the assumption that the password manager itself is safe. However recent studies show that most popular password managers have security vulnerabilities that may be fooled to leak passwords without users’ awareness. In this paper, we propose a new password manager, SplitPass, which vertically separates both the storage and access of passwords into two mutually distrusting parties. During login, all the parties will collaborate to send their password shares to the web server, but none of these parties will ever have the complete password, which significantly raises the bar of a successful attack to compromise all of the parties. To retain transparency to existing applications and web servers, SplitPass seamlessly splits the secure sockets layer (SSL) and transport layer security (TCP) sessions to process on all parties, and makes the joining of two password shares transparent to the web servers. We have implemented SplitPass using an Android phone and a cloud assistant and evaluated it using 100 apps from top free apps in the Android official market. The evaluation shows that SplitPass securely protects users’ passwords, while incurring little performance overhead and power consumption.
As multi-core processors become the de-facto configuration in modern computers, the adoption of SMP Virtual Machines (VMs) has been increasing, allowing for more efficient use of computing resources. However, because of existence of schedulers in both the hypervisor and the guest VMs, this creates a new research problem, viz., double scheduling. Although double scheduling may cause many issues including lock-holder preemption, vCPU stacking, CPU fragmentation, and priority inversion, prior approaches have either introduced new problems and/or addressed the problem incompletely. In this paper, we describe the design and implementation of FlexCore, a new scheduling scheme using vCPU ballooning, which dynamically adjusts the number of vCPUs of a VM at runtime. This essentially eliminates unnecessary scheduling in the hypervisor layer, and thus, boosts performance significantly. An evaluation using a complete KVM-based implementation shows that the average performance improvement for PARSEC applications on a 12-core Intel machine is approximately 52.9%, ranging from 35.4% to 79.6%.