Scholar - SciOpen

Triple-level cell (TLC) NAND flash is increasingly adopted to build solid-state drives (SSDs) for modern computer systems. While TLC NAND flash effectively improves storage density, it faces severe reliability issues; in particular, the pages exhibit different raw bit error rates (RBERs). Integrating strong low-density parity-check (LDPC) code helps to improve reliability but suffers from prolonged and proportional read latency due to multiple read retries for worse pages. The straightforward idea is that dispersing page-size data across several pages in different types can achieve a lower average RBER and reduce the read latency. However, directly implementing this simple idea into flash translation layer (FTL) induces the read amplification issue as one logic page residing in more than one physical page brings several read operations. In this paper, we propose the Dynamic Request Interleaving (DIR) technology for improving the performance of TLC NAND flash-based SSDs, in particular, the aged ones with large RBERs. DIR exploits the observation that the latency of an I/O request is determined, without considering the queuing time, by the access of the slowest device page, i.e., the page that has the highest RBER. By grouping consecutive logical pages that have high locality and interleaving their encoded data in different types of device pages that have different RBERs, DIR effectively reduces the number of read retries for LDPC with limited read amplification. To meet the requirement of allocating hybrid page types for interleaved data, we also design a page-interleaving friendly page allocation scheme, which splits all the planes into multi-plane regions for storing the interleaved data and single-plane regions for storing the normal data. The pages in the multi-plane region can be read/written in parallel by the proposed multi-plane command and avoid the read amplification issue. Based on the DIR scheme and the proposed page allocation scheme, we build DIR-enable FTL, which integrates the proposed schemes into the FTL with some modifications. Our experimental results show that adopting DIR in aged SSDs exploits nearly 33% locality from I/O requests and, on average, reduces 43% read latency over conventional aged SSDs.

Regular Paper Issue

Revisiting the Parallel Strategy for DOACROSS Loops

Song Liu, Yuan-Zhen Cui, Nian-Jun Zou, Wen-Hao Zhu, Dong Zhang, Wei-Guo Wu

Journal of Computer Science and Technology 2019, 34(2): 456-475

Published: 22 March 2019

Abstract Collect Collected

DOACROSS loops are significant parts in many important scientific and engineering applications, which are generally exploited pipeline/wave-front parallelism by loop transformations. However, previous work almost statically performs iterations in parallel threads, thus causing a waste of computing resources in thread synchronization. This paper proposes a brand-new parallel strategy for DOACROSS loops that provides a dynamic task assignment with reduced dependences to achieve wave-front parallelism through loop tiling. The proposed strategy uses a master-slave parallel mode and some customized structures to realize dynamic and flexible parallelization, which effectively avoids threads from waiting in communication. An efficient tile size selection (TSS) approach is also proposed to preserve data reuse in cache for tiled codes. The experimental results show that the proposed parallel strategy obtains good and stable speedups over six typical benchmarks with different problem sizes and different numbers of threads on an Intel^® Xeon^® 32-core server. And it outperforms two static strategies, a barrier-based strategy and a post/wait-based strategy, by 32% and 20% in average performance, respectively. This strategy also yields a better performance than a mutex-based dynamic strategy. Besides, it has been demonstrated that the proposed TSS approach can achieve a near-optimal performance and is comparable with a state-of-the-art TSS approach.

Total 2