Open Access | Just Accepted

Fast and Scalable GPU-Based RPA Decoder for Reed-Muller Codes

Kairui Tian (a), He Sun (b), Zhanxian Liu (c), Rongke Liu (d)

a School of Electronic and Information Engineering, Beihang University, Beijing 100191, China. 

b School of Electronic and Information Engineering, Beihang University, Beijing 100191, China, and also with the Department of Electrical and Computer Engineering, National University of Singapore, 119077 Singapore.

c School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China. 

d Shenzhen Institute of Beihang University, Shenzhen 518063, China, and also with the School of Electronic and Information Engineering, Beihang University, Beijing 100191, China. 


Abstract

The emergence of massive communication (MC) highlights the need for efficient short-length error-correction coding schemes with effective decoders. Classical Reed-Muller (RM) codes combined with the recently developed recursive projection-aggregation (RPA) decoder offer a promising solution, as the RPA decoder achieves near-maximum-likelihood (ML) performance and supports highly parallel implementation. To meet the speed and flexibility requirements of cloud radio access networks (C-RANs) across various MC applications, this paper proposes a fast and scalable RPA decoder on graphics processing units (GPUs). Using a thread-per-projection mapping strategy, we develop an optimized thread-block architecture for the RPA decoding of second-order RM codes, which extends readily to a multi-dimensional block array for decoding higher-order RM codes. Additionally, we introduce a stationary projection pruning technique that seamlessly adapts the RPA decoder kernel to simplified variants, enabling flexible trade-offs between error-correction performance and implementation complexity. Experimental results show that the pruned RPA decoder kernel on an NVIDIA A100 GPU achieves throughputs of 1.69 Gbps and 1.33 Gbps for the RM(6,2) and RM(7,2) codes, respectively, delivering speedups of 2.95× and 3.69× over a state-of-the-art software-based successive cancellation list (SCL) decoder.
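To make the projection and aggregation steps concrete, here is a minimal CPU sketch in Python of RPA decoding for RM(m,2) with one-dimensional projections. It assumes the standard formulation: each nonzero direction b groups positions into cosets {z, z⊕b}, the projected LLRs form a first-order RM code that is decoded with a fast Hadamard transform, and the per-direction decisions are averaged back into updated LLRs. It uses the min-sum approximation for the projected LLRs. All names and parameters (`rpa_decode_rm2`, `max_iter`, `theta`, `directions`) are hypothetical and do not describe the paper's GPU kernel. The loop over b marks where a thread-per-projection mapping would assign one thread per direction. Passing a subset of directions only mimics pruning in a generic way and is not the paper's stationary projection pruning.

```python
import math


def rm2_encode(monomials, m):
    """Evaluate a Boolean polynomial of degree <= 2 at every x in F_2^m.

    `monomials` lists the monomials with coefficient 1, each a tuple of
    variable indices: () is the constant 1, (1, 3) is x1*x3.
    """
    out = []
    for x in range(1 << m):
        bit = 0
        for mono in monomials:
            assert len(mono) <= 2, "RM(m,2) allows degree <= 2 only"
            bit ^= all((x >> i) & 1 for i in mono)
        out.append(bit)
    return out


def _coset_index(z, b, p):
    """Map the coset {z, z^b} to an index in F_2^(m-1).

    Clear bit p (the lowest set bit of b) by adding b if needed, then delete
    bit p. The map is linear with kernel {0, b}, so the projected code is
    still a first-order RM code under this indexing.
    """
    if (z >> p) & 1:
        z ^= b
    return ((z >> (p + 1)) << p) | (z & ((1 << p) - 1))


def _fht_decode_rm1(llr):
    """ML-decode a first-order RM code from LLRs with a fast Hadamard transform."""
    w = list(llr)
    n = len(w)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                w[j], w[j + h] = w[j] + w[j + h], w[j] - w[j + h]
        h *= 2
    # w[a] = sum_x llr[x] * (-1)^(a . x); the best codeword maximizes |w[a]|,
    # and the sign of w[a] selects the constant term (complement or not).
    a = max(range(n), key=lambda k: abs(w[k]))
    s = 0 if w[a] >= 0 else 1
    return [(bin(a & x).count("1") + s) & 1 for x in range(n)]


def rpa_decode_rm2(llr, m, max_iter=3, theta=0.05, directions=None):
    """Sketch of RPA decoding for RM(m,2); llr[z] > 0 favours bit 0."""
    n = 1 << m
    assert len(llr) == n
    dirs = list(directions) if directions is not None else list(range(1, n))
    cur = [float(v) for v in llr]
    for _ in range(max_iter):
        acc = [0.0] * n
        # Each direction b is independent: the natural unit for one GPU thread.
        for b in dirs:
            p = (b & -b).bit_length() - 1
            # Projection: one min-sum LLR per coset {z, z^b}.
            proj = [0.0] * (n // 2)
            for z in range(n):
                if not (z >> p) & 1:
                    u, v = cur[z], cur[z ^ b]
                    proj[_coset_index(z, b, p)] = math.copysign(min(abs(u), abs(v)), u * v)
            yhat = _fht_decode_rm1(proj)  # estimates of c(z) XOR c(z^b)
            # Aggregation: direction b casts a vote for every position z.
            for z in range(n):
                acc[z] += (1 - 2 * yhat[_coset_index(z, b, p)]) * cur[z ^ b]
        new = [v / len(dirs) for v in acc]
        done = all(abs(new[z] - cur[z]) <= theta * abs(cur[z]) for z in range(n))
        cur = new
        if done:
            break
    return [0 if v >= 0 else 1 for v in cur]


if __name__ == "__main__":
    cw = rm2_encode([(), (0,), (1, 3), (2, 5), (4,)], 6)  # an RM(6,2) codeword
    llr = [4.0 if c == 0 else -4.0 for c in cw]
    llr[10] = -llr[10] / 4  # one unreliable position with the wrong sign
    print(rpa_decode_rm2(llr, 6) == cw)  # expected: True
```

In this sketch every direction b reads the same current LLR vector and writes only its own contribution to the aggregate, which is what makes a one-thread-per-projection layout attractive. A real GPU kernel would still have to organize memory access and the final reduction over directions, and those choices are outside what this sketch shows.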

Tsinghua Science and Technology
Cite this article:
Tian K, Sun H, Liu Z, et al. Fast and Scalable GPU-Based RPA Decoder for Reed-Muller Codes. Tsinghua Science and Technology, 2025, https://doi.org/10.26599/TST.2025.9010001