Abstract
The emerging massive communication (MC) highlights the need for efficient short-length error-correction coding schemes with effective decoders. Classical Reed-Muller (RM) codes combined with the recently developed recursive projection-aggregation (RPA) decoder present a promising solution, as the RPA decoder demonstrates near maximum likelihood (ML) performance and supports highly parallel implementation. To address the speed and flexibility requirements of cloud radio access networks (C-RANs) across various MC applications, this paper proposes a fast and scalable RPA decoder on graphics processing units (GPUs). By leveraging a thread-per-projection mapping strategy, we develop an optimized thread block architecture for the RPA decoding of second-order RM codes, which can be easily extended to construct a multi-dimensional block array for decoding higher-order RM codes. Additionally, we introduce a stationary projection pruning technique that seamlessly adapts the RPA decoder kernel to simplified variants, facilitating flexible trade-offs between error-correction performance and implementation complexity. Experimental results show that the pruned RPA decoder kernel on the NVIDIA A100 GPU achieves throughputs of 1.69 Gbps and 1.33 Gbps for the RM(6,2) and RM(7,2) codes, respectively, delivering speedups of 2.95× and 3.69× compared to a state-of-the-art software-based successive cancellation list (SCL) decoder.