Abstract
The emerging massive communication (MC) highlights the need for efficient short-length error-correction coding schemes with effective decoders. Classical Reed-Muller (RM) codes combined with the recently developed recursive projection-aggregation (RPA) decoder present a promising solution, as the RPA decoder demonstrates near maximum likelihood (ML) performance and supports highly parallel implementation. To address the speed and flexibility requirements of cloud radio access networks (C-RANs) across various MC applications, this paper proposes a fast and scalable RPA decoder on graphics processing units (GPUs). By leveraging a thread-per-projection mapping strategy, we develop an optimized thread block architecture for the RPA decoding of second-order RM codes, which can be easily extended to construct a multi-dimensional block array for decoding higher-order RM codes. Additionally, we introduce a stationary projection pruning technique that seamlessly adapts the RPA decoder kernel to simplified variants, facilitating flexible trade-offs between error-correction performance and implementation complexity. Experimental results show that the pruned RPA decoder kernel on the NVIDIA A100 GPU achieves throughputs of 1.69 Gbps and 1.33 Gbps for the RM(6,2) and RM(7,2) codes, respectively, delivering speedups of 2.95× and 3.69× compared to a state-of-the-art software-based successive cancellation list (SCL) decoder.