Scholar - SciOpen

To solve the problem of task mapping for grid coarse-grained reconfigurable arrays under multiple constraints, we propose a Loop Subgraph-Level Greedy Mapping (LSLGM) algorithm that exploits parallelism and processing element fragmentation. Under the constraints of the reconfigurable array, the LSLGM algorithm schedules nodes from a ready queue to the current reconfigurable cell array block. After a node is mapped, the indegree of each of its successors is dynamically updated. If a successor's indegree is zero, it is scheduled directly to the ready queue; otherwise, the predecessor must be dynamically checked. If the predecessor cannot be mapped, it is scheduled to a blocking queue. To dynamically adjust the scheduling order of ready nodes, a scheduling function is constructed from factors such as node number, node level, and node dependency. Experimental results show that, compared with the loop subgraph-level mapping algorithm, the total cycles of the LSLGM algorithm decrease by an average of 33.0% ($\mathrm{PEA}_{4 \times 4}$) and 33.9% ($\mathrm{PEA}_{7 \times 7}$). Compared with the epimorphism map algorithm, the total cycles of the LSLGM algorithm decrease by an average of 38.1% ($\mathrm{PEA}_{4 \times 4}$) and 39.0% ($\mathrm{PEA}_{7 \times 7}$). These results verify the feasibility of LSLGM.
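The queue mechanics described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' LSLGM implementation: the function name `greedy_map`, the `score` priority (node level, then successor count, then node name), and the single `block_capacity` limit are assumptions made for illustration, and the sketch ignores row, interconnect, and loop-subgraph constraints of a real array. The `blocked` set reflects one reading of the blocking queue, namely successors that still wait on unmapped predecessors.

```python
import heapq


def greedy_map(succs, level, block_capacity):
    """Map data-flow-graph nodes to successive blocks of at most block_capacity nodes.

    succs: dict mapping each node to a list of its successor nodes.
    level: dict mapping each node to its level in the graph.
    Returns a list of blocks, each a list of nodes in mapping order.
    """
    # Count incoming edges for every node.
    indeg = {n: 0 for n in succs}
    for targets in succs.values():
        for s in targets:
            indeg[s] += 1

    def score(n):
        # Hypothetical scheduling function: lower level first,
        # then nodes with more successors, then node name as a tie-break.
        return (level[n], -len(succs[n]), n)

    ready = [(score(n), n) for n in succs if indeg[n] == 0]
    heapq.heapify(ready)
    blocked = set()   # successors still waiting on an unmapped predecessor
    blocks = [[]]
    mapped = 0

    while ready:
        _, n = heapq.heappop(ready)
        if len(blocks[-1]) == block_capacity:
            blocks.append([])          # current block is full: open a new one
        blocks[-1].append(n)
        mapped += 1
        for s in succs[n]:
            indeg[s] -= 1              # dynamic indegree update
            if indeg[s] == 0:
                blocked.discard(s)
                heapq.heappush(ready, (score(s), s))
            else:
                blocked.add(s)

    if mapped != len(succs):
        raise ValueError("graph contains a cycle; not all nodes could be mapped")
    return blocks


# Small example: a and b feed c, and c feeds d.
example_succs = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
example_level = {"a": 0, "b": 0, "c": 1, "d": 2}
example_blocks = greedy_map(example_succs, example_level, block_capacity=2)
```

With a capacity of two nodes per block, the example places `a` and `b` in the first block and `c` and `d` in the second. A priority heap is used so that changing `score` is enough to experiment with different scheduling orders.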

Open Access Issue

Efficient Scheduling Mapping Algorithm for Row Parallel Coarse-Grained Reconfigurable Architecture

Naijin Chen, Zhen Wang, Ruixiang He, Jianhui Jiang, Fei Cheng, Chenghao Han

Tsinghua Science and Technology 2021, 26(5): 724-735

Published: 20 April 2021

Abstract


Row Parallel Coarse-Grained Reconfigurable Architecture (RPCGRA) has the advantages of maximum parallelism and programmable flexibility. Designing an efficient algorithm to map diverse applications onto RPCGRA is difficult because of a number of RPCGRA hardware constraints. To solve this problem, the nodes of the data flow graph must be partitioned and scheduled onto the RPCGRA. In this paper, we present a Depth-First Greedy Mapping (DFGM) algorithm that simultaneously considers the communication costs and the use times of the Reconfigurable Cell Array (RCA). DFGM performs better than level breadth mapping: the maximum improvement in the use times of RCA is 33%, and the maximum improvement in non-original input and output times is 64.4% (for Discrete Cosine Transform 8 (DCT8) with a reconfigurable processing unit area of 56). Compared with level-based depth mapping, DFGM also obtains the lowest averages of use times of RCA, non-original input and output times, and reconfigurable time.
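The two quantities this abstract compares can be made concrete with a small Python sketch that scores a given partition of a data flow graph. This is an illustration under stated assumptions, not the paper's cost model: the name `partition_metrics` is hypothetical, "use times" is taken to be the number of distinct blocks, and "non-original input and output times" is approximated as one output per value that leaves its block plus one input per (value, consuming block) pair.

```python
def partition_metrics(edges, block_of):
    """Score a partition of a data flow graph.

    edges: list of (producer, consumer) node pairs.
    block_of: dict mapping each node to the index of the block it is mapped to.
    Returns (use_times, cross_block_io).
    """
    # Assumed meaning of "use times": how many blocks the partition needs.
    use_times = len(set(block_of.values()))

    crossing = [(u, v) for u, v in edges if block_of[u] != block_of[v]]
    # Each value that leaves its block is written out once ...
    outputs = {u for u, _ in crossing}
    # ... and read in once by every other block that consumes it.
    inputs = {(u, block_of[v]) for u, v in crossing}
    return use_times, len(outputs) + len(inputs)


# a and b feed c, and c feeds d; compare two ways to split into two blocks.
example_edges = [("a", "c"), ("b", "c"), ("c", "d")]
split_late = partition_metrics(example_edges, {"a": 0, "b": 0, "c": 0, "d": 1})
split_early = partition_metrics(example_edges, {"a": 0, "b": 0, "c": 1, "d": 1})
```

Both partitions use two blocks, but cutting after `c` moves one intermediate value across the boundary while cutting before `c` moves two, which is the kind of trade-off a mapping that weighs communication cost alongside block count has to resolve.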
