Efficient Scheduling Mapping Algorithm for Row Parallel Coarse-Grained Reconfigurable Architecture

Naijin Chen; Zhen Wang; Ruixiang He; Jianhui Jiang; Fei Cheng; Chenghao Han

doi:10.26599/TST.2020.9010035

Open Access

School of Computer and Information Science, Anhui Polytechnic University, Wuhu 241000, China

School of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 200090, China

School of Software Engineering, Tongji University, Shanghai 201804, China

Abstract

Row Parallel Coarse-Grained Reconfigurable Architecture (RPCGRA) offers maximum parallelism and programmable flexibility. Designing an efficient algorithm to map diverse applications onto RPCGRA is difficult because of a number of RPCGRA hardware constraints. To solve this problem, the nodes of the data flow graph must be partitioned and scheduled onto the RPCGRA. In this paper, we present a Depth-First Greedy Mapping (DFGM) algorithm that simultaneously considers the communication costs and the use times of the Reconfigurable Cell Array (RCA). DFGM performs better than level breadth mapping: the maximum improvement in the use times of the RCA is 33%, and the maximum improvement in non-original input and output times is 64.4% (given Discrete Cosine Transform 8 (DCT8) and a reconfigurable processing unit area of 56). Compared with level-based depth mapping, DFGM also obtains the lowest averages of use times of the RCA, non-original input and output times, and reconfigurable time.
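The abstract does not give the details of DFGM, so the following is only a minimal illustrative sketch of the general idea of depth-first greedy temporal partitioning, not the authors' algorithm. It assumes the RCA area can be modeled as a plain node count (`capacity`) and ignores the row-level constraints of a real RPCGRA. The function names, the `succ` adjacency format, and the example graph are all hypothetical. The number of partitions stands in for the use times of the RCA, and the number of edges crossing partitions stands in for non-original input and output times.

```python
def depth_first_greedy_partition(succ, capacity):
    """Split a DAG into ordered partitions of at most `capacity` nodes.

    `succ` maps each node to a list of its successors (assumed acyclic).
    A node is placed only after all of its predecessors have been placed,
    so every predecessor ends up in the same or an earlier partition.
    """
    nodes = set(succ) | {v for vs in succ.values() for v in vs}
    pred_count = {n: 0 for n in nodes}
    for vs in succ.values():
        for v in vs:
            pred_count[v] += 1

    # The ready list is used as a stack, so the most recently enabled
    # node is placed next; this follows one dependency chain in depth
    # before starting another.
    ready = sorted(n for n in nodes if pred_count[n] == 0)
    partitions, current = [], []
    while ready:
        node = ready.pop()
        if len(current) == capacity:  # current partition is full
            partitions.append(current)
            current = []
        current.append(node)
        for v in sorted(succ.get(node, ()), reverse=True):
            pred_count[v] -= 1
            if pred_count[v] == 0:
                ready.append(v)
    if current:
        partitions.append(current)
    return partitions


def cross_partition_edges(succ, partitions):
    """Count edges whose producer and consumer sit in different partitions."""
    where = {n: i for i, part in enumerate(partitions) for n in part}
    return sum(1 for u, vs in succ.items() for v in vs if where[u] != where[v])


# Hypothetical data flow graph: two independent chains a->b->c and x->y->z.
chains = {"a": ["b"], "b": ["c"], "x": ["y"], "y": ["z"]}
parts = depth_first_greedy_partition(chains, capacity=3)
print(parts, cross_partition_edges(chains, parts))
```

On this toy graph the depth-first order keeps each chain inside one partition, so no intermediate value crosses a partition boundary. A level-by-level order with the same capacity would group `a`, `x`, and one second-level node first, which forces intermediate values to be passed between partitions.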

Keywords

temporal mapping; Reconfigurable Cell Array (RCA); listed scheduling; communication costs

Tsinghua Science and Technology

Volume 26, Issue 5, October 2021

Pages 724-735

DOI: 10.26599/TST.2020.9010035

Cite this article:

Chen N, Wang Z, He R, et al. Efficient Scheduling Mapping Algorithm for Row Parallel Coarse-Grained Reconfigurable Architecture. Tsinghua Science and Technology, 2021, 26(5): 724-735. https://doi.org/10.26599/TST.2020.9010035

Received: 30 July 2020

Accepted: 08 September 2020

Published: 20 April 2021

© The author(s) 2021. The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).