Publishing Language: Chinese

Compiler technologies for emerging application paradigms and advanced computer architectures

Guangli LI1,2, Zhen DU1,2, Jiacheng ZHAO1,2, Ying LIU1,2, Feng YU1,2, Yijin LI1,2, Zhongcheng ZHANG1,2, Huimin CUI1,2 (corresponding author)
1 State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
2 University of Chinese Academy of Sciences, Beijing 100049, China

Abstract

With the surging demand for computing power driven by emerging applications such as artificial intelligence, compilation technology, serving as a crucial bridge between software and hardware, faces unprecedented challenges and opportunities. This article focuses on the development trends of domain-specific compilers and discusses in depth the compilation techniques tailored to emerging domains. By examining aspects including whole-program operator fusion, dynamic-shape tensor compilation, hardware/software co-design, and computational security, it provides a comprehensive summary and evaluation of representative domain-specific compilation technologies for new application paradigms and architectures. The key role of domain-specific compilation technologies in adapting to diverse computing platforms, improving program execution efficiency, ensuring software security, and supporting hardware design is analyzed. Prospects for applications and directions for future work are also discussed.

CLC number: V11; TP314    Document code: A    Article ID: 1000-6893(2024)20-630552-18

References

[1]
LI M Y, LI D M, ZHANG J W, et al. Research on predictive maintenance technology of space equipment under the background of big data[J]. China New Telecommunications, 2023, 25(2): 25-28(in Chinese).
[2]
WU X H, ZHU J F, GE W. Robust optimization design of route network intervals[C]∥2011 National Doctoral Academic Forum, 2011(in Chinese).
[3]
ZHOU Z J. Application of artificial intelligence technology in air traffic management[J]. China New Telecommunications, 2016(5): 51(in Chinese).
[4]
ZHANG X M, YU Z W, YANG Y Q. Application and consideration to Boeing 787 influenced by artificial intelligence[J]. Industrial Engineering and Management, 2017, 22(6): 169-174(in Chinese).
[5]
LI Y B, XU X W, ZHAO J J, et al. Construction of war game training platform based on OODA model[C]∥Proceedings of the 10th Chinese Command and Control Society Conference, 2022(in Chinese).
[6]
LI P Y. Deleting dead code in link time and translating machine code based on pattern matching[D]. Hefei: University of Science and Technology of China, 2015(in Chinese).
[7]
ACHARYA A, BONDHUGULA U, COHEN A. Effective loop fusion in polyhedral compilation using fusion conflict graphs[J]. ACM Transactions on Architecture and Code Optimization, 2020, 17(4): 1-26.
[8]
XIA J, DAI H D, YANG X J. A linear expressing based approach for optimizing locality using non-singular loop transformations[J]. Chinese Journal of Computers, 2003, 26(12): 1609-1620(in Chinese).
[9]
PENG C, LIU Q Z, CHEN C B. Loop permutation and auto-tuning under polyhedral model[J]. Computer Engineering & Science, 2023, 45(12): 2121-2134(in Chinese).
[10]
LOZANO R C, CARLSSON M, BLINDELL G H, et al. Combinatorial register allocation and instruction scheduling[J]. ACM Transactions on Programming Languages and Systems, 2019, 41(3): 1-53.
[11]
ZHANG J C, LIAN R Q, ZHANG Z Q. Register allocation on network processors with multiple register banks[J]. Chinese Journal of Computers, 2006, 29(1): 66-72(in Chinese).
[12]
GAO M, ZHAO J C, CUI H M, et al. Bitwidth-aware register binding algorithm[J]. Journal of Software, 2024, 35(6): 2631-2647(in Chinese).
[13]
GAO W, ZHAO R C, HAN L, et al. Research on SIMD auto-vectorization compiling optimization[J]. Journal of Software, 2015, 26(6): 1265-1284(in Chinese).
[14]
LI C J, HUANG J J, XU Y, et al. Evaluation and analysis of effects of auto-vectorization in typical compilers[J]. Computer Science, 2013, 40(4): 41-46(in Chinese).
[15]
FENG J G, HE Y P, TAO Q M. Auto-vectorization: Recent development and prospect[J]. Journal on Communications, 2022, 43(3): 180-195(in Chinese).
[16]
MIDKIFF S P. Automatic parallelization: An overview of fundamental compiler techniques[M]. Cham: Springer International Publishing, 2012.
[17]
MA C Y, LV B X, YE X J, et al. Automatic parallelization framework for complex nested loops based on LLVM pass[J]. Journal of Software, 2023, 34(7): 3022-3042(in Chinese).
[18]
LI M Z, LIU Y, LIU X Y, et al. The deep learning compiler: A comprehensive survey[J]. IEEE Transactions on Parallel and Distributed Systems, 2021, 32(3): 708-727.
[19]
CHEN T Q, MOREAU T, JIANG Z H, et al. TVM: An automated end-to-end optimizing compiler for deep learning[DB/OL]. arXiv preprint: 1802.04799, 2018.
[20]
LATTNER C, AMINI M, BONDHUGULA U, et al. MLIR: Scaling compiler infrastructure for domain specific computation[C]∥2021 IEEE/ACM International Symposium on Code Generation and Optimization(CGO). Piscataway: IEEE Press, 2021: 2-14.
[21]
CHEN C, QI F. Review on development of convolutional neural network and its application in computer vision[J]. Computer Science, 2019, 46(3): 63-73(in Chinese).
[22]
VOULODIMOS A, DOULAMIS N, DOULAMIS A, et al. Deep learning for computer vision: A brief review[J]. Computational Intelligence and Neuroscience, 2018, 2018: 7068349.
[23]
ZHAO J S, SONG M X, GAO X. Review on the development and application of natural language processing[J]. Information Technology and Informatization, 2019(7): 142-145(in Chinese).
[24]
CHEN D G, MA J L, MA Z P, et al. Review of pre-training techniques for natural language processing[J]. Journal of Frontiers of Computer Science & Technology, 2021, 15(8): 1359.
[25]
WU Y X, LIANG K, LIU Y, et al. The progress and trends of FPGA-based accelerators in deep learning[J]. Chinese Journal of Computers, 2019, 42(11): 2461-2480(in Chinese).
[26]
CHEN Y J, CHEN T S, XU Z W, et al. DianNao family[J]. Communications of the ACM, 2016, 59(11): 105-112.
[27]
LU W Z, ZHANG F, HE Y X, et al. Evaluation and optimization for Huawei Ascend neural network accelerator[J]. Chinese Journal of Computers, 2022, 45(8): 1618-1637(in Chinese).
[28]
PASZKE A, GROSS S, MASSA F, et al. PyTorch: An imperative style, high-performance deep learning library[DB/OL]. arXiv preprint: 1912.01703, 2019.
[29]
ABADI M, BARHAM P, CHEN J M, et al. TensorFlow: A system for large-scale machine learning[DB/OL]. arXiv preprint: 1605.08695, 2016.
[30]
STAUNSTRUP J, WOLF W. Hardware/software co-design: Principles and practice[M]. Berlin: Springer Science & Business Media, 2013.
[31]
NIU W, GUAN J X, WANG Y Z, et al. DNNFusion: Accelerating deep neural networks execution with advanced operator fusion[C]∥Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation. New York: ACM, 2021: 883-898.
[32]
ZHENG Z, YANG X D, ZHAO P Z, et al. AStitch: Enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures[C]∥Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2022: 359-373.
[33]
LI A, ZHENG B J, PEKHIMENKO G, et al. Automatic horizontal fusion for GPU kernels[C]∥2022 IEEE/ACM International Symposium on Code Generation and Optimization(CGO). Piscataway: IEEE Press, 2022: 14-27.
[34]
MA L, XIE Z, YANG Z, et al. Rammer: Enabling holistic deep learning compiler optimizations with rTasks[C]∥14th USENIX Symposium on Operating Systems Design and Implementation(OSDI 20), 2020: 881-897.
[35]
ZOPH B, VASUDEVAN V, SHLENS J, et al. Learning transferable architectures for scalable image recognition[C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 8697-8710.
[36]
XIE S N, GIRSHICK R, DOLLÁR P, et al. Aggregated residual transformations for deep neural networks[C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition(CVPR). Piscataway: IEEE Press, 2017: 5987-5995.
[37]
ZHAO J, GAO X, XIA R, et al. Apollo: Automatic partition-based operator fusion through layer by layer optimization[J]. Proceedings of Machine Learning and Systems, 2022, 4: 1-19.
[38]
LI Y, ZHAO J, QIANQI S, et al. SIRIUS: Harvesting whole-program optimization opportunities for DNNs[J]. Proceedings of Machine Learning and Systems, 2023, 5: 1-17.
[39]
DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[DB/OL]. arXiv preprint: 1810.04805, 2018.
[40]
WU S, XU Y, ZHAO D N. Survey of object detection based on deep convolutional network[J]. Pattern Recognition and Artificial Intelligence, 2018, 31(4): 335-346(in Chinese).
[41]
ZHENG Z, PAN Z F, WANG D L, et al. BladeDISC: Optimizing dynamic shape machine learning workloads via compiler approach[J]. Proceedings of the ACM on Management of Data, 2023, 1(3): 1-29.
[42]
ZHENG B, JIANG Z, YU C H, et al. DietCode: Automatic optimization for dynamic tensor programs[J]. Proceedings of Machine Learning and Systems, 2022, 4: 848-863.
[43]
YU F, LI G L, ZHAO J C, et al. Optimizing dynamic-shape neural networks on accelerators via on-the-fly micro-kernel polymerization[C]∥Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. New York: ACM, 2024: 1-16.
[44]
BEDOUKIAN P, ADIT N, PEGUERO E, et al. Software-defined vector processing on manycore fabrics[C]∥MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture. New York: ACM, 2021: 392-406.
[45]
KRASHINSKY R, BATTEN C, HAMPTON M, et al. The vector-thread architecture[C]∥Proceedings of 31st Annual International Symposium on Computer Architecture. Piscataway: IEEE Press, 2004: 52-63.
[46]
LEE Y, AVIZIENIS R, BISHARA A, et al. Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators[C]∥Proceedings of the 38th Annual International Symposium on Computer Architecture. New York: ACM, 2011: 129-140.
[47]
PARK Y, PARK J J K, PARK H, et al. Libra: Tailoring SIMD execution using heterogeneous hardware and dynamic configurability[C]∥2012 45th Annual IEEE/ACM International Symposium on Microarchitecture. Piscataway: IEEE Press, 2012: 84-95.
[48]
ZHANG Z C, OU Y, LIU Y, et al. Occamy: Elastically sharing a SIMD co-processor across multiple CPU cores[C]∥Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3. New York: ACM, 2023: 483-497.
[49]
DENG Y J, WANG C X, YU S C, et al. StrongBox: A GPU TEE on Arm endpoints[C]∥Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security. New York: ACM, 2022: 769-783.
[50]
VOLOS S, VASWANI K, BRUNO R. Graviton: Trusted execution environments on GPUs[C]∥13th USENIX Symposium on Operating Systems Design and Implementation(OSDI 18), 2018: 681-696.
[51]
JANG I, TANG A, KIM T, et al. Heterogeneous isolated execution for commodity GPUs[C]∥Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2019: 455-468.
[52]
JIANG J Y, QI J, SHEN T X, et al. CRONUS: Fault-isolated, secure and high-performance heterogeneous computing for trusted execution environment[C]∥2022 55th IEEE/ACM International Symposium on Microarchitecture(MICRO). Piscataway: IEEE Press, 2022: 124-143.
[53]
MAI H, ZHAO J, ZHENG H, et al. Honeycomb: Secure and efficient GPU executions via static validation[C]∥17th USENIX Symposium on Operating Systems Design and Implementation(OSDI 23), 2023: 155-172.
[54]
LAI Q K, HE C L, et al. An ideal performance oriented approach for cross-framework compiler analysis[J]. Journal of Computer Research and Development, 2021, 58(3): 668-680(in Chinese).
[55]
XING J R, WANG L Y, ZHANG S, et al. Bolt: Bridging the gap between auto-tuners and hardware-native performance[DB/OL]. arXiv preprint: 2110.15238, 2021.
[56]
FENG S Y, HOU B H, JIN H Y, et al. TensorIR: An abstraction for automatic tensorized program optimization[C]∥Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. New York: ACM, 2023: 804-817.
[57]
YU F, ZHAO J C, CUI H M, et al. VTensor: Using virtual tensors to build a layout-oblivious AI programming framework[J]. Journal of Computer Science and Technology, 2023, 38(5): 1074-1097.
[58]
HUANG G Y, BAI Y, LIU L, et al. ALCOP: Automatic load-compute pipelining in deep learning compiler for AI-GPUs[DB/OL]. arXiv preprint: 2210.16691, 2022.
[59]
LIU C, LU J, LI G W, et al. Detecting TensorFlow program bugs in real-world industrial environment[C]∥2021 36th IEEE/ACM International Conference on Automated Software Engineering(ASE). Piscataway: IEEE Press, 2021: 55-66.
[60]
LU J, LI H F, LIU C, et al. Detecting missing-permission-check vulnerabilities in distributed cloud systems[C]∥Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security. New York: ACM, 2022: 2145-2158.
[61]
ZHANG G M, LI Q B, ZENG G Y, et al. Defensing code reuse attacks using live code randomization[J]. Journal of Software, 2019, 30(9): 2772-2790(in Chinese).
[62]
KO Y, REZK T, SERRANO M. SecureJS compiler: Portable memory isolation in JavaScript[C]∥Proceedings of the 36th Annual ACM Symposium on Applied Computing. New York: ACM, 2021: 1265-1274.
[63]
MOREAU T, CHEN T Q, VEGA L, et al. A hardware-software blueprint for flexible deep learning specialization[J]. IEEE Micro, 2019, 39(5): 8-16.
[64]
MOREAU T, CHEN T Q, JIANG Z, et al. VTA: An open hardware-software stack for deep learning[DB/OL]. arXiv preprint: 1807.04188, 2018.
Acta Aeronautica et Astronautica Sinica
Article number: 630552
Cite this article:
LI G, DU Z, ZHAO J, et al. Compiler technologies for emerging application paradigms and advanced computer architectures. Acta Aeronautica et Astronautica Sinica, 2024, 45(20): 630552. https://doi.org/10.7527/S1000-6893.2024.30552