Publishing Language: Chinese

Compiler technologies for emerging application paradigms and advanced computer architectures

Guangli LI1,2, Zhen DU1,2, Jiacheng ZHAO1,2, Ying LIU1,2, Feng YU1,2, Yijin LI1,2, Zhongcheng ZHANG1,2, Huimin CUI1,2 (corresponding author)
1 State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
2 University of Chinese Academy of Sciences, Beijing 100049, China

Abstract

With the surging demand for computing power driven by emerging applications such as artificial intelligence, compilation technology, serving as a crucial bridge between software and hardware, faces unprecedented challenges and opportunities. This article focuses on the development trends of domain-specific compilers and discusses in depth the compilation techniques tailored to emerging domains. By examining aspects including whole-program operator fusion, dynamic-shape tensor compilation, hardware/software co-design, and computational security, it provides a comprehensive summary and evaluation of representative domain-specific compilation technologies for new application paradigms and architectures. The key role of domain-specific compilation technologies in adapting to diverse computing platforms, improving program execution efficiency, ensuring software security, and supporting hardware design is analyzed. Prospects for applications and directions for future work are also discussed.

CLC number: V11; TP314    Document code: A    Article ID: 1000-6893(2024)20-630552-18

References

[1]
LI M Y, LI D M, ZHANG J W, et al. Research on predictive maintenance technology of space equipment under the background of big data[J]. China New Telecommunications, 2023, 25(2): 25-28(in Chinese).
[2]
WU X H, ZHU J F, GE W. Robust optimization design of route network intervals[C]∥2011 National Doctoral Academic Forum, 2011(in Chinese).
[3]
ZHOU Z J. Application of artificial intelligence technology in air traffic management[J]. China New Telecommunications, 2016(5): 51(in Chinese).
[4]
ZHANG X M, YU Z W, YANG Y Q. Application and consideration to Boeing 787 influenced by artificial intelligence[J]. Industrial Engineering and Management, 2017, 22(6): 169-174(in Chinese).
[5]
LI Y B, XU X W, ZHAO J J, et al. Construction of war game training platform based on OODA model[C]∥Proceedings of the 10th Chinese Command and Control Society Conference, 2022(in Chinese).
[6]
LI P Y. Deleting dead code in link time and translating machine code based on pattern matching[D]. Hefei: University of Science and Technology of China, 2015(in Chinese).
[7]
ACHARYA A, BONDHUGULA U, COHEN A. Effective loop fusion in polyhedral compilation using fusion conflict graphs[J]. ACM Transactions on Architecture and Code Optimization, 2020, 17(4): 1-26.
[8]
XIA J, DAI H D, YANG X J. A linear expressing based approach for optimizing locality using non-singular loop transformations[J]. Chinese Journal of Computers, 2003, 26(12): 1609-1620(in Chinese).
[9]
PENG C, LIU Q Z, CHEN C B. Loop permutation and auto-tuning under polyhedral model[J]. Computer Engineering & Science, 2023, 45(12): 2121-2134(in Chinese).
[10]
LOZANO R C, CARLSSON M, BLINDELL G H, et al. Combinatorial register allocation and instruction scheduling[J]. ACM Transactions on Programming Languages and Systems, 2019, 41(3): 1-53.
[11]
ZHANG J C, LIAN R Q, ZHANG Z Q. Register allocation on network processors with multiple register banks[J]. Chinese Journal of Computers, 2006, 29(1): 66-72(in Chinese).
[12]
GAO M, ZHAO J C, CUI H M, et al. Bitwidth-aware register binding algorithm[J]. Journal of Software, 2024, 35(6): 2631-2647(in Chinese).
[13]
GAO W, ZHAO R C, HAN L, et al. Research on SIMD auto-vectorization compiling optimization[J]. Journal of Software, 2015, 26(6): 1265-1284(in Chinese).
[14]
LI C J, HUANG J J, XU Y, et al. Evaluation and analysis of effects of auto-vectorization in typical compilers[J]. Computer Science, 2013, 40(4): 41-46(in Chinese).
[15]
FENG J G, HE Y P, TAO Q M. Auto-vectorization: Recent development and prospect[J]. Journal on Communications, 2022, 43(3): 180-195(in Chinese).
[16]
MIDKIFF S P. Automatic parallelization: An overview of fundamental compiler techniques[M]. Cham: Springer International Publishing, 2012.
[17]
MA C Y, LV B X, YE X J, et al. Automatic parallelization framework for complex nested loops based on LLVM pass[J]. Journal of Software, 2023, 34(7): 3022-3042(in Chinese).
[18]
LI M Z, LIU Y, LIU X Y, et al. The deep learning compiler: A comprehensive survey[J]. IEEE Transactions on Parallel and Distributed Systems, 2021, 32(3): 708-727.
[19]
CHEN T Q, MOREAU T, JIANG Z H, et al. TVM: An automated end-to-end optimizing compiler for deep learning[DB/OL]. arXiv preprint: 1802.04799, 2018.
[20]
LATTNER C, AMINI M, BONDHUGULA U, et al. MLIR: Scaling compiler infrastructure for domain specific computation[C]∥2021 IEEE/ACM International Symposium on Code Generation and Optimization(CGO). Piscataway: IEEE Press, 2021: 2-14.
[21]
CHEN C, QI F. Review on development of convolutional neural network and its application in computer vision[J]. Computer Science, 2019, 46(3): 63-73(in Chinese).
[22]
VOULODIMOS A, DOULAMIS N, DOULAMIS A, et al. Deep learning for computer vision: A brief review[J]. Computational Intelligence and Neuroscience, 2018, 2018: 7068349.
[23]
ZHAO J S, SONG M X, GAO X. Review on the development and application of natural language processing[J]. Information Technology and Informatization, 2019(7): 142-145(in Chinese).
[24]
CHEN D G, MA J L, MA Z P, et al. Review of pre-training techniques for natural language processing[J]. Journal of Frontiers of Computer Science & Technology, 2021, 15(8): 1359.
[25]
WU Y X, LIANG K, LIU Y, et al. The progress and trends of FPGA-based accelerators in deep learning[J]. Chinese Journal of Computers, 2019, 42(11): 2461-2480(in Chinese).
[26]
CHEN Y J, CHEN T S, XU Z W, et al. DianNao family[J]. Communications of the ACM, 2016, 59(11): 105-112.
[27]
LU W Z, ZHANG F, HE Y X, et al. Evaluation and optimization for Huawei Ascend neural network accelerator[J]. Chinese Journal of Computers, 2022, 45(8): 1618-1637(in Chinese).
[28]
PASZKE A, GROSS S, MASSA F, et al. PyTorch: An imperative style, high-performance deep learning library[DB/OL]. arXiv preprint: 1912.01703, 2019.
[29]
ABADI M, BARHAM P, CHEN J M, et al. TensorFlow: A system for large-scale machine learning[DB/OL]. arXiv preprint: 1605.08695, 2016.
[30]
STAUNSTRUP J, WOLF W. Hardware/software co-design: Principles and practice[M]. Berlin: Springer Science & Business Media, 2013.
[31]
NIU W, GUAN J X, WANG Y Z, et al. DNNFusion: Accelerating deep neural networks execution with advanced operator fusion[C]∥Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation. New York: ACM, 2021: 883-898.
[32]
ZHENG Z, YANG X D, ZHAO P Z, et al. AStitch: Enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures[C]∥Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2022: 359-373.
[33]
LI A, ZHENG B J, PEKHIMENKO G, et al. Automatic horizontal fusion for GPU kernels[C]∥2022 IEEE/ACM International Symposium on Code Generation and Optimization(CGO). Piscataway: IEEE Press, 2022: 14-27.
[34]
MA L, XIE Z, YANG Z, et al. Rammer: Enabling holistic deep learning compiler optimizations with rTasks[C]∥14th USENIX Symposium on Operating Systems Design and Implementation(OSDI 20), 2020: 881-897.
[35]
ZOPH B, VASUDEVAN V, SHLENS J, et al. Learning transferable architectures for scalable image recognition[C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 8697-8710.
[36]
XIE S N, GIRSHICK R, DOLLÁR P, et al. Aggregated residual transformations for deep neural networks[C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition(CVPR). Piscataway: IEEE Press, 2017: 5987-5995.
[37]
ZHAO J, GAO X, XIA R, et al. Apollo: Automatic partition-based operator fusion through layer by layer optimization[J]. Proceedings of Machine Learning and Systems, 2022, 4: 1-19.
[38]
LI Y, ZHAO J, QIANQI S, et al. SIRIUS: Harvesting whole-program optimization opportunities for DNNs[J]. Proceedings of Machine Learning and Systems, 2023, 5: 1-17.
[39]
DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[DB/OL]. arXiv preprint: 1810.04805, 2018.
[40]
WU S, XU Y, ZHAO D N. Survey of object detection based on deep convolutional network[J]. Pattern Recognition and Artificial Intelligence, 2018, 31(4): 335-346(in Chinese).
[41]
ZHENG Z, PAN Z F, WANG D L, et al. BladeDISC: Optimizing dynamic shape machine learning workloads via compiler approach[J]. Proceedings of the ACM on Management of Data, 2023, 1(3): 1-29.
[42]
ZHENG B, JIANG Z, YU C H, et al. DietCode: Automatic optimization for dynamic tensor programs[J]. Proceedings of Machine Learning and Systems, 2022, 4: 848-863.
[43]
YU F, LI G L, ZHAO J C, et al. Optimizing dynamic-shape neural networks on accelerators via on-the-fly micro-kernel polymerization[C]∥Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. New York: ACM, 2024: 1-16.
[44]
BEDOUKIAN P, ADIT N, PEGUERO E, et al. Software-defined vector processing on manycore fabrics[C]∥MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture. New York: ACM, 2021: 392-406.
[45]
KRASHINSKY R, BATTEN C, HAMPTON M, et al. The vector-thread architecture[C]∥Proceedings of 31st Annual International Symposium on Computer Architecture. Piscataway: IEEE Press, 2004: 52-63.
[46]
LEE Y, AVIZIENIS R, BISHARA A, et al. Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators[C]∥Proceedings of the 38th Annual International Symposium on Computer Architecture. New York: ACM, 2011: 129-140.
[47]
PARK Y, PARK J J K, PARK H, et al. Libra: Tailoring SIMD execution using heterogeneous hardware and dynamic configurability[C]∥2012 45th Annual IEEE/ACM International Symposium on Microarchitecture. Piscataway: IEEE Press, 2012: 84-95.
[48]
ZHANG Z C, OU Y, LIU Y, et al. Occamy: Elastically sharing a SIMD co-processor across multiple CPU cores[C]∥Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3. New York: ACM, 2023: 483-497.
[49]
DENG Y J, WANG C X, YU S C, et al. StrongBox: A GPU TEE on Arm endpoints[C]∥Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security. New York: ACM, 2022: 769-783.
[50]
VOLOS S, VASWANI K, BRUNO R. Graviton: Trusted execution environments on GPUs[C]∥13th USENIX Symposium on Operating Systems Design and Implementation(OSDI 18), 2018: 681-696.
[51]
JANG I, TANG A, KIM T, et al. Heterogeneous isolated execution for commodity GPUs[C]∥Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2019: 455-468.
[52]
JIANG J Y, QI J, SHEN T X, et al. CRONUS: Fault-isolated, secure and high-performance heterogeneous computing for trusted execution environment[C]∥2022 55th IEEE/ACM International Symposium on Microarchitecture(MICRO). Piscataway: IEEE Press, 2022: 124-143.
[53]
MAI H, ZHAO J, ZHENG H, et al. Honeycomb: Secure and efficient GPU executions via static validation[C]∥17th USENIX Symposium on Operating Systems Design and Implementation(OSDI 23), 2023: 155-172.
[54]
LAI Q K, HE C L, et al. An ideal performance oriented approach for cross-framework compiler analysis[J]. Journal of Computer Research and Development, 2021, 58(3): 668-680(in Chinese).
[55]
XING J R, WANG L Y, ZHANG S, et al. Bolt: Bridging the gap between auto-tuners and hardware-native performance[DB/OL]. arXiv preprint: 2110.15238, 2021.
[56]
FENG S Y, HOU B H, JIN H Y, et al. TensorIR: An abstraction for automatic tensorized program optimization[C]∥Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. New York: ACM, 2023: 804-817.
[57]
YU F, ZHAO J C, CUI H M, et al. VTensor: Using virtual tensors to build a layout-oblivious AI programming framework[J]. Journal of Computer Science and Technology, 2023, 38(5): 1074-1097.
[58]
HUANG G Y, BAI Y, LIU L, et al. ALCOP: Automatic load-compute pipelining in deep learning compiler for AI-GPUs[DB/OL]. arXiv preprint: 2210.16691, 2022.
[59]
LIU C, LU J, LI G W, et al. Detecting TensorFlow program bugs in real-world industrial environment[C]∥2021 36th IEEE/ACM International Conference on Automated Software Engineering(ASE). Piscataway: IEEE Press, 2021: 55-66.
[60]
LU J, LI H F, LIU C, et al. Detecting missing-permission-check vulnerabilities in distributed cloud systems[C]∥Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security. New York: ACM, 2022: 2145-2158.
[61]
ZHANG G M, LI Q B, ZENG G Y, et al. Defensing code reuse attacks using live code randomization[J]. Journal of Software, 2019, 30(9): 2772-2790(in Chinese).
[62]
KO Y, REZK T, SERRANO M. SecureJS compiler: Portable memory isolation in JavaScript[C]∥Proceedings of the 36th Annual ACM Symposium on Applied Computing. New York: ACM, 2021: 1265-1274.
[63]
MOREAU T, CHEN T Q, VEGA L, et al. A hardware-software blueprint for flexible deep learning specialization[J]. IEEE Micro, 2019, 39(5): 8-16.
[64]
MOREAU T, CHEN T Q, JIANG Z, et al. VTA: An open hardware-software stack for deep learning[DB/OL]. arXiv preprint: 1807.04188, 2018.
Acta Aeronautica et Astronautica Sinica
Article number: 630552
Cite this article:
LI G, DU Z, ZHAO J, et al. Compiler technologies for emerging application paradigms and advanced computer architectures. Acta Aeronautica et Astronautica Sinica, 2024, 45(20): 630552. https://doi.org/10.7527/S1000-6893.2024.30552