Cache performance is a critical design constraint for modern many-core systems. Because the cache often works in a "black-box" manner, it is difficult for software to reason about cache behavior and to match running code to the underlying hardware. To better support code optimization, we need to understand and characterize the cache behavior. While cache performance characterization is heavily studied on traditional