Reinvent Cloud Software Stacks for Resource Disaggregation

Chen-Xi Wang; Yi-Zhou Shan; Peng-Fei Zuo; Hui-Min Cui

doi:10.1007/s11390-023-3272-0

Survey

Chen-Xi Wang^{1,2}, Yi-Zhou Shan^{3}, Peng-Fei Zuo^{3}, Hui-Min Cui^{1,2}

^{1} Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

^{2} University of the Chinese Academy of Sciences, Beijing 101408, China

^{3} Huawei Cloud, Shenzhen 518129, China

Abstract

Owing to the unprecedented development of low-latency interconnect technology, building large-scale disaggregated architectures is drawing increasing attention from both industry and academia. Resource disaggregation is a new way to organize the hardware resources of datacenters, and it has the potential to overcome limitations of conventional datacenters such as low resource utilization and low reliability. However, the emerging disaggregated architecture introduces severe performance and latency problems for existing cloud systems. In this paper, we take memory disaggregation as an example to demonstrate the unique challenges that the disaggregated datacenter poses to existing cloud software stacks, e.g., the programming interface, language runtime, and operating system, and we further discuss possible ways to reinvent cloud systems.

Keywords

cloud computing; resource disaggregation; datacenter; program semantics

Electronic Supplementary Material

JCST-2304-13272-Highlights.pdf (521.6 KB)
