Denoising diffusion models have demonstrated tremendous success in modeling data distributions and synthesizing high-quality samples. In the 2D image domain, they have become the state of the art, capable of generating photo-realistic images with high controllability. More recently, researchers have begun to explore how diffusion models can be used to generate 3D data, which holds greater potential for real-world applications. This requires careful design choices on two key questions: which 3D representation to use, and how to apply the diffusion process to it. In this survey, we provide the first comprehensive review of diffusion models for manipulating 3D content, including 3D generation, reconstruction, and 3D-aware image synthesis. We classify existing methods into three major categories: 2D-space diffusion with pretrained models, 2D-space diffusion without pretrained models, and 3D-space diffusion. We also summarize the popular datasets used for 3D generation with diffusion models. Alongside this survey, we maintain a repository at https://github.com/cwchenwang/awesome-3d-diffusion to track the latest relevant papers and codebases. Finally, we discuss the current challenges facing diffusion models for 3D generation and suggest future research directions.
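As background for the diffusion process referred to above, a minimal sketch of the standard DDPM formulation follows; the notation here is generic textbook notation, not taken from any particular method surveyed, and x_0 may denote a 2D image or a 3D representation depending on where the diffusion is applied. Given a variance schedule \beta_t, the forward process gradually corrupts a clean sample x_0 with Gaussian noise:

\[ q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t \mathbf{I}\big), \qquad q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t)\, \mathbf{I}\big), \]

where \alpha_t = 1 - \beta_t and \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s. A network \epsilon_\theta is trained to predict the injected noise via the simplified objective

\[ \mathcal{L} = \mathbb{E}_{x_0,\ \epsilon \sim \mathcal{N}(0, \mathbf{I}),\ t}\Big[\, \big\| \epsilon - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon,\ t\big) \big\|^2 \,\Big], \]

and samples are generated by iteratively denoising pure Gaussian noise with \epsilon_\theta. The design choices discussed in this survey concern which space this process runs in (2D image space versus a 3D representation) and how supervision is obtained.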