Review Article | Open Access

Diffusion models for 3D generation: A survey

Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Machine Learning Research, Apple AI/ML, New York, USA


Abstract

Denoising diffusion models have demonstrated tremendous success in modeling data distributions and synthesizing high-quality samples. In the 2D image domain, they have become the state of the art, capable of generating photo-realistic images with high controllability. More recently, researchers have begun to explore how to use diffusion models to generate 3D data, which holds even greater potential for real-world applications. Doing so requires careful design choices in two key respects: identifying a suitable 3D representation and determining how to apply the diffusion process. In this survey, we provide the first comprehensive review of diffusion models for manipulating 3D content, including 3D generation, reconstruction, and 3D-aware image synthesis. We classify existing methods into three major categories: 2D space diffusion with pretrained models, 2D space diffusion without pretrained models, and 3D space diffusion. We also summarize the popular datasets used for 3D generation with diffusion models. Alongside this survey, we maintain a repository at https://github.com/cwchenwang/awesome-3d-diffusion to track the latest relevant papers and codebases. Finally, we discuss the open challenges facing diffusion models for 3D generation and suggest directions for future research.
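As context for the methods surveyed, the sketch below illustrates the standard denoising diffusion recipe on which they build: a closed-form forward process that gradually corrupts data with Gaussian noise, and a learned reverse process that removes the noise step by step. This is a minimal illustration under assumed choices (a linear noise schedule, image-shaped tensors, and a hypothetical `eps_model` noise predictor), not the implementation of any specific surveyed method.

```python
# Minimal, illustrative DDPM sketch; schedule and model are assumptions
# for exposition, not any particular method reviewed in this survey.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)      # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # cumulative products of alphas

def q_sample(x0, t, noise):
    """Forward process: sample x_t ~ q(x_t | x_0) in closed form."""
    ab = alpha_bars[t].view(-1, 1, 1, 1)
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise

@torch.no_grad()
def p_sample_loop(eps_model, shape):
    """Reverse process: start from pure noise and denoise iteratively."""
    x = torch.randn(shape)
    for t in reversed(range(T)):
        z = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        # eps_model is a hypothetical network predicting the added noise
        eps = eps_model(x, torch.full((shape[0],), t))
        coef = betas[t] / (1.0 - alpha_bars[t]).sqrt()
        x = (x - coef * eps) / alphas[t].sqrt() + betas[t].sqrt() * z
    return x
```

The design choices highlighted in the abstract amount to deciding what `x` represents (a rendered 2D view, a triplane, a point cloud, etc.) and in which space this noising and denoising loop operates.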

Cite this article:
Wang C, Peng H-Y, Liu Y-T, et al. Diffusion models for 3D generation: A survey. Computational Visual Media, 2025, 11(1): 1-28. https://doi.org/10.26599/CVM.2025.9450452