Regular Paper

SinGRAV: Learning a Generative Radiance Volume from a Single Natural Scene

School of Computer Science and Technology, Shandong University, Qingdao 266237, China
State Key Laboratory of General Artificial Intelligence, Beijing 100871, China
School of Intelligence Science and Technology, Peking University, Beijing 100871, China
Tencent AI Lab, Tencent Holdings Limited, Shenzhen 518057, China

Abstract

We present SinGRAV, an attempt to learn a generative radiance volume from multi-view observations of a single natural scene, in stark contrast to existing category-level 3D generative models that learn from images of many object-centric scenes. Inspired by SinGAN, we also learn the internal distribution of the input scene, which necessitates our key designs w.r.t. the scene representation and network architecture. Unlike popular multi-layer perceptron (MLP)-based architectures, we particularly employ convolutional generators and discriminators, which inherently possess a spatial locality bias, to operate over voxelized volumes for learning the internal distribution over a plethora of overlapping regions. On the other hand, localizing the adversarial generators and discriminators over confined areas with limited receptive fields easily leads to highly implausible geometric structures in space. Our remedy is to use spatial inductive bias and joint discrimination on geometric clues in the form of 2D depth maps. This strategy is effective in improving spatial arrangement while incurring negligible additional computational cost. Experimental results demonstrate the ability of SinGRAV to generate plausible and diverse variations from a single scene, the merits of SinGRAV over state-of-the-art generative neural scene models, and the versatility of SinGRAV through its use in a variety of applications. Code and data will be released to facilitate further research.
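The abstract mentions that the generated radiance volume yields 2D depth maps used as geometric clues for joint discrimination. To make that concrete, the sketch below (not the paper's implementation; a minimal NumPy illustration with hypothetical names and a simplified orthographic camera) alpha-composites a voxel grid of densities and colors along one axis, producing both an image and an expected-depth map of the kind a depth-map discriminator could consume.

```python
import numpy as np

def render_volume(sigma, rgb, dz=1.0):
    """Alpha-composite a voxel radiance volume along the z axis.

    sigma: (D, H, W) non-negative densities; rgb: (D, H, W, 3) colors.
    Returns an (H, W, 3) image and an (H, W) expected-depth map.
    """
    alpha = 1.0 - np.exp(-sigma * dz)                  # per-voxel opacity
    # Transmittance: probability a ray reaches voxel i unoccluded.
    trans = np.cumprod(1.0 - alpha + 1e-10, axis=0)
    trans = np.concatenate([np.ones_like(trans[:1]), trans[:-1]], axis=0)
    weights = trans * alpha                            # (D, H, W) ray weights
    image = (weights[..., None] * rgb).sum(axis=0)     # weighted color sum
    z = np.arange(sigma.shape[0], dtype=np.float64)[:, None, None] * dz
    depth = (weights * z).sum(axis=0)                  # expected hit depth
    return image, depth

# Toy scene: a single near-opaque gray slab at depth index 2.
sigma = np.zeros((5, 4, 4)); sigma[2] = 10.0
rgb = np.full((5, 4, 4, 3), 0.5)
img, depth = render_volume(sigma, rgb)
```

With the slab at slice 2 absorbing almost all rays, the rendered depth map concentrates near 2.0, so implausible spatial arrangements in the volume would show up directly as distorted depth maps during discrimination.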

Electronic Supplementary Material

Download File(s)
JCST-2307-13596-Highlights.pdf (270.9 KB)
Journal of Computer Science and Technology
Pages 305-319
Cite this article:
Wang Y-J, Chen X-L, Chen B-Q. SinGRAV: Learning a Generative Radiance Volume from a Single Natural Scene. Journal of Computer Science and Technology, 2024, 39(2): 305-319. https://doi.org/10.1007/s11390-023-3596-9


Received: 14 July 2023
Accepted: 12 January 2024
Published: 30 March 2024
© Institute of Computing Technology, Chinese Academy of Sciences 2024