Regular Paper

SinGRAV: Learning a Generative Radiance Volume from a Single Natural Scene

School of Computer Science and Technology, Shandong University, Qingdao 266237, China
State Key Laboratory of General Artificial Intelligence, Beijing 100871, China
School of Intelligence Science and Technology, Peking University, Beijing 100871, China
Tencent AI Lab, Tencent Holdings Limited, Shenzhen 518057, China

Abstract

We present SinGRAV, an attempt to learn a generative radiance volume from multi-view observations of a single natural scene, in stark contrast to existing category-level 3D generative models that learn from images of many object-centric scenes. Inspired by SinGAN, we also learn the internal distribution of the input scene, which necessitates our key designs with respect to the scene representation and network architecture. Unlike popular multi-layer perceptron (MLP)-based architectures, we employ convolutional generators and discriminators, which inherently possess a spatial locality bias, to operate over voxelized volumes and learn the internal distribution over a plethora of overlapping regions. However, localizing the adversarial generators and discriminators to confined areas with limited receptive fields easily leads to highly implausible global geometric structures. Our remedy is to use spatial inductive bias and joint discrimination on geometric clues in the form of 2D depth maps. This strategy is effective in improving the spatial arrangement while incurring negligible additional computational cost. Experimental results demonstrate the ability of SinGRAV to generate plausible and diverse variations from a single scene, the merits of SinGRAV over state-of-the-art generative neural scene models, and the versatility of SinGRAV through its use in a variety of applications. Code and data will be released to facilitate further research.
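To make the design above concrete, below is a minimal PyTorch sketch, assuming a voxelized RGB-plus-density representation and axis-aligned compositing rays: a 3D convolutional generator with a limited receptive field emits the radiance volume, and a front-to-back volume-rendering step produces both an RGB image and an expected-depth map, the geometric clue that joint discrimination would consume alongside color. All module names, shapes, and the orthographic rendering simplification are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: a locality-biased 3D convolutional generator
# plus a simplified (axis-aligned) volume renderer that yields an RGB
# image and a depth map for joint 2D discrimination. Names and shapes
# are assumptions, not the authors' code.
import torch
import torch.nn as nn

class VoxelGenerator(nn.Module):
    """Maps a noise volume to an RGB + density grid with 3D convolutions;
    the small receptive field gives the spatial locality bias noted above."""
    def __init__(self, noise_ch=32, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(noise_ch, hidden, 3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv3d(hidden, hidden, 3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv3d(hidden, 4, 3, padding=1),  # 3 color channels + 1 density
        )

    def forward(self, z):                # z: (B, noise_ch, D, H, W)
        out = self.net(z)
        rgb = torch.sigmoid(out[:, :3])  # colors in [0, 1]
        sigma = torch.relu(out[:, 3:])   # non-negative volume density
        return rgb, sigma

def render_front_to_back(rgb, sigma, step=1.0):
    """Alpha-composites the grid along the depth axis (axis-aligned rays,
    a simplification of perspective ray marching) and returns an RGB image
    plus the expected-depth map used as a geometric clue."""
    B, _, D, H, W = rgb.shape
    alpha = 1.0 - torch.exp(-sigma * step)                  # (B, 1, D, H, W)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :, :1]),
                   1.0 - alpha[:, :, :-1]], dim=2), dim=2)  # transmittance
    weights = alpha * trans                                 # per-sample weight
    image = (weights * rgb).sum(dim=2)                      # (B, 3, H, W)
    depths = torch.arange(D, dtype=rgb.dtype, device=rgb.device)
    depth = (weights.squeeze(1) * depths[None, :, None, None]).sum(1, keepdim=True)
    return image, depth  # both are fed to 2D discriminators

# Usage: sample a noise volume, generate, and render.
g = VoxelGenerator()
rgb, sigma = g(torch.randn(1, 32, 16, 32, 32))
image, depth = render_front_to_back(rgb, sigma)
print(image.shape, depth.shape)  # (1, 3, 32, 32) and (1, 1, 32, 32)
```

Note that the depth map falls out of the same compositing weights as the color image, which is consistent with the abstract's claim that discriminating on depth incurs negligible additional computational cost.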

Electronic Supplementary Material

Download File(s)
JCST-2307-13596-Highlights.pdf (270.9 KB)

References

[1]
Chan E R, Monteiro M, Kellnhofer P, Wu J J, Wetzstein G. pi-GAN: Periodic implicit generative adversarial networks for 3D-aware image synthesis. In Proc. the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2021, pp.5795–5805. DOI: 10.1109/CVPR46437.2021.00574.
[2]
Chan E R, Lin C Z, Chan M A, Nagano K, Pan B X, de Mello S, Gallo O, Guibas L, Tremblay J, Khamis S, Karras T, Wetzstein G. Efficient geometry-aware 3D generative adversarial networks. In Proc. the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2022, pp.16102–16112. DOI: 10.1109/CVPR52688.2022.01565.
[3]
Gu J T, Liu L J, Wang P, Theobalt C. StyleNeRF: A style-based 3D-aware generator for high-resolution image synthesis. In Proc. the 10th International Conference on Learning Representations, Apr. 2022.
[4]
Schwarz K, Liao Y Y, Niemeyer M, Geiger A. GRAF: Generative radiance fields for 3D-aware image synthesis. In Proc. the 34th International Conference on Neural Information Processing Systems, Dec. 2020, Article No. 1692, pp.20154–20166. DOI: 10.5555/3495724.3497416.
[5]
Niemeyer M, Geiger A. GIRAFFE: Representing scenes as compositional generative neural feature fields. In Proc. the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2021, pp.11448–11459. DOI: 10.1109/CVPR46437.2021.01129.
[6]
Shaham T R, Dekel T, Michaeli T. SinGAN: Learning a generative model from a single natural image. In Proc. the 2019 IEEE/CVF International Conference on Computer Vision, Oct. 2019, pp.4569–4579. DOI: 10.1109/ICCV.2019.00467.
[7]
Shocher A, Bagon S, Isola P, Irani M. InGAN: Capturing and retargeting the “DNA” of a natural image. In Proc. the 2019 IEEE/CVF International Conference on Computer Vision, Oct. 2019, pp.4491–4500. DOI: 10.1109/ICCV.2019.00459.
[8]
Ding X H, Chen H H, Zhang X Y, Han J G, Ding G G. RepMLPNet: Hierarchical vision MLP with re-parameterized locality. In Proc. the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2022, pp.568–577. DOI: 10.1109/CVPR52688.2022.00066.
[9]
Chen Z Q, Zhang H. Learning implicit fields for generative shape modeling. In Proc. the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2019, pp.5932–5941. DOI: 10.1109/CVPR.2019.00609.
[10]
Park J J, Florence P, Straub J, Newcombe R, Lovegrove S. DeepSDF: Learning continuous signed distance functions for shape representation. In Proc. the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2019, pp.165–174. DOI: 10.1109/CVPR.2019.00025.
[11]
Michalkiewicz M, Pontes J K, Jack D, Baktashmotlagh M, Eriksson A. Implicit surface representations as layers in neural networks. In Proc. the 2019 IEEE/CVF International Conference on Computer Vision, Oct. 2019, pp.4742–4751. DOI: 10.1109/ICCV.2019.00484.
[12]
Takikawa T, Litalien J, Yin K X, Kreis K, Loop C, Nowrouzezahrai D, Jacobson A, McGuire M, Fidler S. Neural geometric level of detail: Real-time rendering with implicit 3D shapes. In Proc. the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2021, pp.11353–11362. DOI: 10.1109/CVPR46437.2021.01120.
[13]
Martel J N P, Lindell D B, Lin C Z, Chan E R, Monteiro M, Wetzstein G. ACORN: Adaptive coordinate networks for neural scene representation. ACM Trans. Graphics, 2021, 40(4): Article No. 58. DOI: 10.1145/3450626.3459785.
[14]
Nguyen-Phuoc T, Li C, Theis L, Richardt C, Yang Y L. HoloGAN: Unsupervised learning of 3D representations from natural images. In Proc. the 2019 IEEE/CVF International Conference on Computer Vision, Oct. 2019, pp.7587–7596. DOI: 10.1109/ICCV.2019.00768.
[15]
Mildenhall B, Srinivasan P P, Tancik M, Barron J T, Ramamoorthi R, Ng R. NeRF: Representing scenes as neural radiance fields for view synthesis. In Proc. the 16th European Conference on Computer Vision, Aug. 2020, pp.405–421. DOI: 10.1007/978-3-030-58452-8_24.
[16]
Wiles O, Gkioxari G, Szeliski R, Johnson J. SynSin: End-to-end view synthesis from a single image. In Proc. the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2020, pp.7465–7475. DOI: 10.1109/CVPR42600.2020.00749.
[17]
Nguyen-Phuoc T, Richardt C, Mai L, Yang Y L, Mitra N. BlockGAN: Learning 3D object-aware scene representations from unlabelled images. In Proc. the 34th Conference on Neural Information Processing Systems, Dec. 2020, pp.6767–6778.
[18]
DeVries T, Bautista M A, Srivastava N, Taylor G W, Susskind J M. Unconstrained scene generation with locally conditioned radiance fields. In Proc. the 2021 IEEE/CVF International Conference on Computer Vision, Oct. 2021, pp.14284–14293. DOI: 10.1109/ICCV48922.2021.01404.
[19]
Wang W Y, Xu Q G, Ceylan D, Mech R, Neumann U. DISN: Deep implicit surface network for high-quality single-view 3D reconstruction. In Proc. the 33rd International Conference on Neural Information Processing Systems, Dec. 2019, Article No. 45. DOI: 10.5555/3454287.3454332.
[20]
Sitzmann V, Thies J, Heide F, Nießner M, Wetzstein G, Zollhöfer M. DeepVoxels: Learning persistent 3D feature embeddings. In Proc. the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2019, pp.2432–2441. DOI: 10.1109/CVPR.2019.00254.
[21]
Thies J, Zollhöfer M, Nießner M. Deferred neural rendering: Image synthesis using neural textures. ACM Trans. Graphics, 2019, 38(4): Article No. 66. DOI: 10.1145/3306346.3323035.
[22]
Liu L J, Gu J T, Lin K Z, Chua T S, Theobalt C. Neural sparse voxel fields. In Proc. the 34th Conference on Neural Information Processing Systems, Dec. 2020, pp.15651–15663.
[23]
Rebain D, Jiang W, Yazdani S, Li K, Yi K M, Tagliasacchi A. DeRF: Decomposed radiance fields. arXiv: 2011.12490, 2020. https://doi.org/10.48550/arXiv.2011.12490, Mar. 2024.
[24]
Zhang K, Riegler G, Snavely N, Koltun V. NeRF++: Analyzing and improving neural radiance fields. arXiv: 2010.07492, 2020. https://doi.org/10.48550/arXiv.2010.07492, Mar. 2024.
[25]
Lindell D B, Martel J N P, Wetzstein G. AutoInt: Automatic integration for fast neural volume rendering. In Proc. the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2021, pp.14551–14560. DOI: 10.1109/CVPR46437.2021.01432.
[26]
Wizadwongsa S, Phongthawee P, Yenphraphai J, Suwajanakorn S. NeX: Real-time view synthesis with neural basis expansion. In Proc. the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2021, pp.8530–8539. DOI: 10.1109/CVPR46437.2021.00843.
[27]
Martin-Brualla R, Radwan N, Sajjadi M S M, Barron J T, Dosovitskiy A, Duckworth D. NeRF in the Wild: Neural radiance fields for unconstrained photo collections. In Proc. the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2021, pp.7206–7215. DOI: 10.1109/CVPR46437.2021.00713.
[28]
Lin C H, Ma W C, Torralba A, Lucey S. BARF: Bundle-adjusting neural radiance fields. arXiv: 2104.06405, 2021. https://doi.org/10.48550/arXiv.2104.06405, Mar. 2024.
[29]
Wang Z R, Wu S Z, Xie W D, Chen M, Prisacariu V A. NeRF--: Neural radiance fields without known camera parameters. arXiv: 2102.07064, 2022. https://doi.org/10.48550/arXiv.2102.07064, Mar. 2024.
[30]
Lombardi S, Simon T, Schwartz G, Zollhoefer M, Sheikh Y, Saragih J. Mixture of volumetric primitives for efficient neural rendering. arXiv: 2103.01954, 2021. https://doi.org/10.48550/arXiv.2103.01954, Mar. 2024.
[31]
Karnewar A, Wang O, Ritschel T, Mitra N J. 3inGAN: Learning a 3D generative model from images of a self-similar scene. In Proc. the 2022 International Conference on 3D Vision, Sept. 2022, pp.342–352. DOI: 10.1109/3DV57658.2022.00046.
[32]
Xu R, Wang X T, Chen K, Zhou B L, Loy C C. Positional encoding as spatial inductive bias in GANs. In Proc. the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2021, pp.13564–13573. DOI: 10.1109/CVPR46437.2021.01336.
[33]
Son M J, Park J J, Guibas L, Wetzstein G. SinGRAF: Learning a 3D generative radiance field for a single scene. In Proc. the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2023, pp.8507–8517. DOI: 10.1109/CVPR52729.2023.00822.
[34]
Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. In Proc. the 27th International Conference on Neural Information Processing Systems, Dec. 2014, pp.2672–2680. DOI: 10.5555/2969033.2969125.
[35]
Karras T, Aila T, Laine S, Lehtinen J. Progressive growing of GANs for improved quality, stability, and variation. arXiv: 1710.10196, 2017. https://doi.org/10.48550/arXiv.1710.10196, Mar. 2024.
[36]
Brock A, Donahue J, Simonyan K. Large scale GAN training for high fidelity natural image synthesis. In Proc. the 7th International Conference on Learning Representations, May 2019.
[37]
Karras T, Laine S, Aila T. A style-based generator architecture for generative adversarial networks. In Proc. the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2019, pp.4396–4405. DOI: 10.1109/CVPR.2019.00453.
[38]
Karras T, Laine S, Aittala M, Hellsten J, Lehtinen J, Aila T. Analyzing and improving the image quality of StyleGAN. In Proc. the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2020, pp.8107–8116. DOI: 10.1109/CVPR42600.2020.00813.
[39]
Karras T, Aittala M, Laine S, Härkönen E, Hellsten J, Lehtinen J, Aila T. Alias-free generative adversarial networks. In Proc. the 35th Conference on Neural Information Processing Systems, Dec. 2021, pp.852–863.
[40]
Dhariwal P, Nichol A. Diffusion models beat GANs on image synthesis. In Proc. the 35th Conference on Neural Information Processing Systems, Dec. 2021, pp.8780–8794.
[41]
He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In Proc. the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2016, pp.770–778. DOI: 10.1109/CVPR.2016.90.
[42]
Heitz E, Vanhoey K, Chambon T, Belcour L. A sliced Wasserstein loss for neural texture synthesis. In Proc. the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2021, pp.9407–9415. DOI: 10.1109/CVPR46437.2021.00929.
[43]
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In Proc. the 3rd International Conference on Learning Representations, May 2015.
[44]
Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville A. Improved training of Wasserstein GANs. In Proc. the 31st International Conference on Neural Information Processing Systems, Dec. 2017, pp.5769–5779. DOI: 10.5555/3295222.3295327.
[45]
Wang P, Liu L J, Liu Y, Theobalt C, Komura T, Wang W P. NeuS: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. In Proc. the 35th Conference on Neural Information Processing Systems, Dec. 2021, pp.27171–27183.
Journal of Computer Science and Technology
Pages 305–319
Cite this article:
Wang Y-J, Chen X-L, Chen B-Q. SinGRAV: Learning a Generative Radiance Volume from a Single Natural Scene. Journal of Computer Science and Technology, 2024, 39(2): 305–319. https://doi.org/10.1007/s11390-023-3596-9


Received: 14 July 2023
Accepted: 12 January 2024
Published: 30 March 2024
© Institute of Computing Technology, Chinese Academy of Sciences 2024