Research Article | Open Access

Learning layout generation for virtual worlds

ARC Lab, Tencent, Shenzhen 518057, China

Abstract

The emergence of the metaverse has led to rapidly increasing demand for generating extensive 3D worlds. We consider that an engaging world is built upon a rational layout of multiple land-use areas (e.g., forest, meadow, and farmland). To this end, we propose a generative model of land-use distribution that learns from geographic data. The model is based on a transformer architecture that generates a 2D map of the land-use layout and can be conditioned on spatial and semantic controls, provided either individually or together. It enables diverse layout generation under user control, as well as layout expansion by extending borders from partial inputs. To produce high-quality and satisfactory layouts, we devise a geometric objective function that supervises the model to perceive layout shapes and regularizes generation with geometric priors. Additionally, we devise a planning objective function that supervises the model to perceive progressive composition demands and suppresses generations that deviate from the controls. To evaluate the spatial distribution of the generated layouts, we train an autoencoder to embed land-use layouts into vectors, enabling comparison between real and generated data using the Wasserstein metric, inspired by the Fréchet inception distance.
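
The evaluation described above, embedding layouts with an autoencoder and comparing real and generated embeddings with a Wasserstein metric in the spirit of the Fréchet inception distance, typically reduces to the standard Fréchet distance between two Gaussians fitted to the embeddings. The snippet below is a minimal sketch of that standard computation, not the paper's implementation; the function name and the assumption that embeddings arrive as row-per-sample NumPy arrays are illustrative.

```python
import numpy as np
from scipy import linalg


def frechet_distance(real_emb: np.ndarray, gen_emb: np.ndarray) -> float:
    """FID-style squared 2-Wasserstein distance between Gaussians fitted to
    two sets of layout embeddings (rows = samples, columns = dimensions)."""
    mu_r, mu_g = real_emb.mean(axis=0), gen_emb.mean(axis=0)
    cov_r = np.cov(real_emb, rowvar=False)
    cov_g = np.cov(gen_emb, rowvar=False)

    diff = mu_r - mu_g
    # Matrix square root of the covariance product; a small imaginary
    # component caused by numerical error is discarded.
    covmean = linalg.sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```

Lower values indicate that the generated layouts occupy a region of the embedding space similar to that of the real geographic data.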

Computational Visual Media
Pages 577-592
Cite this article:
Cheng W, Shan Y. Learning layout generation for virtual worlds. Computational Visual Media, 2024, 10(3): 577-592. https://doi.org/10.1007/s41095-023-0365-1

Received: 25 April 2023
Accepted: 02 July 2023
Published: 02 May 2024
© The Author(s) 2024.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
