Research Article | Open Access

Learning layout generation for virtual worlds

ARC Lab, Tencent, Shenzhen 518057, China

Abstract

The emergence of the metaverse has led to rapidly increasing demand for generating extensive 3D worlds. We consider that an engaging world is built upon a rational layout of multiple land-use areas (e.g., forest, meadow, and farmland). To this end, we propose a generative model of land-use distribution that learns from geographic data. The model is based on a transformer architecture that generates a 2D map of the land-use layout and can be conditioned on spatial and semantic controls, provided either individually or together. It enables diverse layout generation under user control, as well as layout expansion by extending borders from partial inputs. To produce high-quality and satisfactory layouts, we devise a geometric objective function that supervises the model to perceive layout shapes and regularizes generation with geometric priors. Additionally, we devise a planning objective function that supervises the model to perceive progressive composition demands and suppresses generations that deviate from the controls. To evaluate the spatial distribution of the generated layouts, we train an autoencoder to embed land-use layouts into vectors, enabling comparison between real and generated data using the Wasserstein metric, inspired by the Fréchet inception distance.
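
The evaluation described above, embedding layouts with an autoencoder and comparing real and generated embeddings with a Wasserstein metric in the spirit of the Fréchet inception distance, typically reduces to the standard Fréchet distance between two Gaussians fitted to the embeddings. The snippet below is a minimal sketch of that standard computation, not the paper's implementation; the function name and the assumption that embeddings arrive as row-per-sample NumPy arrays are illustrative.

```python
import numpy as np
from scipy import linalg


def frechet_distance(real_emb: np.ndarray, gen_emb: np.ndarray) -> float:
    """FID-style squared 2-Wasserstein distance between Gaussians fitted to
    two sets of layout embeddings (rows = samples, columns = dimensions)."""
    mu_r, mu_g = real_emb.mean(axis=0), gen_emb.mean(axis=0)
    cov_r = np.cov(real_emb, rowvar=False)
    cov_g = np.cov(gen_emb, rowvar=False)

    diff = mu_r - mu_g
    # Matrix square root of the covariance product; a small imaginary
    # component caused by numerical error is discarded.
    covmean = linalg.sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```

Lower values indicate that the generated layouts occupy a region of the embedding space similar to that of the real geographic data.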

Computational Visual Media
Pages 577-592
Cite this article:
Cheng W, Shan Y. Learning layout generation for virtual worlds. Computational Visual Media, 2024, 10(3): 577-592. https://doi.org/10.1007/s41095-023-0365-1

Received: 25 April 2023
Accepted: 02 July 2023
Published: 02 May 2024
© The Author(s) 2024.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
