Research Article | Open Access

Learning to assess visual aesthetics of food images

Youtu Lab, Tencent, Shanghai 200233, China
NLPR, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Kuaishou Technology, Beijing 100085, China
Snap Inc., Santa Monica, 90405, USA
AI Lab, Tencent Inc., Shenzhen 518000, China

Abstract

Distinguishing aesthetically pleasing food photos from less appealing ones is an important visual analysis task for social media and food-related ranking systems. Nevertheless, aesthetic assessment of food images remains a challenging and relatively unexplored task, largely due to the lack of suitable food image datasets and domain knowledge. We therefore present the Gourmet Photography Dataset (GPD), the first large-scale dataset for aesthetic assessment of food photos. It contains 24,000 images with binary aesthetic labels, covering a wide variety of foods and scenes. We also provide a non-stationary regularization method to combat overfitting and improve the generalization ability of tuned models. Quantitative results from extensive experiments, including a generalization test, verify that neural networks trained on the GPD achieve performance comparable to that of human experts on aesthetic assessment. We also report several findings of value to further research and applications in visual aesthetic analysis of food images. To encourage such research, we have made the GPD publicly available at https://github.com/Openning07/GPA.
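The task described in the abstract is binary classification over image-level aesthetic labels. As a minimal illustrative sketch (not the authors' evaluation code; the label convention of 1 = aesthetically pleasing and the toy data are assumptions for illustration), a model's binary predictions on GPD-style labels could be scored with balanced accuracy, which is robust to any imbalance between positive and negative examples:

```python
# Hypothetical sketch: scoring binary aesthetic predictions against
# ground-truth labels, as in the GPD's positive/negative split.
# The label convention (1 = aesthetically pleasing) is an assumption.

def balanced_accuracy(labels, preds):
    """Mean of per-class recalls; unaffected by class imbalance."""
    recalls = []
    for c in set(labels):
        # Indices of ground-truth examples of class c.
        idx = [i for i, y in enumerate(labels) if y == c]
        correct = sum(1 for i in idx if preds[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)

# Toy example: 6 images, 4 labelled pleasing (1), 2 not (0).
labels = [1, 1, 1, 1, 0, 0]
preds = [1, 1, 0, 1, 0, 1]
print(balanced_accuracy(labels, preds))  # 0.625
```

Plain accuracy on the same toy data would be 4/6 ≈ 0.667; the balanced variant penalizes the mistake on the rarer negative class more heavily, which matters when one aesthetic class dominates a crawled dataset.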

Computational Visual Media
Pages 139-152
Cite this article:
Sheng K, Dong W, Huang H, et al. Learning to assess visual aesthetics of food images. Computational Visual Media, 2021, 7(1): 139-152. https://doi.org/10.1007/s41095-020-0193-5


Received: 09 June 2020
Accepted: 25 August 2020
Published: 28 November 2020
© The Author(s) 2020

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
