Research Article | Open Access

What and where: A context-based recommendation system for object insertion

Tsinghua University, Beijing 100084, China.
Beijing National Research Center for Information Science and Technology (BNRist), Tsinghua University, Beijing 100084, China.
Department of Computer Science, Media Technology Research Center, University of Bath, Bath BA2 7AY, UK.

Abstract

We propose a novel problem comprising two tasks: (i) given a scene, recommend objects to insert into it, and (ii) given an object category, retrieve suitable background scenes. In both tasks, a bounding box for the inserted object is predicted, supporting downstream applications such as semi-automated advertising and video composition. The major challenge is that the target object is neither present nor localized in the input; moreover, available datasets only provide scenes containing existing objects. To tackle this problem, we build an unsupervised algorithm based on object-level contexts, which explicitly models the joint probability distribution of object categories and bounding boxes using a Gaussian mixture model. Experiments on our own annotated test set demonstrate that our system outperforms existing baselines on all sub-tasks within a single unified framework. Future extensions and applications are suggested.
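
To make the modeling step concrete, here is a minimal sketch assuming scikit-learn's GaussianMixture and a simplified feature layout (a one-hot category vector concatenated with a normalized bounding box). It illustrates the general technique, not the authors' implementation; the synthetic data, feature encoding, and candidate-box proposals are all stand-ins.

    # Illustrative sketch only: fit a Gaussian mixture model over the joint
    # space of object categories and bounding boxes, then answer both tasks
    # as likelihood queries. Data, encoding, and proposals are assumptions.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    n_categories = 5

    # Stand-in training data: one annotated object per row, encoded as a
    # one-hot category vector plus a normalized box (cx, cy, w, h) in [0, 1].
    categories = rng.integers(0, n_categories, size=1000)
    boxes = rng.random((1000, 4))
    X = np.hstack([np.eye(n_categories)[categories], boxes])

    # Fit the joint density p(category, box).
    gmm = GaussianMixture(n_components=8, covariance_type="full", random_state=0)
    gmm.fit(X)

    # Task (i): for a new scene, rank categories by the log-likelihood of
    # their best-scoring candidate box (random proposals here stand in for
    # a grid of placements derived from the scene).
    candidate_boxes = rng.random((256, 4))
    for c in range(n_categories):
        cat_vec = np.tile(np.eye(n_categories)[c], (len(candidate_boxes), 1))
        scores = gmm.score_samples(np.hstack([cat_vec, candidate_boxes]))
        best = candidate_boxes[np.argmax(scores)]
        print(f"category {c}: best log-likelihood {scores.max():.2f}, box {best.round(2)}")

Both tasks reduce to queries on the same fitted density: fixing the scene and ranking (category, box) pairs yields task (i), while fixing the category and ranking scenes by their best attainable likelihood yields task (ii). The paper additionally conditions on object-level context from the surrounding scene, which this sketch omits.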

Computational Visual Media
Pages 79-93
Cite this article:
Zhang S-H, Zhou Z-P, Liu B, et al. What and where: A context-based recommendation system for object insertion. Computational Visual Media, 2020, 6(1): 79-93. https://doi.org/10.1007/s41095-020-0158-8


Revised: 24 December 2019
Accepted: 29 January 2020
Published: 02 April 2020
© The Author(s) 2020

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
