DeepPrimitive: Image decomposition by layered primitive detection

Jiahui Huang; Jun Gao; Vignesh Ganapathi-Subramanian; Hao Su; Yin Liu; Chengcheng Tang; Leonidas J. Guibas

doi:10.1007/s41095-018-0128-6

Research Article | Open Access

Jiahui Huang¹, Jun Gao², Vignesh Ganapathi-Subramanian³, Hao Su⁴, Yin Liu⁵, Chengcheng Tang³, Leonidas J. Guibas³

1 Tsinghua University, Beijing, 100084, China.

2 Computer Science Department, University of Toronto, Toronto, M5S2E4, Canada.

3 Stanford University, Stanford, 94305, United States.

4 University of California San Diego, La Jolla, 92093, United States.

5 University of Wisconsin-Madison, Madison, 53715, United States.

Abstract

Perceiving the visual world through basic building blocks, such as cubes, spheres, and cones, gives human beings a parsimonious understanding of it. Efforts to find primitive-based geometric interpretations of visual data therefore date back to 1970s studies of visual media. However, because primitive fitting was difficult in the pre-deep-learning era, this line of research faded from the mainstream, and the vision community turned primarily to semantic image understanding. In this paper, we revisit the classical problem of building geometric interpretations of images using supervised deep learning tools. We build a framework that detects primitives in images in a layered manner by modifying the YOLO network; an RNN with a novel loss function then enables this network to predict primitives with a variable number of parameters. We compare our pipeline with traditional methods and other learning-based baselines, demonstrating that our layered detection model achieves higher accuracy and produces better reconstructions.
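The abstract says an RNN with a dedicated loss lets the network output primitives whose parameter count varies. The paper's actual loss is not reproduced here. The sketch below shows one generic way to set up such a target, assuming the RNN emits one parameter value and one "stop" probability per step. It combines a squared error masked to the real parameter steps with a binary cross-entropy on the stop flag. The function names `variable_length_param_loss` and `decode_params`, the stop-flag formulation, and `stop_weight` are illustrative assumptions, not taken from the paper.

```python
import math
from typing import List, Sequence


def variable_length_param_loss(
    pred_params: Sequence[float],
    pred_stop: Sequence[float],
    true_params: Sequence[float],
    stop_weight: float = 1.0,
    eps: float = 1e-7,
) -> float:
    """Masked loss for one primitive with a variable number of parameters.

    pred_params[t]: parameter value emitted by a (hypothetical) RNN at step t.
    pred_stop[t]:   predicted probability that step t is the last parameter.
    true_params:    ground-truth parameters; n = len(true_params) steps count.
    """
    n = len(true_params)
    if n == 0:
        raise ValueError("a primitive needs at least one parameter")
    if len(pred_params) < n or len(pred_stop) < n:
        raise ValueError("the RNN must be unrolled for at least n steps")

    # Regression term over the n real steps only; any extra unrolled steps
    # are masked out instead of being compared against padding values.
    mse = sum((pred_params[t] - true_params[t]) ** 2 for t in range(n)) / n

    # Stop-flag term: binary cross-entropy with target 1 at the last real
    # step (t = n - 1) and target 0 at every earlier step.
    bce = 0.0
    for t in range(n):
        target = 1.0 if t == n - 1 else 0.0
        p = min(max(pred_stop[t], eps), 1.0 - eps)  # clamp to avoid log(0)
        bce -= target * math.log(p) + (1.0 - target) * math.log(1.0 - p)
    bce /= n

    return mse + stop_weight * bce


def decode_params(
    pred_params: Sequence[float],
    pred_stop: Sequence[float],
    threshold: float = 0.5,
) -> List[float]:
    """At inference time, keep emitted values until the stop probability
    reaches the threshold (or the unrolled sequence ends)."""
    out: List[float] = []
    for value, stop in zip(pred_params, pred_stop):
        out.append(value)
        if stop >= threshold:
            break
    return out
```

Under a hypothetical parameterization where one primitive type needs three values and another needs four, the same recurrent head can serve both. Steps beyond a primitive's own parameter count contribute nothing to the loss, and at inference the stop flag decides where the parameter list ends.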

Keywords

deep learning; layered image decomposition; primitive detection; biologically inspired vision

Electronic Supplementary Material

41095_2018_128_MOESM1_ESM.pdf (8.1 MB)

Computational Visual Media, Volume 4, Issue 4, December 2018, pp. 385-397

Cite this article:

Huang J, Gao J, Ganapathi-Subramanian V, et al. DeepPrimitive: Image decomposition by layered primitive detection. Computational Visual Media, 2018, 4(4): 385-397. https://doi.org/10.1007/s41095-018-0128-6

Revised: 30 November 2018

Accepted: 03 December 2018

Published: 23 December 2018

This article is published with open access at Springerlink.com

The articles published in this journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.