Research Article | Open Access

BING: Binarized normed gradients for objectness estimation at 300fps

Ming-Ming Cheng1 (✉), Yun Liu1, Wen-Yan Lin2, Ziming Zhang3, Paul L. Rosin4, Philip H. S. Torr5
1 CCS, Nankai University, Tianjin 300350, China.
2 Institute for Infocomm Research, Singapore 138632.
3 MERL, Cambridge, MA 02139-1955, USA.
4 Cardiff University, Wales, CF24 3AA, UK.
5 University of Oxford, Oxford, OX1 3PJ, UK.

* These authors contributed equally to this work.


Abstract

Training a generic objectness measure to produce object proposals has recently become of significant interest. We observe that generic objects with well-defined closed boundaries can be detected by looking at the norm of gradients, after resizing their corresponding image windows to a small fixed size. Based on this observation, and for computational reasons, we propose resizing each window to 8×8 and using the norm of the gradients as a simple 64D feature to describe it, in order to explicitly train a generic objectness measure. We further show how the binarized version of this feature, namely binarized normed gradients (BING), can be used for efficient objectness estimation that requires only a few atomic operations (e.g., additions and bitwise shifts). To improve the localization quality of the proposals while maintaining efficiency, we propose a novel fast segmentation method and demonstrate that, when used in multi-thresholding straddling expansion (MTSE) post-processing, it improves BING's localization performance. On the challenging PASCAL VOC2007 dataset, using 1000 proposals per image and an intersection-over-union (IoU) threshold of 0.5, our method achieves a 95.6% object detection rate and a 78.6% mean average best overlap (MABO) in less than 0.005 seconds per image.
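The abstract compresses the BING scoring pipeline into two sentences. The following is a minimal Python sketch of that idea, written under stated assumptions rather than taken from the authors' code: the gradient norm is min(|gx| + |gy|, 255) on [-1, 0, 1] differences, the learned 64D weight vector w is approximated greedily by `n_w` basis vectors in {-1, +1}^64, and each 8-bit feature entry keeps only its top `n_g` bits. Under these assumptions the dot product <w, g> reduces to bitwise AND, bit counting, shifts and adds. All function names, the test window, the weights and the defaults n_w = 2 and n_g = 4 are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

Patch = List[List[int]]          # 8x8 grayscale window, values in 0..255
Basis = List[Tuple[float, int]]  # (beta_j, a_j^+ packed into a 64-bit integer)


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer (one POPCNT in C/C++)."""
    return bin(x).count("1")


def ng_feature(patch: Patch) -> List[int]:
    """64D normed-gradient feature of an 8x8 window, one byte per entry.

    Assumption: [-1, 0, 1] differences with clamped borders and the
    norm min(|gx| + |gy|, 255).
    """
    feat = []
    for y in range(8):
        for x in range(8):
            gx = patch[y][min(x + 1, 7)] - patch[y][max(x - 1, 0)]
            gy = patch[min(y + 1, 7)][x] - patch[max(y - 1, 0)][x]
            feat.append(min(abs(gx) + abs(gy), 255))
    return feat


def binarize_weights(w: List[float], n_w: int = 2) -> Basis:
    """Greedily approximate w as sum_j beta_j * a_j with a_j in {-1, +1}^64."""
    residual = list(w)
    basis: Basis = []
    for _ in range(n_w):
        a = [1 if r >= 0 else -1 for r in residual]
        beta = sum(ai * ri for ai, ri in zip(a, residual)) / len(a)  # <a, r> / ||a||^2
        residual = [ri - beta * ai for ri, ai in zip(residual, a)]
        a_plus = sum(1 << i for i, ai in enumerate(a) if ai == 1)    # bits where a_j = +1
        basis.append((beta, a_plus))
    return basis


def binarize_feature(g: List[int], n_g: int = 4) -> List[int]:
    """Bit planes b_1..b_{n_g}: bit i of b_k is bit (8 - k) of g_i (bit 7 = MSB)."""
    return [sum(1 << i for i, gi in enumerate(g) if (gi >> (8 - k)) & 1)
            for k in range(1, n_g + 1)]


def bing_score(basis: Basis, planes: List[int]) -> float:
    """Approximate <w, g> with AND, popcount, shifts and adds only.

    For a_j in {-1, +1}^64: <a_j, b_k> = 2 * popcount(a_j^+ & b_k) - popcount(b_k),
    and g is approximated by sum_k 2^(8-k) * b_k.
    """
    score = 0.0
    for beta, a_plus in basis:
        for k, b_k in enumerate(planes, start=1):
            dot = 2 * popcount(a_plus & b_k) - popcount(b_k)
            score += beta * (dot << (8 - k))  # times 2^(8-k) via a left shift
    return score


if __name__ == "__main__":
    patch = [[(x * 30 + y * 5) % 256 for x in range(8)] for y in range(8)]
    w = [((i * 37) % 17 - 8) / 8.0 for i in range(64)]  # hypothetical learned weights
    g = ng_feature(patch)
    exact = sum(wi * gi for wi, gi in zip(w, g))
    approx = bing_score(binarize_weights(w, n_w=2), binarize_feature(g, n_g=4))
    print(f"exact <w, g> = {exact:.1f}, binarized approximation = {approx:.1f}")
```

In a compiled language each 64D binary vector fits in one 64-bit integer, so `popcount(a_plus & b_k)` becomes one AND plus one POPCNT instruction, which is what makes the binarized score cheap. Setting `n_g` to 8 makes the feature side exact, and each extra basis vector in `binarize_weights` cannot increase the residual of w, at the cost of more operations per window.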
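The two figures at the end of the abstract are computed per ground-truth object. A short sketch of the usual definitions follows, stated as assumptions rather than as the authors' evaluation code: an object counts as detected when at least one of the top 1000 proposals has IoU ≥ 0.5 with it; its best overlap is the highest IoU over those proposals; MABO averages the best overlaps within each class and then across classes. Boxes are taken as (x1, y1, x2, y2) in continuous coordinates, which differs slightly from the PASCAL VOC toolkit's pixel-inclusive convention, and VOC's "difficult" flag is ignored. The `evaluate` helper and its input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]          # (x1, y1, x2, y2), continuous coordinates
Image = Tuple[List[Box], List[Tuple[str, Box]]]  # (ranked proposals, [(class, gt box)])


def area(r: Box) -> float:
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def evaluate(images: Sequence[Image], thr: float = 0.5, top_n: int = 1000) -> Tuple[float, float]:
    """Return (detection rate at IoU >= thr, MABO) using the top_n proposals per image."""
    hits = total = 0
    best_by_class: Dict[str, List[float]] = defaultdict(list)
    for proposals, gts in images:
        props = proposals[:top_n]
        for cls, gt in gts:
            best = max((iou(p, gt) for p in props), default=0.0)  # best overlap for this object
            total += 1
            hits += int(best >= thr)
            best_by_class[cls].append(best)
    dr = hits / total
    # Average best overlap (ABO) per class, then the mean over classes
    mabo = sum(sum(v) / len(v) for v in best_by_class.values()) / len(best_by_class)
    return dr, mabo


if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    images = [
        ([(0.0, 0.0, 10.0, 10.0)], [("cat", gt)]),    # IoU 1.0
        ([(0.0, 0.0, 10.0, 5.0)], [("dog", gt)]),     # IoU 0.5, counts as detected
        ([(20.0, 20.0, 30.0, 30.0)], [("dog", gt)]),  # IoU 0.0, missed
    ]
    print(evaluate(images))  # -> (0.6666666666666666, 0.625)
```

Because MABO averages continuous overlaps instead of counting hits, it responds to tighter boxes even when the detection rate at 0.5 does not change, which makes it the more sensitive measure of the localization gains that post-processing such as MTSE targets.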

Computational Visual Media
Pages 3-20
Cite this article:
Cheng M-M, Liu Y, Lin W-Y, et al. BING: Binarized normed gradients for objectness estimation at 300fps. Computational Visual Media, 2019, 5(1): 3-20. https://doi.org/10.1007/s41095-018-0120-1