TKLNDST, College of Computer Science, Nankai University, Tianjin, China.
University of Massachusetts Amherst, Amherst, MA, USA.
State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing, China.
Abstract
Detecting and segmenting salient objects from natural scenes, often referred to as salient object detection, has attracted great interest in computer vision. While many models have been proposed and several applications have emerged, a deep understanding of achievements and issues remains lacking. We aim to provide a comprehensive review of recent progress in salient object detection and situate this field among other closely related areas such as generic scene segmentation, object proposal generation, and saliency for fixation prediction. Covering 228 publications, we survey i) roots, key concepts, and tasks, ii) core techniques and main modeling trends, and iii) datasets and evaluation metrics for salient object detection. We also discuss open problems such as evaluation metrics and dataset bias in model performance, and suggest future research directions.
Berg, A. C.; Berg, T. L.; Daume, H.; Dodge, J.; Goyal, A.; Han, X.; Mensch, A.; Mitchell, M.; Sood, A.; Stratos, K. et al. Understanding and predicting importance in images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3562-3569, 2012.
M’t Hart, B. M.; Schmidt, H. C. E. F.; Roth, C.; Einhäuser, W. Fixations on objects in natural scenes: Dissociating importance from salience. Frontiers in Psychology Vol. 4, 455, 2013.
Isola, P.; Xiao, J.; Torralba, A.; Oliva, A. What makes an image memorable? In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 145-152, 2011.
Katti, H.; Bin, K. Y.; Chua, T. S.; Kankanhalli, M. Preattentive discrimination of interestingness in images. In: Proceedings of the IEEE International Conference on Multimedia and Expo, 1433-1436, 2008.
Gygli, M.; Grabner, H.; Riemenschneider, H.; Nater, F.; Van Gool, L. The interestingness of images. In: Proceedings of the IEEE International Conference on Computer Vision, 1633-1640, 2013.
Dhar, S.; Ordonez, V.; Berg, T. L. High level describable attributes for predicting aesthetics and interestingness. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1657-1664, 2011.
Jiang, Y.-G.; Wang, Y.; Feng, R.; Xue, X.; Zheng, Y.; Yang, H. Understanding and predicting interestingness of videos. In: Proceedings of the 27th AAAI Conference on Artificial Intelligence, 2013.
[13] Itti, L.; Baldi, P. Bayesian surprise attracts human attention. In: Proceedings of the 18th International Conference on Neural Information Processing Systems, 547-554, 2005.
[14] Wang, Z.; Bovik, A. C.; Sheikh, H. R.; Simoncelli, E. P. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing Vol. 13, No. 4, 600-612, 2004.
Wang, Z.; Bovik, A. C.; Lu, L. Why is image quality assessment so difficult? In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, IV-3313–IV-3316, 2002.
Zhang, W.; Borji, A.; Wang, Z.; Le Callet, P.; Liu, H. T. The application of visual saliency models in objective image quality assessment: A statistical evaluation. IEEE Transactions on Neural Networks and Learning Systems Vol. 27, No. 6, 1266-1278, 2016.
Ehinger, K. A.; Xiao, J.; Torralba, A.; Oliva, A. Estimating scene typicality from human ratings and image features. In: Proceedings of the Annual Cognitive Science Conference, 2011.
[19] Farhadi, A.; Endres, I.; Hoiem, D.; Forsyth, D. Describing objects by their attributes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1778-1785, 2009.
Liu, H. Y.; Jiang, S. Q.; Huang, Q. M.; Xu, C. S.; Gao, W. Region-based visual attention analysis with its application in image browsing on small displays. In: Proceedings of the 15th ACM International Conference on Multimedia, 305-308, 2007.
Li, Y.; Hou, X.; Koch, C.; Rehg, J. M.; Yuille, A. L. The secrets of salient object segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 280-287, 2014.
Borji, A. What is a salient object? A dataset and a baseline model for salient object detection. IEEE Transactions on Image Processing Vol. 24, No. 2, 742-756, 2015.
Itti, L.; Koch, C.; Niebur, E. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 20, No. 11, 1254-1259, 1998.
Liu, T.; Sun, J.; Zheng, N.; Tang, X.; Shum, H.-Y. Learning to detect a salient object. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1-8, 2007.
Perazzi, F.; Krahenbuhl, P.; Pritch, Y.; Hornung, A. Saliency filters: Contrast based filtering for salient region detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 733-740, 2012.
Cheng, M.-M.; Zhang, Z.; Lin, W.-Y.; Torr, P. H. S. BING: Binarized normed gradients for objectness estimation at 300fps. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3286-3293, 2014.
Borji, A.; Tavakoli, H. R.; Sihite, D. N.; Itti, L. Analysis of scores, datasets, and models in visual saliency prediction. In: Proceedings of the IEEE International Conference on Computer Vision, 921-928, 2013.
Alexe, B.; Deselaers, T.; Ferrari, V. What is an object? In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 73-80, 2010.
Siva, P.; Russell, C.; Xiang, T.; Agapito, L. Looking beyond the image: Unsupervised learning for object saliency and detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3238-3245, 2013.
Achanta, R.; Hemami, S.; Estrada, F.; Süsstrunk, S. Frequency-tuned salient region detection. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 1597-1604, 2009.
Cheng, M.-M.; Zhang, G.-X.; Mitra, N. J.; Huang, X.; Hu, S.-M. Global contrast based salient region detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 409-416, 2011.
Yan, Q.; Xu, L.; Shi, J.; Jia, J. Hierarchical saliency detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1155-1162, 2013.
Wang, L.; Lu, H.; Ruan, X.; Yang, M.-H. Deep networks for saliency detection via local estimation and global search. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3183-3192, 2015.
Zhao, R.; Ouyang, W.; Li, H.; Wang, X. Saliency detection by multi-context deep learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1265-1274, 2015.
Li, G.; Yu, Y. Visual saliency based on multiscale deep features. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5455-5463, 2015.
[48] Zou, W.; Komodakis, N. HARF: Hierarchy-associated rich features for salient object detection. In: Proceedings of the IEEE International Conference on Computer Vision, 406-414, 2015.
Hou, Q.; Cheng, M.-M.; Hu, X.; Borji, A.; Tu, Z.; Torr, P. H. S. Deeply supervised salient object detection with short connections. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3203-3212, 2017.
Wolfe, J. M.; Cave, K. R.; Franzel, S. L. Guided search: An alternative to the feature integration model for visual search. Journal of Experimental Psychology: Human Perception and Performance Vol. 15, No. 3, 419-433, 1989.
Koch, C.; Ullman, S. Shifts in selective visual attention: Towards the underlying neural circuitry. In: Matters of Intelligence. Synthese Library (Studies in Epistemology, Logic, Methodology, and Philosophy of Science), Vol. 188. Vaina, L. M. Ed. Springer Dordrecht, 115-141, 1987.
[53] Parkhurst, D.; Law, K.; Niebur, E. Modeling the role of salience in the allocation of overt visual attention. Vision Research Vol. 42, No. 1, 107-123, 2002.
Bruce, N. D. B.; Tsotsos, J. K. Saliency based on information maximization. In: Proceedings of the 18th International Conference on Neural Information Processing Systems, 155-162, 2005.
[55] Liu, T.; Yuan, Z. J.; Sun, J.; Wang, J. D.; Zheng, N. N.; Tang, X. O.; Shum, H.-Y. Learning to detect a salient object. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 33, No. 2, 353-367, 2011.
Achanta, R.; Estrada, F.; Wils, P.; Süsstrunk, S. Salient region detection and segmentation. In: Computer Vision Systems. Lecture Notes in Computer Science, Vol. 5008. Gasteratos, A.; Vincze, M.; Tsotsos, J. K. Eds. Springer Berlin Heidelberg, 66-75, 2008.
[57] Ma, Y.-F.; Zhang, H.-J. Contrast-based image attention analysis by using fuzzy growing. In: Proceedings of the 11th ACM International Conference on Multimedia, 374-381, 2003.
[58] Liu, F.; Gleicher, M. Region enhanced scale-invariant saliency detection. In: Proceedings of the IEEE International Conference on Multimedia and Expo, 1477-1480, 2006.
Judd, T.; Ehinger, K.; Durand, F.; Torralba, A. Learning to predict where humans look. In: Proceedings of the IEEE 12th International Conference on Computer Vision, 2106-2113, 2009.
Hou, X.; Zhang, L. Saliency detection: A spectral residual approach. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1-8, 2007.
Borji, A.; Itti, L. Exploiting local and global patch rarities for saliency detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 478-485, 2012.
Borji, A. Boosting bottom–up and top–down visual features for saliency estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 438-445, 2012.
Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2001.
[68] Felzenszwalb, P. F.; Girshick, R. B.; McAllester, D.; Ramanan, D. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 32, No. 9, 1627-1645, 2010.
Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3431-3440, 2015.
Hua, G.; Liu, Z. C.; Zhang, Z. Y.; Wu, Y. Iterative local-global energy minimization for automatic extraction of objects of interest. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 28, No. 10, 1701-1706, 2006.
Ko, B. C.; Nam, J.-Y. Automatic object-of-interest segmentation from natural images. In: Proceedings of the 18th International Conference on Pattern Recognition, 45-48, 2006.
[73] Allili, M. S.; Ziou, D. Object of interest segmentation and tracking by using feature selection and active contours. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1-8, 2007.
Hu, Y.; Rajan, D.; Chia, L.-T. Robust subspace analysis for detecting visual attention regions in images. In: Proceedings of the 13th Annual ACM International Conference on Multimedia, 716-724, 2005.
Valenti, R.; Sebe, N.; Gevers, T. Image saliency by isocentric curvedness and color. In: Proceedings of the IEEE 12th International Conference on Computer Vision, 2185-2192, 2009.
Klein, D. A.; Frintrop, S. Center-surround divergence of feature statistics for salient object detection. In: Proceedings of the International Conference on Computer Vision, 2214-2219, 2011.
Li, X.; Li, Y.; Shen, C.; Dick, A. R.; van den Hengel, A. Contextual hypergraph modeling for salient object detection. In: Proceedings of the IEEE International Conference on Computer Vision, 3328-3335, 2013.
Margolin, R.; Tal, A.; Zelnik-Manor, L. What makes a patch distinct? In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1139-1146, 2013.
Yu, Z. W.; Wong, H. S. A rule based technique for extraction of visual attention regions based on real-time clustering. IEEE Transactions on Multimedia Vol. 9, No. 4, 766-784, 2007.
Cheng, M. M.; Mitra, N. J.; Huang, X. L.; Torr, P. H. S.; Hu, S. M. Global contrast based salient region detection. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 37, No. 3, 569-582, 2015.
Scharfenberger, C.; Wong, A.; Fergani, K.; Zelek, J. S.; Clausi, D. A. Statistical textural distinctiveness for salient region detection in natural images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 979-986, 2013.
Cheng, M.-M.; Warrell, J.; Lin, W.-Y.; Zheng, S.; Vineet, V.; Crook, N. Efficient salient region detection with soft image abstraction. In: Proceedings of the IEEE International Conference on Computer Vision, 1529-1536, 2013.
Jiang, Z.; Davis, L. S. Submodular salient region detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2043-2050, 2013.
Yu, H. N.; Li, J.; Tian, Y. H.; Huang, T. J. Automatic interesting object extraction from images using complementary saliency maps. In: Proceedings of the International Conference on Multimedia, 891-894, 2010.
[91] Lu, Y.; Zhang, W.; Lu, H.; Xue, X. Salient object detection using concavity context. In: Proceedings of the International Conference on Computer Vision, 233-240, 2011.
[92] Chang, K.-Y.; Liu, T.-L.; Chen, H.-T.; Lai, S.-H. Fusing generic objectness and visual saliency for salient object detection. In: Proceedings of the International Conference on Computer Vision, 914-921, 2011.
[93] Jiang, H.; Wang, J.; Yuan, Z.; Liu, T.; Zheng, N. Automatic salient object segmentation based on context and shape prior. In: Proceedings of the British Machine Vision Conference, 2011.
Shen, X.; Wu, Y. A unified approach to salient object detection via low rank matrix recovery. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 853-860, 2012.
Yang, C.; Zhang, L.; Lu, H.; Ruan, X.; Yang, M.-H. Saliency detection via graph-based manifold ranking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3166-3173, 2013.
Li, X.; Lu, H.; Zhang, L.; Ruan, X.; Yang, M.-H. Saliency detection via dense and sparse reconstruction. In: Proceedings of the IEEE International Conference on Computer Vision, 2976-2983, 2013.
Jiang, B.; Zhang, L.; Lu, H.; Yang, C.; Yang, M.-H. Saliency detection via absorbing Markov chain. In: Proceedings of the IEEE International Conference on Computer Vision, 1665-1672, 2013.
Jiang, P.; Ling, H.; Yu, J.; Peng, J. Salient region detection by UFO: Uniqueness, focusness and objectness. In: Proceedings of the IEEE International Conference on Computer Vision, 1976-1983, 2013.
Jia, Y.; Han, M. Category-independent object-level saliency detection. In: Proceedings of the IEEE International Conference on Computer Vision, 1761-1768, 2013.
Zhu, W.; Liang, S.; Wei, Y.; Sun, J. Saliency optimization from robust background detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2814-2821, 2014.
Zhang, J.; Sclaroff, S. Saliency detection: A Boolean map approach. In: Proceedings of the IEEE International Conference on Computer Vision, 153-160, 2013.
Li, N.; Ye, J.; Ji, Y.; Ling, H.; Yu, J. Saliency detection on light field. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2806-2813, 2014.
Mehrani, P.; Veksler, O. Saliency segmentation based on learning and graph cut refinement. In: Proceedings of the British Machine Vision Conference, 110.1-110.12, 2010.
Lu, S.; Mahadevan, V.; Vasconcelos, N. Learning optimal seeds for diffusion-based salient object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2790-2797, 2014.
Kim, J.; Han, D.; Tai, Y.-W.; Kim, J. Salient region detection via high-dimensional color transform. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 883-890, 2014.
Marchesotti, L.; Cifarelli, C.; Csurka, G. A framework for visual saliency detection with applications to image thumbnailing. In: Proceedings of the IEEE 12th International Conference on Computer Vision, 2232-2239, 2009.
Wang, M.; Konrad, J.; Ishwar, P.; Jing, K.; Rowley, H. Image saliency: From intrinsic to extrinsic context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 417-424, 2011.
Mai, L.; Niu, Y.; Liu, F. Saliency aggregation: A data-driven approach. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1131-1138, 2013.
Zhai, Y.; Shah, M. Visual attention detection in video sequences using spatiotemporal cues. In: Proceedings of the 14th ACM International Conference on Multimedia, 815-824, 2006.
Liu, T.; Zheng, N.; Ding, W.; Yuan, Z. Video attention: Learning to detect a salient object sequence. In: Proceedings of the 19th International Conference on Pattern Recognition, 1-4, 2008.
Li, Y.; Sheng, B.; Ma, L. Z.; Wu, W.; Xie, Z. F. Temporally coherent video saliency using regional dynamic contrast. IEEE Transactions on Circuits and Systems for Video Technology Vol. 23, No. 12, 2067-2076, 2013.
Chang, K.; Liu, T.; Lai, S. From co-saliency to co-segmentation: An efficient and fully unsupervised energy minimization model. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2129-2136, 2011.
Niu, Y.; Geng, Y.; Li, X.; Liu, F. Leveraging stereopsis for saliency analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 454-461, 2012.
[123] Desingh, K.; Krishna, K. M.; Rajan, D.; Jawahar, C. V. Depth really matters: Improving visual salient region detection with depth. In: Proceedings of the British Machine Vision Conference, 2013.
Rother, C.; Minka, T. P.; Blake, A.; Kolmogorov, V. Cosegmentation of image pairs by histogram matching-incorporating a global constraint into MRFs. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 993-1000, 2006.
[125] Batra, D.; Kowdle, A.; Parikh, D.; Luo, J.; Chen, T. iCoseg: Interactive co-segmentation with intelligent scribble guidance. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 3169-3176, 2010.
Mukherjee, L.; Singh, V.; Peng, J. Scale invariant cosegmentation for image groups. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1881-1888, 2011.
Kim, G.; Xing, E. P.; Li, F. F.; Kanade, T. Distributed cosegmentation via submodular optimization on anisotropic diffusion. In: Proceedings of the International Conference on Computer Vision Barcelona, 169-176, 2011.
[128] Feng, J.; Wei, Y.; Tao, L.; Zhang, C.; Sun, J. Salient object detection by composition. In: Proceedings of the International Conference on Computer Vision, 1028-1035, 2011.
Wang, P.; Wang, J.; Zeng, G.; Feng, J.; Zha, H.; Li, S. Salient object detection for searched web images via global saliency. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, 3194-3201, 2012.
[130] Wang, L.; Xue, J.; Zheng, N.; Hua, G. Automatic salient object extraction with contextual cue. In: Proceedings of the International Conference on Computer Vision, 105-112, 2011.
Tu, Z. W.; Bai, X. Auto-context and its application to high-level vision tasks and 3D brain image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 32, No. 10, 1744-1757, 2010.
Qin, Y.; Lu, H. C.; Xu, Y. Q.; Wang, H. Saliency detection via cellular automata. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 110-119, 2015.
Krizhevsky, A.; Sutskever, I.; Hinton, G. E. ImageNet classification with deep convolutional neural networks. Communications of the ACM Vol. 60, No. 6, 84-90, 2017.
Lee, G.; Tai, Y.-W.; Kim, J. Deep saliency with encoded low level distance map and high level features. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 660-668, 2016.
Kim, J.; Pavlovic, V. A shape preserving approach for salient object detection using convolutional neural networks. In: Proceedings of the 23rd International Conference on Pattern Recognition, 609-614, 2016.
Wang, X.; Ma, H. M.; Chen, X. Z. Salient object detection via fast R-CNN and low-level cues. In: Proceedings of the IEEE International Conference on Image Processing, 1042-1046, 2016.
Kim, J.; Pavlovic, V. A shape preserving approach for salient object detection using convolutional neural networks. In: Proceedings of the 23rd International Conference on Pattern Recognition, 609-614, 2016.
Liu, N.; Han, J. DHSNet: Deep hierarchical saliency network for salient object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 678-686, 2016.
Li, G. B.; Yu, Y. Z. Deep contrast learning for salient object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 478-487, 2016.
[152] Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
Kuen, J.; Wang, Z. H.; Wang, G. Recurrent attentional networks for saliency detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3668-3677, 2016.
Kruthiventi, S. S. S.; Gudisa, V.; Dholakiya, J. H.; Babu, R. V. Saliency unified: A deep architecture for simultaneous eye fixation prediction and salient object segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5781-5790, 2016.
Zhang, J.; Dai, Y. C.; Porikli, F. Deep salient object detection by integrating multi-level cues. In: Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 1-10, 2017.
Wang, L. J.; Lu, H. C.; Ruan, X.; Yang, M. H. Deep networks for saliency detection via local estimation and global search. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3183-3192, 2015.
Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 580-587, 2014.
Uijlings, J. R. R.; van de Sande, K. E. A.; Gevers, T.; Smeulders, A. W. M. Selective search for object recognition. International Journal of Computer Vision Vol. 104, No. 2, 154-171, 2013.
Lee, C.-Y.; Xie, S.; Gallagher, P.; Zhang, Z.; Tu, Z. Deeply-supervised nets. In: Proceedings of the 18th International Conference on Artificial Intelligence and Statistics, 562-570, 2015.
[165] Jaderberg, M.; Simonyan, K.; Zisserman, A.; Kavukcuoglu, K. Spatial transformer networks. In: Proceedings of the 28th International Conference on Neural Information Processing Systems, Vol. 2, 2017-2025, 2015.
[166] Achanta, R.; Shaji, A.; Smith, K.; Lucchi, A.; Fua, P.; Süsstrunk, S. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 34, No. 11, 2274-2282, 2012.
Krähenbühl, P.; Koltun, V. Efficient inference in fully connected CRFs with Gaussian edge potentials. In: Proceedings of the 24th International Conference on Neural Information Processing Systems, 109-117, 2011.
[170] Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S. A.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M. et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision Vol. 115, No. 3, 211-252, 2015.
Oquab, M.; Bottou, L.; Laptev, I.; Sivic, J. Learning and transferring mid-level image representations using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1717-1724, 2014.
Zhu, J.; Wu, J.; Wei, Y.; Chang, E.; Tu, Z. Unsupervised object class discovery via saliency-guided multiple class learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3218-3225, 2012.
[177] Chen, T.; Cheng, M.-M.; Tan, P.; Shamir, A.; Hu, S.-M. Sketch2Photo: Internet image montage. ACM Transactions on Graphics Vol. 28, No. 5, Article No. 124, 2009.
Rutishauser, U.; Walther, D.; Koch, C.; Perona, P. Is bottom–up attention useful for object recognition? In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, II, 2004.
[181] Kanan, C.; Cottrell, G. Robust classification of objects, faces, and flowers using natural image statistics. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2472-2479, 2010.
Moosmann, F.; Larlus, D.; Jurie, F. Learning saliency maps for object categorization. In: Proceedings of the International Workshop on the Representation and Use of Prior Knowledge in Vision, 2006.
[183] Borji, A.; Ahmadabadi, M. N.; Araabi, B. N. Cost-sensitive learning of top-down modulation for attentional control. Machine Vision and Applications Vol. 22, No. 1, 61-76, 2011.
Borji, A.; Itti, L. Scene classification with a sparse set of salient regions. In: Proceedings of the IEEE International Conference on Robotics and Automation, 1902-1908, 2011.
Shen, H.; Li, S. X.; Zhu, C. F.; Chang, H. X.; Zhang, J. L. Moving object detection in aerial video based on spatiotemporal saliency. Chinese Journal of Aeronautics Vol. 26, No. 5, 1211-1217, 2013.
Ren, Z. X.; Gao, S. H.; Chia, L. T.; Tsang, I. W. H. Region-based saliency detection and its application in object recognition. IEEE Transactions on Circuits and Systems for Video Technology Vol. 24, No. 5, 769-779, 2014.
Guo, C. L.; Zhang, L. M. A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression. IEEE Transactions on Image Processing Vol. 19, No. 1, 185-198, 2010.
Itti, L. Automatic foveation for video compression using a neurobiological model of visual attention. IEEE Transactions on Image Processing Vol. 13, No. 10, 1304-1318, 2004.
Ma, Y. F.; Hua, X. S.; Lu, L.; Zhang, H. J. A generic framework of user attention model and its application in video summarization. IEEE Transactions on Multimedia Vol. 7, No. 5, 907-919, 2005.
Lee, Y. J.; Ghosh, J.; Grauman, K. Discovering important people and objects for egocentric video summarization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1346-1353, 2012.
[191] Ji, Q. G.; Fang, Z. D.; Xie, Z. H.; Lu, Z. M. Video abstraction based on the visual attention model and online clustering. Signal Processing: Image Communication Vol. 28, No. 3, 241-253, 2013.
Wang, J.; Quan, L.; Sun, J.; Tang, X.; Shum, H.-Y. Picture collage. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 347-354, 2006.
[194] Ninassi, A.; Le Meur, O.; Le Callet, P.; Barba, D. Does where you gaze on an image affect your perception of quality? Applying visual attention to image quality metric. In: Proceedings of the IEEE International Conference on Image Processing, II169-II172, 2007.
Liu, H. T.; Heynderickx, I. Studying the added value of visual attention in objective image quality metrics based on eye movement data. In: Proceedings of the 16th IEEE International Conference on Image Processing, 3097-3100, 2009.
Li, A.; She, X.; Sun, Q. Color image quality assessment combining saliency and FSIM. In: Proceedings of the SPIE 8878, 5th International Conference on Digital Image Processing, 88780I, 2013.
Donoser, M.; Urschler, M.; Hirzer, M.; Bischof, H. Saliency driven total variation segmentation. In: Proceedings of the IEEE 12th International Conference on Computer Vision, 817-824, 2009.
Johnson-Roberson, M.; Bohg, J.; Björkman, M.; Kragic, D. Attention-based active 3D point cloud segmentation. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 1165-1170, 2010.
Feng, S. H.; Xu, D.; Yang, X. Attention-driven salient edge(s) and region(s) extraction with application to CBIR. Signal Processing Vol. 90, No. 1, 1-15, 2010.
Li, J.; Levine, M. D.; An, X. J.; Xu, X.; He, H. G. Visual saliency based on scale-space analysis in the frequency domain. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 35, No. 4, 996-1010, 2013.
García, G. M.; Klein, D. A.; Stückler, J.; Frintrop, S.; Cremers, A. B. Adaptive multi-cue 3D tracking of arbitrary objects. In: Pattern Recognition. Lecture Notes in Computer Science, Vol. 7476. Pinz, A.; Pock, T.; Bischof, H.; Leberl, F. Eds. Springer Berlin Heidelberg, 357-366, 2012.
[207] Borji, A.; Frintrop, S.; Sihite, D. N.; Itti, L. Adaptive object tracking by learning background context. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 23-30, 2012.
Klein, D. A.; Schulz, D.; Frintrop, S.; Cremers, A. B. Adaptive real-time video-tracking for arbitrary objects. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 772-777, 2010.
Karpathy, A.; Miller, S.; Li, F-F. Object discovery in 3D scenes via shape analysis. In: Proceedings of the IEEE International Conference on Robotics and Automation, 2088-2095, 2013.
Frintrop, S.; García, G. M.; Cremers, A. B. A cognitive approach for object discovery. In: Proceedings of the 22nd International Conference on Pattern Recognition, Stockholm, 2329-2334, 2014.
Sugano, Y.; Matsushita, Y.; Sato, Y. Calibration-free gaze sensing using saliency maps. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2667-2674, 2010.
Alpert, S.; Galun, M.; Basri, R.; Brandt, A. Image segmentation by probabilistic bottom–up aggregation and cue integration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1-8, 2007.
Movahedi, V.; Elder, J. H. Design and perceptual validation of performance measures for salient object segmentation. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops, 49-56, 2010.
Brown, M.; Süsstrunk, S. Multi-spectral SIFT for scene category recognition. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 177-184, 2011.
Li, J.; Tian, Y.; Huang, T.; Gao, W. A dataset and evaluation methodology for visual saliency in video. In: Proceedings of the IEEE International Conference on Multimedia and Expo, 442-445, 2009.
[226] Wu, Y.; Zheng, N. N.; Yuan, Z. J.; Jiang, H. Z.; Liu, T. Detection of salient objects with focused attention based on spatial and temporal coherence. Chinese Science Bulletin Vol. 56, No. 10, 1055-1062, 2011.
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770-778, 2016.
Torralba, A.; Efros, A. A. Unbiased look at dataset bias. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1521-1528, 2011.
Tatler, B. W. The central fixation bias in scene viewing: Selecting an optimal viewing position independently of motor biases and image feature distributions. Journal of Vision Vol. 7, No. 14, 4, 2007.
Borji A, Cheng M-M, Hou Q, et al. Salient object detection: A survey. Computational Visual Media, 2019, 5(2): 117-150. https://doi.org/10.1007/s41095-019-0149-9
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
Fig. 1 An example image in Borji et al.'s experiment [26] along with annotated salient objects. Dots represent 3-second free-viewing fixations.
Fig. 2 Sample results produced by different models. Left to right: input image, salient object detection [27], fixation prediction [24], image segmentation (regions with various sizes) [28], image segmentation (superpixels with comparable sizes) [29], and object proposals (true positives) [30].
Fig. 3 A simplified chronicle of salient object detection modeling. The first wave started with the Itti et al. model [24], followed by the second wave with the introduction of the approach of Liu et al. [25] who were the first to define saliency as a binary segmentation problem. The third wave started with the surge of deep learning models and the model of Li and Yu [47].
Fig. 4 Popular FCN-based architectures. Apart from the classical architecture (a), more and more advanced architectures have been developed recently. Some of them (b–e) exploit skip layers from different scales so as to learn multi-scale and multi-level features. Some (e, g–i) adopt an encoder–decoder structure to better fuse high-level features with low-level ones. Others (f, g, i) introduce side supervision as in Ref. [142] in order to capture more detailed multi-level information. See Table 5 for details of these architectures.
Fig. 5 Visual comparisons of two best classic methods (DRFI and DSR), according to Ref. [132], and two leading CNN-based methods (MDF and DSS).
Fig. 6 Sample applications of salient object detection.
Table 1 Salient object detection models with intrinsic cues (sorted by year). Elements: {PI = pixel, PA = patch, RE = region}, where prefixes m and h indicate multi-scale and hierarchical versions, respectively. Hypothesis: {CP = center prior, G = global contrast, L = local contrast, D = edge density, B = background prior, F = focus prior, O = objectness prior, CV = convexity prior, CS = center-surround contrast, CLP = color prior, SD = spatial distribution, BC = boundary connectivity prior, SPS = sparse noise}. Aggregation/optimization: {LN = linear, NL = non-linear, AD = adaptive, HI = hierarchical, BA = Bayesian, GMRF = Gaussian MRF, EM = energy minimization, and LS = least-square solver}. Code: {M = Matlab, C = C/C++, NA = not available, EXE = executable}

| # | Model | Pub | Year | Elements | Hypothesis: uniqueness | Hypothesis: prior | Aggregation (optimization) | Code |
|---|---|---|---|---|---|---|---|---|
| 1 | FG [57] | MM | 2003 | PI | L | — | — | NA |
| 2 | RSA [74] | MM | 2005 | PA | G | — | — | NA |
| 3 | RE [58] | ICME | 2006 | mPI+RE | L | — | LN | NA |
| 4 | RU [83] | TMM | 2007 | RE | — | P | LN | NA |
| 5 | AC [56] | ICVS | 2008 | mPA | L | — | LN | NA |
| 6 | FT [37] | CVPR | 2009 | PI | CS | — | — | C |
| 7 | ICC [77] | ICCV | 2009 | PI | L | — | LN | NA |
| 8 | EDS [76] | PR | 2009 | PI | — | ED | — | NA |
| 9 | CSM [90] | MM | 2010 | PI+PA | L | SD | — | NA |
| 10 | RC [84] | CVPR | 2011 | RE | G | — | — | C |
| 11 | HC [84] | CVPR | 2011 | RE | G | — | — | C |
| 12 | CC [91] | ICCV | 2011 | mRE | — | CV | — | NA |
| 13 | CSD [78] | ICCV | 2011 | mPA | CS | — | LN | NA |
| 14 | SVO [92] | ICCV | 2011 | PA+RE | CS | O | EM | M+C |
| 15 | CB [93] | BMVC | 2011 | mRE | L | CP | LN | M+C |
| 16 | SF [27] | CVPR | 2012 | RE | G | SD | NL | C |
| 17 | ULR [94] | CVPR | 2012 | RE | SPS | CP+CLP | — | M+C |
| 18 | GS [95] | ECCV | 2012 | PA/RE | — | B | — | NA |
| 19 | LMLC [96] | TIP | 2013 | RE | CS | — | BA | M+C |
| 20 | HS [42] | CVPR | 2013 | hRE | G | — | HI | EXE |
| 21 | GMR [97] | CVPR | 2013 | RE | — | B | — | M |
| 22 | PISA [89] | CVPR | 2013 | RE | G | SD+CP | NL | NA |
| 23 | STD [85] | CVPR | 2013 | RE | G | — | — | NA |
| 24 | PCA [80] | CVPR | 2013 | PA+PE | G | — | NL | M+C |
| 25 | GU [86] | ICCV | 2013 | RE | G | — | — | C |
| 26 | GC [86] | ICCV | 2013 | RE | G | SD | AD | C |
| 27 | CHM [79] | ICCV | 2013 | PA+mRE | CS+L | — | LN | M+C |
| 28 | DSR [98] | ICCV | 2013 | mRE | — | B | BA | M+C |
| 29 | MC [99] | ICCV | 2013 | RE | — | B | — | M+C |
| 30 | UFO [100] | ICCV | 2013 | RE | G | F+O | NL | M+C |
| 31 | CIO [101] | ICCV | 2013 | RE | G | O | GMRF | NA |
| 32 | SLMR [102] | BMVC | 2013 | RE | SPS | BC | — | NA |
| 33 | LSMD [103] | AAAI | 2013 | RE | SPS | CP+CLP | — | NA |
| 34 | SUB [87] | CVPR | 2013 | RE | G | CP+CLP+SD | — | NA |
| 35 | PDE [104] | CVPR | 2014 | RE | — | CP+B+CLP | — | NA |
| 36 | RBD [105] | CVPR | 2014 | RE | — | BC | LS | M |
Table 2 Salient object detection models with extrinsic cues grouped by their adopted cues. For cues: {GT = ground-truth annotation, SI = similar images, TC = temporal cues, SCO = saliency co-occurrence, DP = depth, and LF = light field}. For saliency hypothesis: {P = generic properties, PRA = pre-attention cues, HD = discriminativity in high-dimensional feature space, SS = saliency similarity, CMP = complement of saliency cues, SP = sampling probability, MCO = motion coherence, RP = repeatedness, RS = region similarity, C = corresponding, and DK = domain knowledge}. Others: {CRF = conditional random field, SVM = support vector machine, BDT = boosted decision tree, and RF = random forest}
Table 4 CNN-based salient object detection models and information used by them during training. Above: CCN-based models. Below: FCN-based models

| # | Model | Pub | Year | #Training images | Training set | Pre-trained model | Fully conv |
|---|---|---|---|---|---|---|---|
| 1 | SuperCNN [44] | IJCV | 2015 | 800 | ECSSD | — | ✗ |
| 2 | LEGS [45] | CVPR | 2015 | 3,340 | MSRA-B+PASCALS | — | ✗ |
| 3 | MC [46] | CVPR | 2015 | 8,000 | MSRA10K | GoogLeNet [143] | ✗ |
| 4 | MDF [47] | CVPR | 2015 | 2,500 | MSRA-B | — | ✗ |
| 5 | HARF [48] | ICCV | 2015 | 2,500 | MSRA-B | — | ✗ |
| 6 | ELD [144] | CVPR | 2016 | nearly 9,000 | MSRA10K | VGGNet | ✗ |
| 7 | SSD-HS [145] | ECCV | 2016 | 2,500 | MSRA-B | AlexNet | ✗ |
| 8 | FRLC [146] | ICIP | 2016 | 4,000 | DUT-OMRON | VGGNet | ✗ |
| 9 | SCSD-HS [147] | ICPR | 2016 | 2,500 | MSRA-B | AlexNet | ✗ |
| 10 | DISC [148] | TNNLS | 2016 | 9,000 | MSRA10K | — | ✗ |
| 11 | LCNN [149] | Neuro | 2017 | 2,900 | MSRA-B+PASCALS | AlexNet | ✗ |
| 12 | DHSNET [150] | CVPR | 2016 | 6,000 | MSRA10K | VGGNet | ✓ |
| 13 | DCL [151] | CVPR | 2016 | 2,500 | MSRA-B | VGGNet [152] | ✓ |
| 14 | RACDNN [153] | CVPR | 2016 | 10,565 | DUT+NJU2000+RGBD | VGG | ✓ |
| 15 | SU [154] | CVPR | 2016 | 10,000 | MSRA10K | VGGNet | ✓ |
| 16 | CRPSD [155] | ECCV | 2016 | 10,000 | MSRA10K | VGGNet | ✓ |
| 17 | DSRCNN [156] | MM | 2016 | 10,000 | MSRA10K | VGGNet | ✓ |
| 18 | DS [157] | TIP | 2016 | nearly 10,000 | MSRA10K | VGGNet | ✓ |
| 19 | IMC [158] | WACV | 2017 | nearly 6,000 | MSRA10K | ResNet | ✓ |
| 20 | MSRNet [159] | CVPR | 2017 | 2,500 | MSRA-B+HKU-IS | VGGNet | ✓ |
| 21 | DSS [49] | CVPR | 2017 | 2,500 | MSRA-B | VGGNet | ✓ |
Table 5 Different types of information leveraged by existing FCN-based models. Abbreviations: SP: superpixel, SS: side supervision, RCL: recurrent convolutional layer, PCF: pure CNN feature, IL: instance-level, Arch: architecture

| # | Model | SP | SS | RCL | PCF | IL | CRF | Arch. |
|---|---|---|---|---|---|---|---|---|
| 1 | DCL [151] | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ | Fig. 4(b) |
| 2 | CRPSD [155] | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | Fig. 4(c) |
| 3 | DSRCNN [156] | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ | Fig. 4(f) |
| 4 | DHSNET [150] | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ | Fig. 4(g) |
| 5 | RACDNN [153] | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | Fig. 4(h) |
| 6 | SU [154] | ✗ | ✓ | ✗ | ✓ | ✗ | ✓ | Fig. 4(d) |
| 7 | DS [157] | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | Fig. 4(a) |
| 8 | IMC [158] | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | Fig. 4(a) |
| 9 | MSRNet [159] | ✓ | ✗ | ✗ | ✓ | ✓ | ✓ | Fig. 4(h) |
| 10 | DSS [49] | ✗ | ✓ | ✗ | ✓ | ✗ | ✓ | Fig. 4(i) |
Table 6 Overview of popular salient object datasets. Above: image datasets, below: video datasets. Obj: objects per image, Ann: Annotation, Sbj: Subjects/Annotators, Eye: Eye tracking subjects, I/V: Image/Video