Open Access

Image Tagging by Semantic Neighbor Learning Using User-Contributed Social Image Datasets

School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China.
State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China.

Abstract

The explosive growth in the number of images on the Internet poses a major challenge: how to index, retrieve, and organize these resources effectively. Assigning proper tags to visual content is key to the success of many applications, such as image retrieval and content mining. Although recent years have seen many advances in image tagging, existing methods are limited by their dependence on high-quality, large-scale training data, which are expensive to obtain. In this paper, we propose a novel semantic neighbor learning method based on user-contributed social image datasets, which can be acquired from the Web's inexhaustible social image content. In contrast to existing image tagging approaches that rely on high-quality image-tag supervision, we obtain weak supervision for our neighbor learning method through progressive neighborhood retrieval from noisy and diverse user-contributed image collections. The retrieved neighbor images are not only visually alike and partially correlated but also semantically related. We offer a step-by-step, easy-to-use implementation of the proposed method. Extensive experiments on several datasets demonstrate that the proposed method significantly outperforms other methods.
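
As a rough illustration of the general idea of tagging an image from its neighbors in a noisy, user-tagged collection, the sketch below uses plain similarity-weighted neighbor voting. It is not the semantic neighbor learning method proposed in the paper, and it does not implement progressive neighborhood retrieval. The function names (`cosine`, `tag_by_neighbors`), the parameters `k` and `top_n`, and the toy feature vectors are all hypothetical and chosen only for this example.

```python
from collections import Counter
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def tag_by_neighbors(query, collection, k=3, top_n=2):
    """Rank tags for `query` by similarity-weighted votes of its k nearest images.

    `collection` is a list of (feature_vector, tags) pairs whose tags are
    noisy user contributions. This is a generic baseline, assumed here for
    illustration only.
    """
    # Sort the collection by visual similarity to the query, most similar first.
    ranked = sorted(collection, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter()
    for features, tags in ranked[:k]:
        weight = cosine(query, features)
        for tag in set(tags):  # count each tag once per neighbor image
            votes[tag] += weight
    return [tag for tag, _ in votes.most_common(top_n)]


# Toy data (hypothetical): 2-D "visual features" with user-contributed tags.
images = [
    ([1.0, 0.0], ["beach", "sea"]),
    ([0.9, 0.1], ["beach", "holiday"]),
    ([0.8, 0.2], ["sea", "beach"]),
    ([0.0, 1.0], ["city", "night"]),
]
predicted = tag_by_neighbors([1.0, 0.05], images, k=3, top_n=2)
```

On this toy data, the three nearest images all carry `beach` and two carry `sea`, so those two tags rank highest, while the one-off tag `holiday` is outvoted. Weighting votes by similarity is one simple way to dampen the influence of noisy tags from less similar neighbors.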

Tsinghua Science and Technology
Pages 551-563
Cite this article:
Tian F, Shen X, Liu X, et al. Image Tagging by Semantic Neighbor Learning Using User-Contributed Social Image Datasets. Tsinghua Science and Technology, 2017, 22(6): 551-563. https://doi.org/10.23919/TST.2017.8195340


Received: 02 December 2016
Revised: 29 March 2017
Accepted: 25 May 2017
Published: 14 December 2017
© The author(s) 2017