Research Article | Open Access

Brain-inspired multimodal learning based on neural networks

Chang Liu, Fuchun Sun (✉), Bo Zhang
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

Abstract

Modern computational models have leveraged advances in research on the human brain. This study addresses the problem of multimodal learning with the help of brain-inspired models. Specifically, a unified multimodal learning architecture is proposed based on deep neural networks, inspired by the biology of the human visual cortex. The unified framework is validated on two practical multimodal learning tasks: image captioning, which involves visual and natural-language signals, and visual-haptic fusion, which involves visual and haptic signals. Extensive experiments are conducted under this framework, and competitive results are achieved.
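
To make the abstract's description concrete, the following is a minimal sketch (in PyTorch) of what such a unified architecture could look like: a shared CNN visual encoder feeding two task heads, an LSTM caption decoder in the show-and-tell style and a late-fusion visual-haptic classifier. All class names, dimensions, and design choices here are illustrative assumptions, not the paper's actual model.

```python
# Illustrative sketch only -- NOT the authors' implementation.
# A shared CNN visual encoder feeds two task heads: an LSTM caption
# decoder (visual + language) and a visual-haptic fusion classifier.
# All module names and dimensions are assumptions for illustration.
import torch
import torch.nn as nn

class UnifiedMultimodalNet(nn.Module):
    def __init__(self, vocab_size=1000, haptic_dim=32,
                 embed_dim=128, num_classes=10):
        super().__init__()
        # Shared visual encoder: a small feedforward CNN, loosely
        # analogous to the hierarchical ventral visual pathway.
        self.visual = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )
        # Captioning head: word embeddings + LSTM decoder conditioned
        # on the image embedding (show-and-tell style).
        self.word_emb = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, embed_dim, batch_first=True)
        self.word_out = nn.Linear(embed_dim, vocab_size)
        # Visual-haptic head: encode haptic features, fuse the two
        # embeddings by concatenation, and classify.
        self.haptic = nn.Sequential(nn.Linear(haptic_dim, embed_dim),
                                    nn.ReLU())
        self.fusion_out = nn.Linear(2 * embed_dim, num_classes)

    def caption(self, image, tokens):
        """Teacher-forced caption logits; image embedding is step 0."""
        v = self.visual(image).unsqueeze(1)            # (B, 1, D)
        w = self.word_emb(tokens)                      # (B, T, D)
        h, _ = self.lstm(torch.cat([v, w], dim=1))     # (B, T+1, D)
        return self.word_out(h)                        # (B, T+1, vocab)

    def fuse(self, image, haptic):
        """Visual-haptic classification via late fusion."""
        v = self.visual(image)
        t = self.haptic(haptic)
        return self.fusion_out(torch.cat([v, t], dim=1))

# Usage with dummy inputs:
model = UnifiedMultimodalNet()
cap_logits = model.caption(torch.randn(2, 3, 64, 64),
                           torch.randint(0, 1000, (2, 7)))
fuse_logits = model.fuse(torch.randn(2, 3, 64, 64), torch.randn(2, 32))
print(cap_logits.shape, fuse_logits.shape)  # (2, 8, 1000), (2, 10)
```

The shared encoder is one plausible reading of "unified": both tasks reuse the same visual representation, and only the task-specific heads differ.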

Brain Science Advances
Pages 61-72
Cite this article:
Liu C, Sun F, Zhang B. Brain-inspired multimodal learning based on neural networks. Brain Science Advances, 2018, 4(1): 61-72. https://doi.org/10.26599/BSA.2018.9050004

Received: 15 July 2018
Revised: 06 August 2018
Accepted: 10 August 2018
Published: 25 November 2018
© The authors 2018

This article is published with open access at journals.sagepub.com/home/BSA

Creative Commons Non Commercial CC BY-NC: This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
