Research Article | Open Access

Brain-inspired multimodal learning based on neural networks

Chang Liu, Fuchun Sun, Bo Zhang

Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

Abstract

Modern computational models have leveraged biological advances in human brain research. This study addresses the problem of multimodal learning with the help of brain-inspired models. Specifically, a unified multimodal learning architecture is proposed based on deep neural networks, which are inspired by the biology of the visual cortex of the human brain. This unified framework is validated by two practical multimodal learning tasks: image captioning, involving visual and natural language signals, and visual-haptic fusion, involving haptic and visual signals. Extensive experiments are conducted under the framework, and competitive results are achieved.

Keywords

multimodal learning; brain-inspired learning; deep learning; neural networks

Brain Science Advances, Volume 4, Issue 1, September 2018, Pages 61-72

DOI: 10.26599/BSA.2018.9050004

Cite this article:

Liu C, Sun F, Zhang B. Brain-inspired multimodal learning based on neural networks. Brain Science Advances, 2018, 4(1): 61-72. https://doi.org/10.26599/BSA.2018.9050004
