Publishing Language: Chinese

An audio-video evoked electroencephalogram dataset for auditory attention detection

Hongyu ZHANG1, Jingjing ZHANG1, Xingguang DONG1, Zhao LÜ1, Jianhua TAO2, Jian ZHOU1, Xiaopei WU1, Cunhang FAN1 (corresponding author)
1. Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei 230000, China
2. Department of Automation, Tsinghua University, Beijing 100084, China

Abstract

Objective

Deep learning is being actively explored for auditory attention detection based on electroencephalogram (EEG) signals. However, past research in this area has focused mainly on the auditory sense, and relatively few studies have investigated the effect of vision on auditory attention. In addition, the mature public datasets commonly used, such as KUL and DTU, contain only EEG and audio data, whereas in daily life auditory attention is usually accompanied by visual information. To study auditory attention more comprehensively under combined audio-visual conditions, this work integrates EEG, audio, and video data for auditory attention detection.

Methods

To simulate a real-world perceptual environment, this paper constructs an audio-video EEG dataset for an in-depth exploration of auditory attention. The dataset contains two stimulus scenarios: audio-video and audio-only. In the audio-video scenario, subjects attend to the voice of the speaker shown in the video and ignore the voice of the other speaker; that is, subjects receive visual and auditory input simultaneously. In the audio-only scenario, subjects attend to one of the two speakers' voices, i.e., they receive only auditory input. Based on the EEG data collected in these two scenarios, this paper verifies the effectiveness of the dataset and compares the scenarios using existing detection methods.
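
Evaluation in such auditory attention detection studies typically proceeds by cutting each trial's continuous EEG into short decision windows that inherit the trial's attended-speaker label. The following minimal Python sketch illustrates only this windowing step; it is not the authors' code, and the channel count, sampling rate, and 2-s window length are illustrative assumptions.

import numpy as np

def make_decision_windows(eeg, label, fs=128, win_s=2.0):
    # Split one trial (channels x samples) into non-overlapping decision
    # windows; each window inherits the trial's attended-speaker label.
    win_len = int(win_s * fs)
    n_win = eeg.shape[1] // win_len
    windows = [eeg[:, i * win_len:(i + 1) * win_len] for i in range(n_win)]
    return np.stack(windows), np.full(n_win, label)

# Hypothetical example: a 60-s, 64-channel trial sampled at 128 Hz, attended speaker = 0.
trial = np.random.randn(64, 60 * 128)
X, y = make_decision_windows(trial, label=0)
print(X.shape, y.shape)  # (30, 64, 256) (30,)

Shorter windows yield lower-latency decisions at the cost of accuracy, which is why the 2-s window is a common point of comparison in this literature.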

Results

The results show the following. 1) Across decision windows of various lengths, the average accuracy with audio-only stimuli was significantly higher than with audio-video stimuli; under a 2-s decision window, detection accuracy reached only 70.5% for audio-video stimuli and 75.2% for audio-only stimuli. 2) Experiments on EEG signals in different frequency bands, conducted on the two public datasets and on the audio-video EEG dataset constructed in this paper, show that the gamma band outperformed the other bands on the DTU dataset and in the audio-video scenario, whereas on the KUL dataset the alpha band performed best. In the audio-only scenario, although the average classification accuracy of the alpha band under the 2-s decision window was lower than that of the theta band, it was still higher than that of the remaining bands.
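
For such a frequency-band comparison, a typical approach is to band-pass filter the EEG into the canonical theta, alpha, beta, and gamma bands and evaluate a detector on each band separately. The sketch below shows one way to do this with SciPy; the band edges and filter settings are common conventions assumed here, not values taken from the paper.

import numpy as np
from scipy.signal import butter, filtfilt

# Conventional EEG band edges in Hz (assumed, not taken from the paper).
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 50)}

def band_filter(eeg, fs, low, high, order=4):
    # Zero-phase Butterworth band-pass filter applied along the time axis
    # of a (channels x samples) EEG array.
    b, a = butter(order, [low, high], btype="bandpass", fs=fs)
    return filtfilt(b, a, eeg, axis=-1)

fs = 128
eeg = np.random.randn(64, 10 * fs)  # 10 s of hypothetical 64-channel EEG
band_eeg = {name: band_filter(eeg, fs, lo, hi) for name, (lo, hi) in BANDS.items()}
print({name: x.shape for name, x in band_eeg.items()})

Per-band accuracies can then be compared under the same decision-window protocol to identify which band carries the most attention-related information.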

Conclusions

This paper proposes an audio-video EEG dataset that more closely simulates real-world scenes. The experiments show that in the audio-video stimulation scenario, subjects must process two streams of sensory information simultaneously, which divides their attention and degrades detection performance. In addition, EEG signals in the alpha and gamma frequency bands carry important information during auditory spatial attention. Compared with existing public auditory attention detection datasets, the audio-video EEG dataset proposed in this paper introduces video information and therefore reflects real scenes more faithfully. This design provides richer modal information for the research and application of brain-computer interfaces, facilitates in-depth study of the auditory attention patterns and neural mechanisms elicited by simultaneous audio-visual stimulation, and has significant research and application value. This work is expected to promote further research and applications in auditory attention. The dataset is publicly available at http://iiphci.ahu.edu.cn/toAuditoryAttentionEnglish.

CLC number: TP392 Document code: A Article ID: 1000-0054(2024)11-1919-08

Journal of Tsinghua University (Science and Technology)
Pages 1919-1926
Cite this article:
ZHANG H, ZHANG J, DONG X, et al. An audio-video evoked electroencephalogram dataset for auditory attention detection. Journal of Tsinghua University (Science and Technology), 2024, 64(11): 1919-1926. https://doi.org/10.16511/j.cnki.qhdxxb.2024.26.024

Received: 21 December 2023
Published: 15 November 2024
© Journal of Tsinghua University (Science and Technology). All rights reserved.