A Deep Learning Method for Chinese Singer Identification

Zebang Shen; Binbin Yong; Gaofeng Zhang; Rui Zhou; Qingguo Zhou

doi:10.26599/TST.2018.9010121

Open Access

School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China.

Abstract

As a subfield of Multimedia Information Retrieval (MIR), Singer IDentification (SID) remains an open research problem. On the one hand, high accuracy is hard to achieve in SID because the singing voice is difficult to model and is disturbed by the background instrumental music. On the other hand, the performance of conventional machine learning methods is limited by the scale of the training dataset. This study proposes a deep learning approach based on Long Short-Term Memory (LSTM) and Mel-Frequency Cepstral Coefficient (MFCC) features to identify the singer of a song in large datasets. The results indicate that an LSTM can build a representation of the relationships between different MFCC frames. The experimental results show that the proposed method achieves better accuracy for Chinese SID on the MIR-1K dataset than traditional approaches.
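To illustrate how an LSTM can relate successive MFCC frames, the following is a minimal sketch of an untrained single-layer LSTM that reads a sequence of frames and outputs a probability for each singer. It is not the authors' model: the class name `TinyLSTMClassifier`, the layer sizes, the random weights, the 13 coefficients per frame, and the 5 singers are all assumptions chosen for illustration, and the input frames are random numbers standing in for real MFCC features.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMClassifier:
    """Hypothetical, untrained LSTM followed by a softmax over singers."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, n_mfcc, n_hidden, n_singers, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # One weight matrix per gate, applied to the concatenation [h_prev, x_t].
        self.W = {g: rand_matrix(n_hidden, n_hidden + n_mfcc, rng) for g in self.GATES}
        self.b = {g: [0.0] * n_hidden for g in self.GATES}
        self.W_out = rand_matrix(n_singers, n_hidden, rng)

    def step(self, x, h, c):
        z = h + x  # list concatenation of previous hidden state and current frame
        pre = {
            g: [a + b for a, b in zip(matvec(self.W[g], z), self.b[g])]
            for g in self.GATES
        }
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        o = [sigmoid(v) for v in pre["output"]]
        cand = [math.tanh(v) for v in pre["candidate"]]
        # The cell state mixes what is kept from earlier frames with new content.
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def predict_proba(self, frames):
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in frames:  # frames are consumed in temporal order
            h, c = self.step(x, h, c)
        logits = matvec(self.W_out, h)
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Stand-in input: 20 frames of 13 random values (assumed sizes, not real MFCCs).
rng = random.Random(1)
frames = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(20)]
model = TinyLSTMClassifier(n_mfcc=13, n_hidden=8, n_singers=5)
probs = model.predict_proba(frames)
predicted_singer = max(range(len(probs)), key=probs.__getitem__)
```

Because the weights are random, the predicted index carries no meaning here. The point of the sketch is only that the final hidden state depends on the whole ordered sequence of frames, which is the property the abstract attributes to LSTM.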

Keywords

singer identification; timbre modeling; deep learning; long short-term memory

Tsinghua Science and Technology

Volume 24, Issue 4, August 2019

Pages 371-378

Cite this article:

Shen Z, Yong B, Zhang G, et al. A Deep Learning Method for Chinese Singer Identification. Tsinghua Science and Technology, 2019, 24(4): 371-378. https://doi.org/10.26599/TST.2018.9010121