Open Access

Performance of Text-Independent Automatic Speaker Recognition on a Multicore System

Faculty of Tech and Software Engineering, University of Europe for Applied Sciences, Potsdam 14469, Germany

Abstract

This paper studies a high-speed, text-independent Automatic Speaker Recognition (ASR) algorithm based on the Gaussian Mixture Model (GMM) and implemented on a multicore system. The high speed is achieved through a parallel implementation of the feature extraction and aggregation methods used during the training and testing procedures. Shared-memory parallel programming techniques using both the OpenMP and PThreads libraries are developed to accelerate the code and improve the performance of the ASR algorithm. The experimental results show a speed-up of around 3.2 on a personal laptop with an Intel i5-6300HQ processor (2.3 GHz, four cores without hyper-threading) and 8 GB of RAM. In addition, a speaker recognition accuracy of 100% is achieved.
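
To put the reported figure in context, the short Python sketch below computes what a speed-up of about 3.2 on four cores would imply under Amdahl's law. This is an illustrative back-of-the-envelope calculation, not an analysis taken from the paper: it assumes the speed-up is measured against a single-threaded run on the same machine and that the idealized Amdahl model applies. The function names are hypothetical.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Speed-up divided by the number of cores (1.0 would be ideal scaling)."""
    return speedup / cores


def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: S = 1 / ((1 - p) + p / N)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def amdahl_parallel_fraction(speedup: float, cores: int) -> float:
    """Invert Amdahl's law for p: p = (1 - 1/S) / (1 - 1/N)."""
    if cores < 2:
        raise ValueError("at least two cores are needed to infer p")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


if __name__ == "__main__":
    # Values from the abstract: speed-up of around 3.2 on four cores.
    s, n = 3.2, 4
    print(f"efficiency        = {parallel_efficiency(s, n):.3f}")       # 0.800
    print(f"parallel fraction = {amdahl_parallel_fraction(s, n):.3f}")  # 0.917
```

Under these assumptions, roughly 92% of the single-threaded run time would have to be parallelizable to reach the reported speed-up, which corresponds to a parallel efficiency of about 80% on four cores.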

Tsinghua Science and Technology
Pages 447-456
Cite this article:
Kouatly R, Khan TA. Performance of Text-Independent Automatic Speaker Recognition on a Multicore System. Tsinghua Science and Technology, 2024, 29(2): 447-456. https://doi.org/10.26599/TST.2023.9010018

Received: 26 September 2022
Revised: 10 March 2023
Accepted: 18 March 2023
Published: 22 September 2023
© The author(s) 2024.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).