An Impact of Narrowband Speech Codec Mismatch on a Performance of GMM-UBM Speaker Recognition over Telecommunication Channel

doi:10.26552/com.C.2016.1.23-28

Communications - Scientific Letters of the University of Zilina 2016, 18(1):23-28 | DOI: 10.26552/com.C.2016.1.23-28


Jozef Polacky¹, Peter Pocta¹, Roman Jarina¹: ¹ Department of Telecommunications and Multimedia, Faculty of Electrical Engineering, University of Zilina, Slovakia

Automatic identification of a person from their voice is part of modern telecommunication services. To perform the identification task, the speech signal has to be transmitted to a remote server, so the performance of the recognition/identification system can be affected by various distortions introduced when the speech signal passes through a communication channel. This paper studies the effect of the telecommunication channel, in particular the narrowband (NB) speech codecs commonly used in current telecommunication networks, on the performance of automatic speaker recognition in the context of channel/codec mismatch between enrollment and test utterances. The influence of speech coding on speaker identification is assessed using the reference GMM-UBM method. The results show that, when speaker recognition is performed on speech utterances degraded by different NB codecs, the partially mismatched scenario yields better results than the fully matched scenario. Moreover, using the EVS and G.711 codecs in the training process of the recognition system provides the best success rate in the fully mismatched scenario. Notably, EVS and G.711 offer the best speech quality among the codecs deployed in this study. This finding fully agrees with that reported by Janicki and Staroszczyk [1] for other speech codecs.
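To make the GMM-UBM pipeline named in the abstract concrete, here is a minimal NumPy sketch of the standard adapted-GMM recipe described by Reynolds, Quatieri and Dunn in the reference list: a diagonal-covariance universal background model (UBM) trained with EM on pooled background frames, mean-only MAP adaptation of the UBM to each speaker's enrollment frames, and closed-set identification by the highest average log-likelihood ratio against the UBM. It is a sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, MFCC extraction and codec processing are omitted (feature frames are assumed to be precomputed), and the number of components, relevance factor and EM iteration count are illustrative values rather than the settings used in the paper. In a codec-mismatch experiment, the enrollment frames would come from speech coded with one codec and the test frames from another.

```python
import numpy as np


def log_gauss_diag(X, means, variances):
    """Log N(x_t | mu_k, diag(var_k)) for every frame t and component k; shape (T, K)."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                   # (T, K, D)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)  # (T, K)
    log_det = np.sum(np.log(variances), axis=1)                # (K,)
    return -0.5 * (D * np.log(2 * np.pi) + log_det[None, :] + mahal)


def log_sum_exp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def posteriors(X, weights, means, variances):
    """Component responsibilities (T, K) and per-frame log-likelihoods (T,)."""
    lp = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    ll = log_sum_exp(lp, axis=1)
    return np.exp(lp - ll[:, None]), ll


def train_ubm(X, K, iters=20, var_floor=1e-3, seed=0):
    """Plain EM for a diagonal-covariance GMM on pooled background frames."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), K, replace=False)].copy()
    variances = np.tile(X.var(axis=0), (K, 1))
    weights = np.full(K, 1.0 / K)
    for _ in range(iters):
        gamma, _ = posteriors(X, weights, means, variances)
        n_k = gamma.sum(axis=0) + 1e-10                         # soft counts
        weights = n_k / n_k.sum()
        means = (gamma.T @ X) / n_k[:, None]
        variances = np.maximum((gamma.T @ X ** 2) / n_k[:, None] - means ** 2, var_floor)
    return weights, means, variances


def map_adapt_means(X, ubm, relevance=16.0):
    """Mean-only MAP adaptation of the UBM to one speaker's enrollment frames."""
    weights, means, variances = ubm
    gamma, _ = posteriors(X, weights, means, variances)
    n_k = gamma.sum(axis=0)
    ex_k = (gamma.T @ X) / np.maximum(n_k, 1e-10)[:, None]     # first-order statistics
    alpha = (n_k / (n_k + relevance))[:, None]                 # data-dependent mixing
    return weights, alpha * ex_k + (1 - alpha) * means, variances


def identify(X_test, ubm, speaker_models):
    """Index of the speaker model with the highest average log-likelihood ratio vs. the UBM."""
    _, ubm_ll = posteriors(X_test, *ubm)
    scores = [np.mean(posteriors(X_test, *m)[1] - ubm_ll) for m in speaker_models]
    return int(np.argmax(scores)), scores


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical 2-D frames standing in for MFCC vectors.
    ubm = train_ubm(rng.normal(0.0, 1.5, size=(3000, 2)), K=8)
    centers = [(1.5, 0.0), (-1.5, 0.0), (0.0, 1.5)]
    models = [map_adapt_means(rng.normal(c, 0.3, size=(400, 2)), ubm) for c in centers]
    best, scores = identify(rng.normal(centers[2], 0.3, size=(200, 2)), ubm, models)
    print("identified speaker:", best, "scores:", np.round(scores, 2))
```

Because the UBM term is shared by all candidates, picking the maximum log-likelihood ratio gives the same decision as picking the maximum speaker log-likelihood in closed-set identification; the ratio form is kept because it is also the score used for verification.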

Keywords: speaker identification; GMM-UBM; MFCC features; TIMIT; speech codecs; narrowband voice transmission

Published: February 29, 2016

Polacky, J., Pocta, P., & Jarina, R. (2016). An Impact of Narrowband Speech Codec Mismatch on a Performance of GMM-UBM Speaker Recognition over Telecommunication Channel. Communications - Scientific Letters of the University of Zilina, 18(1), 23-28. doi: 10.26552/com.C.2016.1.23-28

References

[1] JANICKI, A., STAROSZCZYK, T.: Speaker Recognition from Coded Speech Using Support Vector Machines, TSD 2011, LNAI 6836, pp. 291-298, 2011.
[2] BHATTACHARJEE, U., SARMAH, K.: GMM-UBM Based Speaker Verification in Multilingual Environments, IJCSI Intern. J. of Computer Science Issues, vol. 9, No. 6, November 2012.
[3] ASBAI, N., AMROUCHE, A., DEBYECHE, M.: Performances Evaluation of GMM-UBM and GMM-SVM for Speaker Recognition in Realistic World, ICONIP 2011, Part II, LNCS 7063, pp. 284-291, 2011.
[4] PILLAY, S. G., ARIYAEEINIA, A., PAWLEWSKI, M., SIVAKUMARAN, P.: Speaker Verification under Mismatched Data Conditions, IET Signal Processing, vol. 3, No. 4, 2009, pp. 236-246.
[5] FAKHR, W., ABDELSALAM, A., HAMDY, N.: Enhancement of Mismatched Conditions in Speaker Recognition for Multimedia Applications, ICASSP, vol. 1, pp. 377-380, 2004.
[6] QUATIERI, T. F., SINGER, E., DUNN, R. B., REYNOLDS, D. A., CAMPBELL, J. P.: Speaker and Language Recognition Using Speech Codec Parameters, Eurospeech, vol. 2, pp. 787-790, 1999.
[7] GALLARDO, L. F., WAGNER, M., MÖLLER, S.: I-vector Speaker Verification for Speech Degraded by Narrowband and Wideband Channels, ITG-Fachbericht 252: Speech Communication, Erlangen, September 2014.
[8] 3GPP: EVS Codec Detailed Algorithmic Description, Third Generation Partnership Project, 3GPP TS 26.445, 2014.
[9] REYNOLDS, D. A., QUATIERI, T. F., DUNN, R. B.: Speaker Verification Using Adapted Gaussian Mixture Models, Digital Signal Processing, vol. 10, No. 1-3, 2000, pp. 19-41.
[10] BECKER, T., JESSEN, M., GRIGORAS, C.: Forensic Speaker Verification Using Formant Features and Gaussian Mixture Models, Interspeech 2008, pp. 1505-1508.
[11] SORDO MARTINEZ, P. L., FAUVE, B., LARCHER, A., MASON, J. S.: Speaker Verification Performance with Constrained Durations, Intern. Workshop on Biometrics and Forensics (IWBF), 2014, pp. 1-6.
[12] TOGNERI, R., PULLELLA, D.: An Overview of Speaker Identification: Accuracy and Robustness Issues, IEEE Circuits and Systems Magazine, vol. 11, No. 2, 2011, pp. 23-61.
[13] REYNOLDS, D. A., ROSE, R.: Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models, IEEE Trans. on Speech and Audio Processing, vol. 3, 1995, pp. 72-83.
[14] BISHOP, C.: Pattern Recognition and Machine Learning, Springer Science+Business Media, LLC: New York, 2006.
[15] LINDE, Y., BUZO, A., GRAY, R.: An Algorithm for Vector Quantizer Design, IEEE Transactions on Communications, vol. 28, 1980, pp. 84-95.
[16] KINNUNEN, T., LI, H.: An Overview of Text-Independent Speaker Recognition: From Features to Supervectors, Speech Communication, 2009.
[17] GAROFOLO, J., LAMEL, L. et al.: DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CD-ROM, National Institute of Standards and Technology, 1990.
[18] RAJESHWARA, R. R., PRASAD, A., KEDARI RAO, CH.: Robust Features for Automatic Text-Independent Speaker Recognition Using Gaussian Mixture Model, Intern. J. of Soft Computing and Engineering (IJSCE), vol. 1, No. 5, November 2011.
[19] ITU: Pulse Code Modulation (PCM) of Voice Frequencies, Intern. Telecommunication Union: Geneva, ITU-T Rec. G.711, 1988.
[20] ITU: Coding of Speech at 8 kbit/s Using Conjugate-Structure Algebraic-Code-Excited Linear Prediction (CS-ACELP), Intern. Telecommunication Union: Geneva, ITU-T Rec. G.729, 2007.
[21] 3GPP: Mandatory Speech CODEC Speech Processing Functions; AMR Speech Codec; General Description, Third Generation Partnership Project, 3GPP TS 26.071, 2012.
[22] REYNOLDS, D. A.: An Overview of Automatic Speaker Recognition Technology, IEEE, 2002.
[23] POLACKY, J., GUOTH, I.: Comparative Evaluation of GMM and GMM/UBM Speaker Identification Systems, Proc. of Intern. Conference TRANSCOM 2015, University of Zilina, June 2015.

This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, distribution, and reproduction in any medium, provided the original publication is properly cited. No use, distribution or reproduction is permitted which does not comply with these terms.
