Speech recognition using hidden Markov model with low redundancy in the observation space

Communications - Scientific Letters of the University of Zilina 2004, 6(4):17-21 | DOI: 10.26552/com.C.2004.4.17-21

Roman Jarina, Michal Kuba (Department of Telecommunications, Faculty of Electrical Engineering, University of Zilina, Slovakia)

Current speech recognition systems usually model the speech signal as a finite-state stochastic process in which acoustic observations are obtained through short-term spectral analysis. The model must handle several thousand speech parameters per second of utterance, and the high redundancy among these parameters makes processing computationally expensive. We propose combining 2-D cepstral analysis with a continuous Hidden Markov Model that uses a small, optimally designed number of states and acoustic observations. The 2-D cepstrum efficiently preserves the spectral variations of speech and yields parameters that are uncorrelated in both time and frequency. The system is evaluated on an isolated-word recognition task in the Slovak language. Promising preliminary results are presented.
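To make the idea of a low-redundancy observation space concrete, the following is a minimal sketch of one common formulation of the 2-D cepstrum: a 2-D discrete cosine transform of a log-spectrogram, truncated to its low-order coefficients. The abstract does not specify the exact transform, frame rate, or matrix sizes, so the DCT-II basis, the 10 ms frame hop, the 20 frequency bands, and the 4 x 8 truncation are all illustrative assumptions and not the authors' configuration. The function names are hypothetical.

```python
import math


def dct_basis(n_out, n_in):
    """First n_out rows of an orthonormal DCT-II basis over n_in samples."""
    rows = []
    for k in range(n_out):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n_in)
        rows.append([scale * math.cos(math.pi * k * (2 * n + 1) / (2 * n_in))
                     for n in range(n_in)])
    return rows


def two_d_cepstrum(log_spec, n_time_coeffs, n_freq_coeffs):
    """Truncated 2-D DCT of a log-spectrogram given as frames x bands.

    Returns an n_time_coeffs x n_freq_coeffs matrix. Rows index the
    transform along time, columns index the transform along frequency.
    """
    n_frames = len(log_spec)
    n_bands = len(log_spec[0])
    freq_basis = dct_basis(n_freq_coeffs, n_bands)
    time_basis = dct_basis(n_time_coeffs, n_frames)

    # Step 1: DCT along frequency turns each frame into a short cepstral vector.
    cepstra = [[sum(b * x for b, x in zip(basis_row, frame))
                for basis_row in freq_basis]
               for frame in log_spec]

    # Step 2: DCT along time compresses the trajectory of each cepstral coefficient.
    return [[sum(time_basis[p][t] * cepstra[t][q] for t in range(n_frames))
             for q in range(n_freq_coeffs)]
            for p in range(n_time_coeffs)]


# Synthetic one-second segment (assumed: 100 frames at a 10 ms hop, 20 bands).
log_spec = [[math.sin(2 * math.pi * t / 100) + 0.1 * f for f in range(20)]
            for t in range(100)]
features = two_d_cepstrum(log_spec, n_time_coeffs=4, n_freq_coeffs=8)

raw_count = len(log_spec) * len(log_spec[0])        # 100 * 20 values
compact_count = len(features) * len(features[0])    # 4 * 8 values
print(raw_count, compact_count)
```

Under these assumed sizes, a segment described by 2000 log-spectral values is summarized by 32 coefficients. Because the basis is orthonormal, the discarded coefficients correspond to fast variation in time or fine detail in frequency, which is the part of the representation such a truncation treats as redundant.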

Published: December 31, 2004

Jarina, R., & Kuba, M. (2004). Speech recognition using hidden Markov model with low redundancy in the observation space. Communications - Scientific Letters of the University of Zilina, 6(4), 17-21. doi: 10.26552/com.C.2004.4.17-21

This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, distribution, and reproduction in any medium, provided the original publication is properly cited. No use, distribution or reproduction is permitted which does not comply with these terms.
