Communications - Scientific Letters of the University of Zilina 2011, 13(4):25-31 | DOI: 10.26552/com.C.2011.4.25-31
Quality of Synthesized Speech: Impact of the Newest Coding Approaches
Department of Telecommunications and Multimedia, Faculty of Electrical Engineering, University of Zilina, Slovakia
This contribution deals with the quality of synthesized speech. It introduces the principles and approaches used to create this type of speech, along with the basic methods and techniques used to assess its quality. The article also offers a short overview of relevant experimental studies on synthesized speech and its quality assessment. Finally, it investigates the effect of the newest coding approaches (e.g. Speex, iLBC, EVRC-B) on the quality of naturally produced speech and synthesized speech (generated by diphone and unit-selection synthesizers), as predicted by two different objective models and as measured by subjective tests.
Keywords: synthesized speech, synthesizer, text-to-speech systems, quality assessment, coding approaches, degradation
Published: December 31, 2011
This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, distribution, and reproduction in any medium, provided the original publication is properly cited. No use, distribution or reproduction is permitted which does not comply with these terms.