Foundations and Trends® in Signal Processing > Vol 1 > Issue 3

The Application of Hidden Markov Models in Speech Recognition

By Mark Gales, Cambridge University Engineering Department, UK, mjfg@eng.cam.ac.uk | Steve Young, Cambridge University Engineering Department, UK, sjy@eng.cam.ac.uk

 
Suggested Citation
Mark Gales and Steve Young (2008), "The Application of Hidden Markov Models in Speech Recognition", Foundations and Trends® in Signal Processing: Vol. 1: No. 3, pp 195-304. http://dx.doi.org/10.1561/2000000004

Publication Date: 21 Feb 2008
© 2008 M. Gales and S. Young
 
Subjects
Speech/audio/image/video compression
 

In this article:
1 Introduction 
2 Architecture of an HMM-Based Recogniser 
3 HMM Structure Refinements 
4 Parameter Estimation 
5 Adaptation and Normalisation 
6 Noise Robustness 
7 Multi-Pass Recognition Architectures 
Conclusions 
Acknowledgments 
Notations and Acronyms 
References 

Abstract

Hidden Markov Models (HMMs) provide a simple and effective framework for modelling time-varying spectral vector sequences. As a consequence, almost all present-day large vocabulary continuous speech recognition (LVCSR) systems are based on HMMs.

Whereas the basic principles underlying HMM-based LVCSR are rather straightforward, the approximations and simplifying assumptions involved in a direct implementation of these principles would result in a system which has poor accuracy and unacceptable sensitivity to changes in operating environment. Thus, the practical application of HMMs in modern systems involves considerable sophistication.

The aim of this review is first to present the core architecture of an HMM-based LVCSR system and then to describe the various refinements which are needed to achieve state-of-the-art performance. These refinements include feature projection, improved covariance modelling, discriminative parameter estimation, adaptation and normalisation, noise compensation and multi-pass system combination. The review concludes with a case study of LVCSR for Broadcast News and Conversation transcription in order to illustrate the techniques described.

DOI:10.1561/2000000004
ISBN: 978-1-60198-120-2 (paperback), 112 pp., $80.00
ISBN: 978-1-60198-121-9 (e-book, PDF), 112 pp., $100.00

 