now publishers - Recent Advances in End-to-End Automatic Speech Recognition

APSIPA Transactions on Signal and Information Processing > Vol 11 > Issue 1

Recent Advances in End-to-End Automatic Speech Recognition

Industrial Technology Advances

Jinyu Li, Microsoft, USA, jinyli@microsoft.com

Suggested Citation

Jinyu Li (2022), "Recent Advances in End-to-End Automatic Speech Recognition", APSIPA Transactions on Signal and Information Processing: Vol. 11: No. 1, e8. http://dx.doi.org/10.1561/116.00000050

Publication Date: 20 Apr 2022

Keywords

End-to-end, automatic speech recognition, streaming, attention, transducer, transformer, adaptation

Open Access

This is published under the terms of CC BY-NC.

Abstract

The speech community has recently seen a significant trend of moving from deep neural network based hybrid modeling to end-to-end (E2E) modeling for automatic speech recognition (ASR). While E2E models achieve state-of-the-art results on most benchmarks in terms of ASR accuracy, hybrid models are still used in a large proportion of commercial ASR systems. Many practical factors affect the decision to deploy a model in production. Traditional hybrid models, having been optimized for production for decades, usually handle these factors well. Unless E2E models provide excellent solutions for all of these factors, it will be hard for them to be widely commercialized. In this paper, we overview the recent advances in E2E models, focusing on technologies addressing those challenges from the industry’s perspective.

DOI:10.1561/116.00000050

Supplementary information

Replication Data | 116.00000050_supp.zip (ZIP).

This file contains the data that is required to replicate the data on your own system.

DOI: 10.1561/116.00000050_supp

Introduction
End-to-End Models
Encoder
Other Training Criterion
Multilingual Modeling
Adaptation
Advanced Models
Miscellaneous Topics
Conclusions and Future Directions
References
