APSIPA Transactions on Signal and Information Processing > Vol 12 > Issue 3

Automatic Analyses of Dysarthric Speech based on Distinctive Features

Ka Ho Wong, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, The People’s Republic of China, khwong@se.cuhk.edu.hk; Helen Mei-Ling Meng, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, The People’s Republic of China
 
Suggested Citation
Ka Ho Wong and Helen Mei-Ling Meng (2023), "Automatic Analyses of Dysarthric Speech based on Distinctive Features", APSIPA Transactions on Signal and Information Processing: Vol. 12: No. 3, e18. http://dx.doi.org/10.1561/116.00000077

Publication Date: 03 May 2023
© 2023 K. H. Wong and H. Meng
 
Keywords
Dysarthric, distinctive features, recognition, sequence-to-sequence
 

Open Access

This is published under the terms of CC BY-NC.

In this article:
Introduction 
Background 
Corpora 
Analysis of Dysarthric, Deviant Articulations based on Manual Phonetic Transcriptions 
Acquiring Deviant Articulations based on Automatic Transcriptions 
Comparing Articulatory Analyses based on Manual and Automatic Transcriptions 
Applications in Analysis of Dysarthric Speech 
Conclusions and Future Work 
References 

Abstract

Dysarthria is a neuromotor disorder that causes the individual to speak with imprecise articulation. This paper presents an automatic analysis framework for dysarthric speech that uses a linguistically motivated representation based on distinctive features (DFs). The framework includes a sequence-to-sequence (seq2seq) phonetic decoder for Cantonese dysarthric speech. Phones transcribed either manually or automatically are mapped into a representation that consists of 21 DFs. The DFs of the transcribed phones are compared with those of the canonical phones to obtain an articulatory error rate (AER) for each DF. Together, these rates form an AER profile for a given set of dysarthric recordings from a speaker. Experiments show that the difference between the AER profiles derived from manual and automatic phonetic transcriptions is relatively small, with a root mean squared error (RMSE) of 0.053 for the word-reading task and 0.085 for the sentence-reading task in CU DYS. In addition, the correlations between the AER profiles are high, at 0.97 and 0.95 for the two tasks respectively. These results indicate that the proposed framework is viable as an automated means of processing dysarthric speech to produce articulatory analyses described by DFs. The AER profile is intuitive and interpretable, and it helps pinpoint problem areas in articulation.
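The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three-feature table `TOY_DF`, its values, the phone pairs, and the function names are all hypothetical, and the sketch assumes that canonical and transcribed phones have already been aligned one-to-one. The paper uses 21 DFs and may define the per-feature error rate differently (for example, over a different denominator).

```python
import math

# Hypothetical toy feature table (phone -> feature values).
# The paper uses 21 distinctive features; these three are illustrative only.
TOY_DF = {
    "p": {"voice": 0, "nasal": 0, "labial": 1},
    "b": {"voice": 1, "nasal": 0, "labial": 1},
    "m": {"voice": 1, "nasal": 1, "labial": 1},
    "t": {"voice": 0, "nasal": 0, "labial": 0},
    "n": {"voice": 1, "nasal": 1, "labial": 0},
}
FEATURES = ["voice", "nasal", "labial"]


def aer_profile(pairs, table=TOY_DF, features=FEATURES):
    """Return one error rate per feature.

    `pairs` is a list of (canonical_phone, transcribed_phone) tuples,
    assumed to be aligned one-to-one (an assumption of this sketch).
    """
    if not pairs:
        raise ValueError("at least one phone pair is required")
    errors = {f: 0 for f in features}
    for canonical, transcribed in pairs:
        for f in features:
            # Count a feature error when the two phones disagree on f.
            if table[canonical][f] != table[transcribed][f]:
                errors[f] += 1
    return [errors[f] / len(pairs) for f in features]


def rmse(a, b):
    """Root mean squared error between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def pearson(a, b):
    """Pearson correlation between two equal-length profiles."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_a * var_b)


# Made-up example: the same canonical phones paired with a manual
# and an automatic transcription.
manual_pairs = [("p", "b"), ("m", "m"), ("t", "n"), ("n", "n")]
auto_pairs = [("p", "b"), ("m", "m"), ("t", "t"), ("n", "n")]

manual_profile = aer_profile(manual_pairs)
auto_profile = aer_profile(auto_pairs)
profile_rmse = rmse(manual_profile, auto_profile)
profile_corr = pearson(manual_profile, auto_profile)
```

A low `profile_rmse` together with a high `profile_corr` would indicate, in this toy setting, that the automatic transcription yields nearly the same per-feature error pattern as the manual one, which is the kind of agreement the abstract reports.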

DOI:10.1561/116.00000077

Companion

APSIPA Transactions on Signal and Information Processing Special Issue - Advanced Acoustic, Sound and Audio Processing Techniques and Their Applications