
Combining acoustic signals and medical records to improve pathological voice classification

Shih-Hau Fang, Yuan Ze University and MOST Joint Research Center for AI Technology and All Vista Healthcare Innovation Center, Taiwan
Chi-Te Wang, Yuan Ze University and MOST Joint Research Center for AI Technology and All Vista Healthcare Innovation Center, Taiwan
Ji-Ying Chen, Yuan Ze University and MOST Joint Research Center for AI Technology and All Vista Healthcare Innovation Center, Taiwan
Yu Tsao, Research Center for Information Technology Innovation, Taiwan, yu.tsao@citi.sinica.edu.tw
Feng-Chuan Lin, Far Eastern Memorial Hospital, Taiwan
 
Suggested Citation
Shih-Hau Fang, Chi-Te Wang, Ji-Ying Chen, Yu Tsao and Feng-Chuan Lin (2019), "Combining acoustic signals and medical records to improve pathological voice classification", APSIPA Transactions on Signal and Information Processing: Vol. 8: No. 1, e14. http://dx.doi.org/10.1017/ATSIP.2019.7

Publication Date: 11 Jun 2019
© 2019 Shih-Hau Fang, Chi-Te Wang, Ji-Ying Chen, Yu Tsao and Feng-Chuan Lin
 
Keywords
Pathological voice; Diseases classification; Acoustic signal; Medical record; Artificial intelligence
 


Open Access

This article is published under the terms of the Creative Commons Attribution licence.


In this article:
I. INTRODUCTION 
II. PATHOLOGICAL VOICE CLASSIFICATION FRAMEWORKS 
III. EXPERIMENTS AND RESULTS 
IV. CONCLUSION 

Abstract

This study proposes two multimodal frameworks for classifying pathological voice samples by combining acoustic signals and medical records. In the first framework, acoustic signals are transformed into static supervectors via Gaussian mixture models; a deep neural network (DNN) then combines the supervectors with the medical records and classifies the voice signals. In the second framework, the acoustic features and the medical data are first processed by separate first-stage DNNs; a second-stage DNN then combines the outputs of the first-stage DNNs and performs the classification. Voice samples were recorded in the voice clinic of a tertiary teaching hospital and cover three common categories of vocal disease: glottic neoplasm, phonotraumatic lesions, and vocal paralysis. Experimental results demonstrated that the proposed frameworks yield significant improvements in accuracy and unweighted average recall (UAR) of 2.02–10.32% and 2.48–17.31%, respectively, compared with systems that use only acoustic signals or medical records. The proposed frameworks also provide higher accuracy and UAR than traditional feature-based and model-based combination methods.
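To make the first framework's pipeline concrete, the sketch below is a minimal illustration, not the authors' implementation: it uses the component means of a fitted scikit-learn GaussianMixture as a simplified stand-in for MAP-adapted supervectors, and all shapes, feature dimensions, hyperparameters, and data are hypothetical placeholders.

# Illustrative sketch of framework 1 (feature-level fusion), assuming
# frame-level acoustic features (e.g., 13-dim MFCC-like vectors) and a
# fixed-length medical-record vector per sample. Hypothetical data only.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPClassifier

def gmm_supervector(frames, n_components=8, seed=0):
    """Fit a GMM on one sample's frame-level features and stack its
    component means into a fixed-length supervector (a simplification
    of the MAP-adapted supervectors described in the abstract)."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag",
                          random_state=seed).fit(frames)
    return gmm.means_.ravel()

# Hypothetical dataset: 200 voice samples, each with 300 frames of
# 13-dim acoustic features, plus a 10-dim medical-record vector.
rng = np.random.default_rng(0)
n_samples, n_dim, n_meta = 200, 13, 10
acoustic = [rng.normal(size=(300, n_dim)) for _ in range(n_samples)]
records = rng.normal(size=(n_samples, n_meta))
labels = rng.integers(0, 3, size=n_samples)  # 3 vocal-disease classes

# Feature-level fusion: supervector + medical record -> DNN classifier.
supervectors = np.stack([gmm_supervector(a) for a in acoustic])
X = np.hstack([supervectors, records])
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500,
                    random_state=0).fit(X, labels)
print(clf.score(X, labels))

The second framework would replace this feature-level fusion with a model-level one: two separate first-stage networks process the acoustic features and the medical records individually, and a second-stage DNN combines their outputs (e.g., class posteriors) to produce the final classification.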

DOI: 10.1017/ATSIP.2019.7