now publishers - A technical framework for automatic perceptual evaluation of singing quality

APSIPA Transactions on Signal and Information Processing, Vol. 7, Issue 1

A technical framework for automatic perceptual evaluation of singing quality

Chitralekha Gupta (chitralekha@u.nus.edu), NUS Graduate School for Integrative Sciences and Engineering and Computer Science Department, National University of Singapore, Singapore
Haizhou Li, Electrical and Computer Engineering Department, National University of Singapore, Singapore
Ye Wang, NUS Graduate School for Integrative Sciences and Engineering and Computer Science Department, National University of Singapore, Singapore

Suggested Citation

Chitralekha Gupta, Haizhou Li and Ye Wang (2018), "A technical framework for automatic perceptual evaluation of singing quality", APSIPA Transactions on Signal and Information Processing: Vol. 7: No. 1, e10. http://dx.doi.org/10.1017/ATSIP.2018.10

Publication Date: 14 Sep 2018

Keywords

Singing Vocal, Perceptual Evaluation of Singing Quality, Automatic Evaluation, Human Perception

Open Access

This is published under the terms of the Creative Commons Attribution licence.

In this article:

Abstract

Human experts evaluate singing quality based on many perceptual parameters, such as intonation, rhythm, and vibrato, with reference to music theory. We previously proposed the Perceptual Evaluation of Singing Quality (PESnQ) framework, which evaluates singing quality by combining acoustic features related to these perceptual parameters with the cognitive modeling concept of the telecommunication standard Perceptual Evaluation of Speech Quality (PESQ). In this study, we further investigate how well the PESnQ framework approximates human judgments. First, we find that a linear combination of the human scores for the individual perceptual parameters can predict the overall human judgment of singing quality. This provides us with a human parametric judgment equation. Next, we show that the individual perceptual parameter scores predicted from the PESnQ acoustic features correlate highly with the respective human scores, which enables more meaningful feedback to learners. Finally, we compare early fusion and late fusion of the acoustic features for predicting the overall human scores, and find that late fusion outperforms early fusion. This work underlines the importance of modeling human perception in automatic singing quality assessment.
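As an illustration of what a "human parametric judgment equation" could look like, the following minimal sketch fits an overall score as a linear combination of per-parameter scores using ordinary least squares. It is not the authors' implementation: the function name `fit_linear_judgment`, the choice of three parameters (intonation, rhythm, vibrato), the synthetic scores, and the weights are all assumptions made for this example.

```python
import numpy as np

def fit_linear_judgment(param_scores, overall_scores):
    """Least-squares fit of overall ~= param_scores @ w + b (hypothetical helper)."""
    X = np.asarray(param_scores, dtype=float)
    y = np.asarray(overall_scores, dtype=float)
    # Append a column of ones so the intercept b is fitted with the weights.
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]

# Synthetic, illustrative data only: 20 singers rated on 3 assumed
# parameters (columns: intonation, rhythm, vibrato) on a 1-5 scale.
rng = np.random.default_rng(0)
scores = rng.uniform(1.0, 5.0, size=(20, 3))

# Made-up weights, NOT values from the paper; the overall score is
# generated without noise so the fit should recover them.
true_w = np.array([0.5, 0.3, 0.2])
overall = scores @ true_w

w, b = fit_linear_judgment(scores, overall)
print("weights:", np.round(w, 3), "intercept:", round(float(b), 3))
```

With real ratings the fit would not be exact, and the quality of the linear approximation would need to be checked, for example by correlating predicted and actual overall scores on held-out data.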

DOI:10.1017/ATSIP.2018.10

I. INTRODUCTION
II. FRAMEWORK OF EVALUATION
III. CHARACTERIZATION OF SINGING QUALITY
IV. EXPERIMENTS
V. CONCLUSIONS
