now publishers - Speech emotion recognition based on listener-dependent emotion perception models

APSIPA Transactions on Signal and Information Processing > Vol 10 > Issue 1

Speech emotion recognition based on listener-dependent emotion perception models

Atsushi Ando, NTT Corporation, Japan and Nagoya University, Japan, atsushi.ando.hd@hco.ntt.co.jp; Takeshi Mori, NTT Corporation, Japan; Satoshi Kobashikawa, NTT Corporation, Japan; Tomoki Toda, Nagoya University, Japan

Suggested Citation

Atsushi Ando, Takeshi Mori, Satoshi Kobashikawa and Tomoki Toda (2021), "Speech emotion recognition based on listener-dependent emotion perception models", APSIPA Transactions on Signal and Information Processing: Vol. 10: No. 1, e6. http://dx.doi.org/10.1017/ATSIP.2021.7

Publication Date: 20 Apr 2021

Keywords

Speech emotion recognition, perceived emotion, adaptation

Open Access

This is published under the terms of the Creative Commons Attribution licence.

In this article:

Abstract

This paper presents a novel speech emotion recognition scheme that leverages the individuality of emotion perception. Most conventional methods simply poll multiple listeners and directly model the majority decision as the perceived emotion. However, emotion perception varies from listener to listener, so a conventional method with a single model is forced to learn a complex mixture of different listeners' emotion perception criteria. To mitigate this problem, we propose a majority-voted emotion recognition framework that constructs listener-dependent (LD) emotion recognition models. The LD models can estimate not only each listener's perceived emotion but also the majority decision, by averaging the outputs of the multiple LD models. Three LD models (fine-tuning, auxiliary input, and sub-layer weighting) are introduced, all of which are inspired by successful domain-adaptation frameworks in various speech processing tasks. Experiments on two emotional speech datasets demonstrate that the proposed approach outperforms conventional emotion recognition frameworks in both majority-voted and listener-wise perceived emotion recognition.
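The abstract contrasts two ways of obtaining a majority decision: taking the majority vote of listener labels as a single training target (the conventional approach), and averaging the outputs of per-listener (LD) models at inference time (the proposed approach). The following is a minimal sketch of the averaging step, assuming each LD model outputs a posterior probability vector over a fixed set of emotion classes. The emotion set and the function names `majority_vote` and `ld_majority_estimate` are hypothetical, and the sketch does not reproduce the paper's model architectures.

```python
import numpy as np

# Hypothetical emotion label set (an assumption, not taken from the paper).
EMOTIONS = ["neutral", "happy", "angry", "sad"]


def majority_vote(labels):
    """Conventional target: the most frequent label among listeners.

    labels: sequence of integer class indices, one per listener.
    Ties are broken toward the lowest class index (np.argmax behavior).
    """
    counts = np.bincount(labels, minlength=len(EMOTIONS))
    return int(np.argmax(counts))


def ld_majority_estimate(ld_posteriors):
    """Estimate the majority decision by averaging LD model outputs.

    ld_posteriors: array of shape (num_listeners, num_classes); each row is
    the posterior predicted by one listener-dependent model.
    Returns the index of the highest averaged posterior and the average itself.
    """
    mean_posterior = np.mean(np.asarray(ld_posteriors), axis=0)
    return int(np.argmax(mean_posterior)), mean_posterior


# Usage: three hypothetical LD models scoring one utterance.
posteriors = np.array([
    [0.1, 0.6, 0.2, 0.1],
    [0.2, 0.3, 0.4, 0.1],
    [0.1, 0.5, 0.3, 0.1],
])
pred, avg = ld_majority_estimate(posteriors)
print(EMOTIONS[pred], avg)
```

Here the second model alone would predict "angry", but the averaged posterior favors "happy", mirroring how a majority of listeners would decide. Averaging soft posteriors rather than counting each model's top label keeps information about how confident each listener model is.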

DOI:10.1017/ATSIP.2021.7

I. INTRODUCTION
II. RELATED WORK
III. EMOTION RECOGNITION BY MAJORITY-VOTED MODEL
IV. EMOTION RECOGNITION BY LD MODELS
V. EXPERIMENTS
VI. CONCLUSION