Recently, speech emotion recognition (SER), which predicts the emotions conveyed by speech, has been actively studied using deep learning. Our study focuses on a method of recognizing emotions at the frame level. One challenge with this approach is that the emotion label sequences used for training frame-based SER models do not sufficiently account for phonemic characteristics. To overcome this limitation, we propose new frame-based SER methods using fine-grained emotion label sequences that consider phoneme class attributes, such as vowels, voiced consonants, unvoiced consonants, and other symbols. As a result, we found that the proposed methods improve both utterance- and frame-level performance compared with conventional methods.
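To make the idea of fine-grained, phoneme-class-aware labels concrete, the following minimal sketch shows one possible way such a frame-level label sequence could be derived from an utterance-level emotion label and a per-frame phoneme-class alignment. The class names, the labeling rule, and the function itself are illustrative assumptions, not the exact procedure used in this paper.

```python
# Hypothetical sketch: build a fine-grained frame-level emotion label sequence
# from an utterance-level emotion and a per-frame phoneme-class alignment.
# The class set and labeling rule below are assumptions for illustration only.

from typing import List

# Assumed phoneme classes: 'V' (vowel), 'VC' (voiced consonant),
# 'UC' (unvoiced consonant), 'SYM' (other symbols, e.g. silence/pause).
PHONEME_CLASSES = {"V", "VC", "UC", "SYM"}


def fine_grained_labels(frame_phoneme_classes: List[str],
                        utterance_emotion: str) -> List[str]:
    """Assign a label to each frame based on its phoneme class.

    Frames assumed to carry strong voicing cues (vowels, voiced consonants)
    keep the utterance-level emotion; other frames receive class-specific
    tags so a frame-based model can treat them differently.
    """
    labels = []
    for cls in frame_phoneme_classes:
        if cls not in PHONEME_CLASSES:
            raise ValueError(f"unknown phoneme class: {cls}")
        if cls in ("V", "VC"):          # emotion-bearing frames (assumption)
            labels.append(utterance_emotion)
        elif cls == "UC":               # unvoiced consonant frames
            labels.append(f"{utterance_emotion}-unvoiced")
        else:                           # pauses / other symbols
            labels.append("non-speech")
    return labels


if __name__ == "__main__":
    # Example: a short phoneme-class alignment for a "happy" utterance.
    frames = ["SYM", "UC", "V", "V", "VC", "V", "SYM"]
    print(fine_grained_labels(frames, "happy"))
    # ['non-speech', 'happy-unvoiced', 'happy', 'happy', 'happy', 'happy', 'non-speech']
```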