Joint Analysis of Acoustic Scenes and Sound Events Based on Semi-supervised Training of Sound Events With Partial Labels

Keisuke Imoto, Kyoto University, Japan, keisuke.imoto@ieee.org
 
Suggested Citation
Keisuke Imoto (2025), "Joint Analysis of Acoustic Scenes and Sound Events Based on Semi-supervised Training of Sound Events With Partial Labels", APSIPA Transactions on Signal and Information Processing: Vol. 14: No. 1, e31. http://dx.doi.org/10.1561/116.20250080

Publication Date: 06 Nov 2025
© 2025 K. Imoto
 
Subjects
Audio signal processing
 
Keywords
Acoustic scene classification, partial label, sound event detection
 


Open Access

This article is published under the terms of the CC BY-NC license.


In this article:
Introduction 
Conventional Methods 
Joint Analysis of Acoustic Scenes and Sound Events Based on Semi-supervised Approach With Partial Labels of Sound Events 
Evaluation Experiments 
Conclusions 
References 

Abstract

Annotating the time boundaries of sound events is labor-intensive, which limits the scalability of strongly supervised learning in sound event detection. To reduce annotation costs, weakly supervised learning with only clip-level labels has been widely adopted. As an alternative, partial label learning offers a cost-effective approach in which a set of possible labels is provided instead of exact weak annotations. However, partial label learning for audio analysis remains largely unexplored. Motivated by the observation that acoustic scenes provide contextual information for constructing a set of possible sound events, we utilize acoustic scene information to construct partial labels of sound events. On the basis of this idea, we propose a multitask learning framework that jointly performs acoustic scene classification and sound event detection with partial labels of sound events. Although they reduce annotation costs, weakly supervised and partial label learning often suffer from degraded detection performance because precise event sets and their temporal annotations are unavailable. To better balance annotation cost and detection performance, we also explore a semi-supervised framework that leverages both strong and partial labels. Moreover, to refine partial labels and improve model training, we propose a label refinement method based on self-distillation for the proposed approach with partial labels.
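
To make the idea concrete, the following is a minimal sketch (in PyTorch) of how a shared encoder with a clip-level scene head and a frame-level event head could be trained with scene-derived partial labels: event classes outside the scene's candidate set are penalized as negatives, while candidate events are left for weak, strong, or self-distilled supervision. All names here (SCENE_TO_EVENTS, SceneEventNet, partial_label_bce) and the scene-to-event mapping are hypothetical illustrations under assumed settings, not the paper's implementation.

# Hypothetical sketch of multitask training with scene-derived partial labels;
# not the authors' code.
import torch
import torch.nn as nn

NUM_SCENES, NUM_EVENTS, NUM_FRAMES, NUM_MELS = 3, 10, 100, 64

# Assumed mapping: each acoustic scene constrains which sound events may occur.
SCENE_TO_EVENTS = {
    0: [0, 1, 2, 3],   # e.g. "home"   -> candidate events {0, 1, 2, 3}
    1: [3, 4, 5, 6],   # e.g. "office" -> candidate events {3, 4, 5, 6}
    2: [6, 7, 8, 9],   # e.g. "street" -> candidate events {6, 7, 8, 9}
}

class SceneEventNet(nn.Module):
    """Shared recurrent encoder with a clip-level scene-classification head
    and a frame-level sound-event-detection head."""
    def __init__(self, n_mels=NUM_MELS, hidden=128):
        super().__init__()
        self.encoder = nn.GRU(n_mels, hidden, batch_first=True, bidirectional=True)
        self.scene_head = nn.Linear(2 * hidden, NUM_SCENES)
        self.event_head = nn.Linear(2 * hidden, NUM_EVENTS)

    def forward(self, x):                                # x: (batch, frames, mels)
        h, _ = self.encoder(x)                           # (batch, frames, 2*hidden)
        scene_logits = self.scene_head(h.mean(dim=1))    # clip-level prediction
        event_logits = self.event_head(h)                # frame-level prediction
        return scene_logits, event_logits

def partial_label_bce(event_logits, candidate_mask):
    """Partial-label loss: events outside the scene's candidate set are treated
    as negatives at every frame; candidate events are left unsupervised here
    (strong labels or self-distilled targets would supervise them instead)."""
    probs = torch.sigmoid(event_logits)                  # (batch, frames, events)
    neg_mask = 1.0 - candidate_mask.unsqueeze(1)         # (batch, 1, events)
    neg_loss = -(torch.log(1.0 - probs + 1e-7) * neg_mask)
    denom = neg_mask.expand_as(probs).sum().clamp(min=1.0)
    return neg_loss.sum() / denom

# Toy forward/backward pass with random features and scene labels.
model = SceneEventNet()
feats = torch.randn(4, NUM_FRAMES, NUM_MELS)
scenes = torch.randint(0, NUM_SCENES, (4,))
cand = torch.zeros(4, NUM_EVENTS)
for i, s in enumerate(scenes.tolist()):
    cand[i, SCENE_TO_EVENTS[s]] = 1.0                    # scene-derived partial label

scene_logits, event_logits = model(feats)
loss = nn.functional.cross_entropy(scene_logits, scenes) \
     + partial_label_bce(event_logits, cand)
loss.backward()

In the semi-supervised setting described in the abstract, the same event head could additionally receive frame-level supervision on the strongly labeled subset, and self-distillation could replace the unsupervised candidate entries with the model's own soft targets.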

DOI:10.1561/116.20250080