APSIPA Transactions on Signal and Information Processing > Vol 14 > Issue 1

Stabilizing and Enhancing Remixing-based Unsupervised Sound Source Separation

Kohei Saijo, Waseda University, Japan, saijo@pcl.cs.waseda.ac.jp, Tetsuji Ogawa, Waseda University, Japan
Suggested Citation
Kohei Saijo and Tetsuji Ogawa (2025), "Stabilizing and Enhancing Remixing-based Unsupervised Sound Source Separation", APSIPA Transactions on Signal and Information Processing: Vol. 14: No. 1, e25. http://dx.doi.org/10.1561/116.20250044

Publication Date: 22 Sep 2025
© 2025 K. Saijo and T. Ogawa
 
Subjects
Audio signal processing,  Enhancement,  Source separation,  Deep learning
 

Open Access

This article is published under the terms of CC BY-NC.

In this article:
Introduction 
Prior Work on Single-channel Unsupervised Sound Separation 
Training with Self-Remixing and RemixIT from Scratch 
Appropriate Loss Function for Remixing-based Methods 
Experimental Setup 
Experimental Results 
Conclusion 
References 

Abstract

In this paper, we present methods to stabilize training and enhance the performance of Self-Remixing, an unsupervised source separation framework. Self-Remixing trains a model to reconstruct the original mixtures by separating pseudo-mixtures, which are generated by first separating the observed mixtures and then remixing the resulting sources. Although this approach has shown promising results, it suffers from two notable limitations: i) reliance on pretrained models, and ii) suboptimal performance on certain metrics, particularly word error rate (WER). To address these issues, we i) propose techniques that stabilize the training process, enabling end-to-end training from scratch without pretraining, and ii) identify the causes of WER degradation and introduce a tailored loss function to mitigate them. Our results demonstrate that, with improved remixing strategies and a carefully designed loss function, Self-Remixing achieves competitive performance even when trained entirely from scratch.
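The remixing cycle described in the abstract (separate the observed mixtures, shuffle the resulting sources across the batch into pseudo-mixtures, separate again, then un-shuffle and remix to reconstruct the originals) can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' implementation; the `separator` callable, the per-source batch permutation, and the mean-squared-error reconstruction loss are all simplifying assumptions made here for clarity.

```python
import numpy as np

def self_remixing_step(mixtures, separator, rng):
    """One Self-Remixing iteration (illustrative sketch, not the paper's code).

    mixtures:  (batch, time) array of observed mixtures.
    separator: callable mapping (batch, time) -> (batch, n_src, time).
    rng:       numpy random Generator used to draw the shuffling permutations.
    """
    # 1) Separate the observed mixtures into estimated sources.
    sources = separator(mixtures)                                     # (B, S, T)
    B, S, T = sources.shape

    # 2) Shuffle each source slot independently across the batch and sum
    #    the shuffled sources into pseudo-mixtures.
    perms = np.stack([rng.permutation(B) for _ in range(S)], axis=1)  # (B, S)
    shuffled = np.stack([sources[perms[:, s], s] for s in range(S)], axis=1)
    pseudo_mixtures = shuffled.sum(axis=1)                            # (B, T)

    # 3) Separate the pseudo-mixtures.
    est = separator(pseudo_mixtures)                                  # (B, S, T)

    # 4) Invert each permutation and remix the estimates so that, with a
    #    perfect separator, the original mixtures are reconstructed.
    remixed = np.zeros_like(mixtures)
    for s in range(S):
        inv = np.argsort(perms[:, s])  # inverse of the slot-s permutation
        remixed += est[inv, s]

    # 5) Reconstruction loss against the original mixtures
    #    (MSE here purely for illustration).
    return float(np.mean((remixed - mixtures) ** 2))
```

With a perfect separator the un-shuffled estimates sum back to the original mixtures exactly, so the loss vanishes; training the separator to minimize this reconstruction error is what requires no clean source references.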

DOI:10.1561/116.20250044