APSIPA Transactions on Signal and Information Processing > Vol 14 > Issue 1

Stabilizing and Enhancing Remixing-based Unsupervised Sound Source Separation

Kohei Saijo, Waseda University, Japan, saijo@pcl.cs.waseda.ac.jp, Tetsuji Ogawa, Waseda University, Japan
Suggested Citation
Kohei Saijo and Tetsuji Ogawa (2025), "Stabilizing and Enhancing Remixing-based Unsupervised Sound Source Separation", APSIPA Transactions on Signal and Information Processing: Vol. 14: No. 1, e25. http://dx.doi.org/10.1561/116.20250044

Publication Date: 22 Sep 2025
© 2025 K. Saijo and T. Ogawa
 
Subjects
Audio signal processing,  Enhancement,  Source separation,  Deep learning
 

Open Access

This article is published under the terms of CC BY-NC.

In this article:
Introduction 
Prior Work on Single-channel Unsupervised Sound Separation 
Training with Self-Remixing and RemixIT from Scratch 
Appropriate Loss Function for Remixing-based Methods 
Experimental Setup 
Experimental Results 
Conclusion 
References 

Abstract

In this paper, we present methods to stabilize training and enhance the performance of Self-Remixing, an unsupervised source separation framework. Self-Remixing trains a model to reconstruct the original mixtures by separating pseudo-mixtures, which are generated by first separating the observed mixtures and then remixing the resulting sources. Although this approach has shown promising results, it suffers from two notable limitations: i) reliance on pretrained models, and ii) suboptimal performance on certain metrics, particularly word error rate (WER). To address these issues, we i) propose techniques that stabilize the training process, enabling end-to-end training from scratch without pretraining, and ii) identify the causes of WER degradation and introduce a tailored loss function to mitigate them. Our results demonstrate that, with improved remixing strategies and a carefully designed loss function, Self-Remixing achieves competitive performance even when trained entirely from scratch.
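The remixing cycle described in the abstract (separate the observed mixtures, shuffle the resulting sources across the batch into pseudo-mixtures, separate again, then un-shuffle and remix to reconstruct the originals) can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' implementation; the `separator` callable, the per-source batch permutation, and the mean-squared-error reconstruction loss are all simplifying assumptions made here for clarity.

```python
import numpy as np

def self_remixing_step(mixtures, separator, rng):
    """One Self-Remixing iteration (illustrative sketch, not the paper's code).

    mixtures:  (batch, time) array of observed mixtures.
    separator: callable mapping (batch, time) -> (batch, n_src, time).
    rng:       numpy random Generator used to draw the shuffling permutations.
    """
    # 1) Separate the observed mixtures into estimated sources.
    sources = separator(mixtures)                                     # (B, S, T)
    B, S, T = sources.shape

    # 2) Shuffle each source slot independently across the batch and sum
    #    the shuffled sources into pseudo-mixtures.
    perms = np.stack([rng.permutation(B) for _ in range(S)], axis=1)  # (B, S)
    shuffled = np.stack([sources[perms[:, s], s] for s in range(S)], axis=1)
    pseudo_mixtures = shuffled.sum(axis=1)                            # (B, T)

    # 3) Separate the pseudo-mixtures.
    est = separator(pseudo_mixtures)                                  # (B, S, T)

    # 4) Invert each permutation and remix the estimates so that, with a
    #    perfect separator, the original mixtures are reconstructed.
    remixed = np.zeros_like(mixtures)
    for s in range(S):
        inv = np.argsort(perms[:, s])  # inverse of the slot-s permutation
        remixed += est[inv, s]

    # 5) Reconstruction loss against the original mixtures
    #    (MSE here purely for illustration).
    return float(np.mean((remixed - mixtures) ** 2))
```

With a perfect separator the un-shuffled estimates sum back to the original mixtures exactly, so the loss vanishes; training the separator to minimize this reconstruction error is what requires no clean source references.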

DOI:10.1561/116.20250044