now publishers - An Investigation of Noisy-to-noisy Voice Conversion Performance in Various Noisy Conditions

An Investigation of Noisy-to-noisy Voice Conversion Performance in Various Noisy Conditions

Chao Xie, Nagoya University, Japan, xie.chao@g.sp.m.is.nagoya-u.ac.jp; Tomoki Toda, Nagoya University, Japan

Suggested Citation

Chao Xie and Tomoki Toda (2025), "An Investigation of Noisy-to-noisy Voice Conversion Performance in Various Noisy Conditions", APSIPA Transactions on Signal and Information Processing: Vol. 14: No. 1, e10. http://dx.doi.org/10.1561/116.20250008

Publication Date: 10 Jun 2025

Subjects

Speech and spoken language processing, Denoising, Deep learning

Keywords

Voice conversion (VC), noisy-to-noisy VC, noisy speech modeling, mutual information, noise dropout

Journal details

Open Access

This is published under the terms of CC BY-NC.

In this article:

Abstract

Voice conversion (VC) in a noisy-to-noisy (N2N) scenario aims to convert the speaker identity of noisy speech to that of a target speaker while preserving both the linguistic content and the background noise. In our previous work, we proposed an N2N framework for this conversion. Notably, our VC approach relies solely on noisy speech data for training and requires no clean speech data from either the source or target speakers. Additionally, the framework allows the noise component to be either retained or removed in the converted speech. However, significant performance degradation was observed in the N2N framework when certain noisy conditions were present in the training data. In this paper, we further investigate the adverse noisy conditions that affect our framework’s performance. We identify two key factors contributing to performance degradation: a lack of noise diversity, which leads to feature entanglement, and noise bias during training. To address these issues, we introduce a mutual information approximation and a noise dropout strategy into the N2N framework. Objective and subjective evaluations validate the effectiveness of our approach in improving converted speech quality and mitigating VC performance degradation under adverse noisy conditions.

DOI:10.1561/116.20250008

Introduction
Related Work
Analysis of N2N-VC Performance Degradation
Proposed Method
Experimental Setup
Experimental Results
Conclusion
Appendix: Supplementary Evaluation Results
References