now publishers - A Dual-branch Convolutional Network Architecture Processing on both Frequency and Time Domain for Single-channel Speech Enhancement

APSIPA Transactions on Signal and Information Processing > Vol 12 > Issue 3

A Dual-branch Convolutional Network Architecture Processing on both Frequency and Time Domain for Single-channel Speech Enhancement

Kanghao Zhang, College of Computer Science, Inner Mongolia University, China
Shulin He, College of Computer Science, Inner Mongolia University, China
Hao Li, Department of Electrical and Electronic Engineering, Southern University of Science and Technology, and Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, China
Xueliang Zhang, College of Computer Science, Inner Mongolia University, China, cszxl@imu.edu.cn

Suggested Citation

Kanghao Zhang, Shulin He, Hao Li and Xueliang Zhang (2023), "A Dual-branch Convolutional Network Architecture Processing on both Frequency and Time Domain for Single-channel Speech Enhancement", APSIPA Transactions on Signal and Information Processing: Vol. 12: No. 3, e19. http://dx.doi.org/10.1561/116.00000083

Publication Date: 24 May 2023

Keywords

Deep learning, speech enhancement, time-domain processing, frequency-domain processing, feature normalization

Open Access

This is published under the terms of CC BY-NC.

Downloaded: 1465 times

Abstract

Single-channel speech enhancement aims to remove interfering noise and reverberation in real environments using a single microphone, which is a very challenging task in speech signal processing. In recent years, deep learning has shown great potential for speech enhancement. In this paper, we propose a novel real-time framework, called DBCN, which has a dual-branch architecture. One branch takes the waveform as its input for time-domain modeling, and the other takes the shift real spectrum as its input for frequency-domain modeling. The two branches share the same network structure, a representative convolutional recurrent network. To exchange information sufficiently, a bridge module is added between the two branches. Furthermore, we propose a novel feature normalization approach that normalizes each band independently by computing the root mean square of each band and obtaining the inter-frame relationship for each band. The proposed approach allows the network to ignore the magnitude during processing, which reduces learning difficulty and improves performance. Systematic evaluations and comparisons are conducted. Experimental results show that the proposed system substantially outperforms related algorithms for causal and non-causal speech enhancement under very challenging environments.
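
The band-wise normalization described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption: the function name `bandwise_rms_normalize`, the (frames, bands) layout, the `eps` constant, and the choice to compute the root mean square over all frames of an utterance (a causal system would presumably need a running estimate instead). It is not the authors' implementation, only one plausible reading of "normalize each band by its own RMS".

```python
import numpy as np

def bandwise_rms_normalize(spec, eps=1e-8):
    """Normalize each band of a real-valued feature map by its own RMS.

    spec: array of shape (frames, bands); this layout is an assumption.
    Returns the normalized features and the per-band RMS values.
    """
    spec = np.asarray(spec, dtype=np.float64)
    # RMS of each band, computed across the frame axis.
    rms = np.sqrt(np.mean(spec ** 2, axis=0, keepdims=True))
    # After division, each band has roughly unit RMS, so only the
    # frame-to-frame variation within the band remains.
    return spec / (rms + eps), rms

# Toy input: 100 frames, 4 bands with very different magnitudes.
rng = np.random.default_rng(0)
scales = np.array([0.01, 1.0, 50.0, 1000.0])
features = rng.standard_normal((100, 4)) * scales

normalized, band_rms = bandwise_rms_normalize(features)
```

In this sketch, bands whose raw magnitudes differ by several orders end up on the same scale, which is the sense in which a network could "ignore the magnitude" of its input.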

DOI:10.1561/116.00000083

Related publications

Companion

APSIPA Transactions on Signal and Information Processing Special Issue - Advanced Acoustic, Sound and Audio Processing Techniques and Their Applications
See the other articles that are part of this special issue.

Introduction
Problem Formulation
Dual-Branch Architecture
Experimental Settings
Results, Comparisons and Analyses
Conclusion
References
