APSIPA Transactions on Signal and Information Processing > Vol 12 > Issue 5

Multi-Scale Self-Attention Network for Denoising Medical Images

Kyungsu Lee (DGIST, Korea); Haeyun Lee (Samsung SDI, Korea); Moon Hwan Lee (DGIST, Korea); Jin Ho Chang (DGIST, Korea); C.-C. Jay Kuo (University of Southern California, USA); Seung-June Oh (Seoul National University Hospital, Korea); Jonghye Woo (Massachusetts General Hospital and Harvard Medical School, USA); Jae Youn Hwang (DGIST, Korea), jyhwang@dgist.ac.kr
Suggested Citation
Kyungsu Lee, Haeyun Lee, Moon Hwan Lee, Jin Ho Chang, C.-C. Jay Kuo, Seung-June Oh, Jonghye Woo and Jae Youn Hwang (2024), "Multi-Scale Self-Attention Network for Denoising Medical Images", APSIPA Transactions on Signal and Information Processing: Vol. 12: No. 5, e204. http://dx.doi.org/10.1561/116.00000169

Publication Date: 22 Jan 2024
© 2024 K. Lee, H. Lee, M. H. Lee, J. H. Chang, C.-C. J. Kuo, S.-J. Oh, J. Woo and J. Y. Hwang
Blind source separation, Online-independent vector analysis, Circular microphone array, Sound field interpolation


Open Access

This is published under the terms of CC BY-NC.


In this article:
Related Works 
Multi-scale Self-Attention Network 
Discussion and Conclusions 


Deep learning-based image denoising plays a critical role in medical imaging, especially for rapid fluorescence and ultrasound captures, where traditional noise mitigation strategies such as increasing pixel dwell time or frame averaging are of limited use. Although numerous deep learning-based denoising techniques have shown strong results across biomedical domains, further optimization is still needed, particularly for precise real-time tracking of molecular kinetics in cellular settings, which is vital for decoding the intricate dynamics of biological processes. In this context, we propose the Multi-Scale Self-Attention Network (MSAN), an architecture tailored for denoising fluorescence and ultrasound images. MSAN integrates three main modules: a feature extraction layer that captures both high- and low-frequency attributes, a multi-scale self-attention mechanism that predicts residuals using original and downsampled feature maps, and a decoder that produces a residual image. Offsetting the original image by this residual yields the denoised result. Benchmarking shows that MSAN outperforms state-of-the-art models such as RIDNet and DnCNN, achieving peak signal-to-noise ratio (PSNR) improvements of 0.17 dB, 0.23 dB, and 1.77 dB on the FMD, W2S, and ultrasound datasets, respectively, demonstrating its denoising capability for fluorescence and ultrasound imagery.
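The residual formulation and the PSNR metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the network itself is replaced by a hand-written, hypothetical residual prediction, the "offset" is assumed to be a pixel-wise subtraction, and the peak value of 255 assumes 8-bit intensities. The function names `apply_residual` and `psnr` are illustrative only.

```python
import math


def apply_residual(noisy, residual):
    """Offset a noisy image by a predicted residual.

    Assumption: the offset is a pixel-wise subtraction (noisy - residual).
    """
    return [[n - r for n, r in zip(n_row, r_row)]
            for n_row, r_row in zip(noisy, residual)]


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized 2D images."""
    squared_errors = [(a - b) ** 2
                      for ref_row, est_row in zip(reference, estimate)
                      for a, b in zip(ref_row, est_row)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy 2x2 example with made-up values (not data from the paper).
clean = [[100.0, 120.0], [140.0, 160.0]]
noise = [[5.0, -5.0], [5.0, -5.0]]
noisy = [[c + n for c, n in zip(c_row, n_row)]
         for c_row, n_row in zip(clean, noise)]

# Hypothetical, imperfect residual standing in for a network's output.
predicted_residual = [[4.0, -4.0], [4.0, -4.0]]

denoised = apply_residual(noisy, predicted_residual)
psnr_noisy = psnr(clean, noisy)
psnr_denoised = psnr(clean, denoised)
gain = psnr_denoised - psnr_noisy

print(f"PSNR noisy:    {psnr_noisy:.2f} dB")
print(f"PSNR denoised: {psnr_denoised:.2f} dB")
print(f"PSNR gain:     {gain:.2f} dB")
```

A reported improvement such as 0.17 dB corresponds to the difference between two PSNR values computed this way against the same reference, here the quantity named `gain`.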



APSIPA Transactions on Signal and Information Processing Special Issue - AI for Healthcare