Voice-based biometric systems are vulnerable to spoofing attacks, in which attackers deceive the system with synthetic or replayed voice samples. To address this vulnerability, we introduce the InaSpoof-v1 dataset, a comprehensive benchmark for spoofing detection in the Indonesian language. We evaluate state-of-the-art countermeasure models on this dataset, highlighting the challenges posed by the diversity of the Indonesian language and the impact of demographic factors. Our experimental results demonstrate the effectiveness of the end-to-end AASIST model as a countermeasure against synthesized speech attacks and of residual networks (ResNet) for replay attack detection. To improve future systems, we emphasize the importance of considering demographic factors and addressing the challenges posed by real-world scenarios.
APSIPA Transactions on Signal and Information Processing Special Issue - Deepfakes, Unrestricted Adversaries, and Synthetic Realities in the Generative AI Era