APSIPA Transactions on Signal and Information Processing, Vol. 9, Issue 1

An analysis of speaker dependent models in replay detection

Gajan Suthokumar, University of New South Wales, Australia, g.suthokumar@unsw.edu.au; Kaavya Sriskandaraja, University of New South Wales, Australia; Vidhyasaharan Sethu, University of New South Wales, Australia; Eliathamby Ambikairajah, University of New South Wales, Australia; Haizhou Li, National University of Singapore, Singapore
 
Suggested Citation
Gajan Suthokumar, Kaavya Sriskandaraja, Vidhyasaharan Sethu, Eliathamby Ambikairajah and Haizhou Li (2020), "An analysis of speaker dependent models in replay detection", APSIPA Transactions on Signal and Information Processing: Vol. 9: No. 1, e14. http://dx.doi.org/10.1017/ATSIP.2020.9

Publication Date: 30 Apr 2020
© 2020 Gajan Suthokumar, Kaavya Sriskandaraja, Vidhyasaharan Sethu, Eliathamby Ambikairajah and Haizhou Li
 
Keywords
Speaker Dependent Models, Replay Attack, Spoofing Detection, Speaker Verification, Speaker Adapted Neural Networks
 

Open Access

This is published under the terms of the Creative Commons Attribution licence.

In this article:
I. INTRODUCTION 
II. ANALYSIS OF SPEAKER VARIABILITY 
III. PROPOSED SPEAKER DEPENDENT SPOOFING DETECTION GMM BACKEND 
IV. PROPOSED SPEAKER DEPENDENT DEEP NEURAL NETWORK BACKEND 
V. DATABASES and DATA PREPARATION 
VI. FRONT-END FEATURES 
VII. EXPERIMENTAL SETTING 
VIII. RESULTS and DISCUSSION 
IX. CONCLUSION 

Abstract

Most research on replay detection has focused on developing a stand-alone countermeasure that runs independently of a speaker verification system, training a single spoofed model and a single genuine model for all speakers. In this paper, we explore the potential benefits of adapting the back-end of a spoofing detection system towards the claimed target speaker. Specifically, we characterize and quantify speaker variability by comparing speaker-dependent and speaker-independent (SI) models of feature distributions for both genuine and spoofed speech. Following this, we develop an approach for implementing speaker-dependent spoofing detection using a Gaussian mixture model (GMM) back-end, where both the genuine and spoofed models are adapted to the claimed speaker. Finally, we develop and evaluate a speaker-specific neural network-based spoofing detection system in addition to the GMM-based back-end. Evaluations on the BTAS2016 and ASVspoof2017 v2.0 replay corpora reveal that the proposed speaker-dependent spoofing detection outperforms equivalent SI replay detection baselines on both datasets. Our experimental results show that, consistent with previous findings, the use of speaker-specific genuine models leads to a significant improvement (around 4% in terms of equal error rate (EER)), and that the addition of speaker-specific spoofed models adds a small improvement on top (less than 1% in terms of EER).
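
As an illustration of the speaker-dependent GMM back-end idea described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes diagonal-covariance GMMs, mean-only MAP adaptation with a relevance factor (the abstract says only that the models are "adapted to the claimed speaker", so the adaptation rule is an assumption), and scoring by an average log-likelihood ratio. All function names, the toy two-dimensional data, and the parameter values are hypothetical.

```python
import numpy as np

def log_gauss_diag(x, means, variances):
    # x: (T, D); means, variances: (K, D) -> per-frame, per-component log densities (T, K)
    diff = x[:, None, :] - means[None, :, :]
    log_norm = np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
    return -0.5 * (log_norm + np.sum(diff ** 2 / variances[None, :, :], axis=2))

def avg_loglik(x, weights, means, variances):
    # Average per-frame log-likelihood under a diagonal GMM (log-sum-exp for stability).
    log_p = log_gauss_diag(x, means, variances) + np.log(weights)[None, :]
    m = log_p.max(axis=1, keepdims=True)
    return float(np.mean(m[:, 0] + np.log(np.exp(log_p - m).sum(axis=1))))

def map_adapt_means(x, weights, means, variances, relevance=10.0):
    # Assumed adaptation rule: mean-only MAP adaptation of an SI model to speaker data.
    log_p = log_gauss_diag(x, means, variances) + np.log(weights)[None, :]
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)             # responsibilities (T, K)
    n_k = post.sum(axis=0)                              # soft frame counts (K,)
    ex_k = post.T @ x / np.maximum(n_k, 1e-10)[:, None] # per-component data means (K, D)
    alpha = (n_k / (n_k + relevance))[:, None]          # data-dependent mixing weight
    return alpha * ex_k + (1.0 - alpha) * means

def llr_score(x, genuine_gmm, spoof_gmm):
    # Positive scores favour genuine speech, negative scores favour replayed speech.
    return avg_loglik(x, *genuine_gmm) - avg_loglik(x, *spoof_gmm)

rng = np.random.default_rng(0)
weights = np.array([0.5, 0.5])
variances = np.ones((2, 2))
si_genuine_means = np.array([[0.0, 0.0], [4.0, 4.0]])    # toy SI genuine model
si_spoof_means = np.array([[-3.0, -3.0], [1.0, -4.0]])   # toy SI spoofed model

# Toy enrolment data for one claimed speaker (genuine and replayed frames).
spk_genuine = rng.normal([0.5, 0.5], 1.0, size=(200, 2))
spk_spoof = rng.normal([-2.5, -3.0], 1.0, size=(200, 2))

# Adapt both the genuine and the spoofed model towards the claimed speaker.
sd_genuine_means = map_adapt_means(spk_genuine, weights, si_genuine_means, variances)
sd_spoof_means = map_adapt_means(spk_spoof, weights, si_spoof_means, variances)
sd_genuine = (weights, sd_genuine_means, variances)
sd_spoof = (weights, sd_spoof_means, variances)

# Score two toy test utterances claiming to be this speaker.
test_genuine = rng.normal([0.5, 0.5], 1.0, size=(100, 2))
test_spoof = rng.normal([-2.5, -3.0], 1.0, size=(100, 2))
score_genuine = llr_score(test_genuine, sd_genuine, sd_spoof)
score_spoof = llr_score(test_spoof, sd_genuine, sd_spoof)
print(score_genuine, score_spoof)
```

In this sketch a component that sees little speaker data stays close to its SI prior because its mixing weight `alpha` is small, which is one common reason for choosing adaptation over training a speaker model from scratch.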

DOI:10.1017/ATSIP.2020.9