APSIPA Transactions on Signal and Information Processing, Vol. 5, Issue 1

Nested Gibbs sampling for mixture-of-mixture model and its application to speaker clustering

Naohiro Tawara, Waseda University, Japan, tawara@pcl.cs.waseda.ac.jp; Tetsuji Ogawa, Waseda University, Japan; Shinji Watanabe, Mitsubishi Electric Research Laboratories, USA; Tetsunori Kobayashi, Waseda University, Japan
 
Suggested Citation
Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe and Tetsunori Kobayashi (2016), "Nested Gibbs sampling for mixture-of-mixture model and its application to speaker clustering", APSIPA Transactions on Signal and Information Processing: Vol. 5: No. 1, e16. http://dx.doi.org/10.1017/ATSIP.2016.15

Publication Date: 31 Aug 2016
© 2016 Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe and Tetsunori Kobayashi
 
Keywords
Fully Bayesian approach, Markov chain Monte Carlo, Nested Gibbs sampling, Mixture-of-mixture model, Speaker clustering
 

Open Access

This is published under the terms of the Creative Commons Attribution licence.

In this article:
I. INTRODUCTION 
II. FORMULATION 
III. MODEL INFERENCE BASED ON FULLY BAYESIAN APPROACH 
IV. IMPLEMENTATION OF MCMC-BASED MODEL ESTIMATION 
V. SPEAKER CLUSTERING EXPERIMENTS 
VI. CONCLUSION AND FUTURE WORK 

Abstract

This paper proposes a novel model estimation method that uses nested Gibbs sampling to estimate a mixture-of-mixture model, in which the distribution of each component is itself represented by a mixture model. This model is suitable for analyzing multilevel data, such as videos and acoustic signals, which are composed of frame-wise observations. Deterministic procedures, such as the expectation–maximization algorithm, have been employed to estimate these kinds of models, but they often suffer from a large bias when the amount of data is limited. To avoid this problem, we introduce a Markov chain Monte Carlo-based model estimation method. In particular, we aim to identify a suitable sampling method for mixture-of-mixture models. Gibbs sampling is a possible approach, but it can easily become trapped in a local optimum when each component is represented by a multi-modal distribution. Thus, we propose a novel Gibbs sampling method, called “nested Gibbs sampling,” which represents the lower-level (fine) data structure based on elemental mixture distributions and the higher-level (coarse) data structure based on mixture-of-mixture distributions. We applied this method to a speaker clustering problem and conducted experiments under various conditions. The results demonstrated that the proposed method outperformed conventional sampling-based, variational Bayesian, and hierarchical agglomerative methods.
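The two-level structure described in the abstract can be illustrated with a small generative sketch. This is not the authors' implementation and it does not perform nested Gibbs sampling; it only draws synthetic data from a mixture-of-mixture model under simplifying assumptions: frames are one-dimensional, each lower-level component is a Gaussian with a shared standard deviation, and the function name and every parameter name are hypothetical.

```python
import random


def sample_mixture_of_mixtures(n_segments, frames_per_segment,
                               cluster_weights, component_weights,
                               component_means, component_std, seed=0):
    """Draw synthetic segments from a toy mixture-of-mixture model.

    Assumptions (illustrative only, not taken from the paper):
    scalar frames, Gaussian components, one shared standard deviation.
    """
    rng = random.Random(seed)
    n_clusters = len(cluster_weights)
    segments = []
    for _ in range(n_segments):
        # Higher (coarse) level: one cluster label per segment,
        # e.g. one speaker per utterance.
        z = rng.choices(range(n_clusters), weights=cluster_weights)[0]
        frames, components = [], []
        for _ in range(frames_per_segment):
            # Lower (fine) level: one component label per frame,
            # drawn from the mixture that belongs to cluster z.
            n_components = len(component_weights[z])
            k = rng.choices(range(n_components),
                            weights=component_weights[z])[0]
            frames.append(rng.gauss(component_means[z][k], component_std))
            components.append(k)
        segments.append({"cluster": z,
                         "components": components,
                         "frames": frames})
    return segments


# Two clusters, each represented by its own two-component mixture.
data = sample_mixture_of_mixtures(
    n_segments=4,
    frames_per_segment=5,
    cluster_weights=[0.5, 0.5],
    component_weights=[[0.7, 0.3], [0.4, 0.6]],
    component_means=[[-5.0, -3.0], [3.0, 5.0]],
    component_std=0.5,
)
```

A sampler for such a model would have to infer both kinds of latent label shown here: the per-segment cluster label and the per-frame component label.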

DOI:10.1017/ATSIP.2016.15