Speech enhancement is a core problem in audio signal processing with commercial applications in devices as diverse as mobile phones, conference call systems, smart assistants, and hearing aids. An essential component in the design of speech enhancement algorithms is acoustic source localization. Speaker localization is also directly applicable to many other audio-related tasks, e.g., automated camera steering, teleconferencing systems, and robot audition.
From a signal processing perspective, speaker localization is the task of mapping multichannel speech signals to 3-D source coordinates. To obtain viable solutions for this mapping, an accurate description of the source wave propagation captured by the respective acoustic channel is required. In fact, the acoustic channels can be considered as the spatial fingerprints characterizing the positions of each of the sources in a reverberant enclosure. These fingerprints represent complex reflection patterns stemming from the surfaces and objects characterizing the enclosure. Hence, they are usually modelled by a very large number of coefficients, resulting in an intricate high-dimensional representation.
We claim that in static acoustic environments, despite the high-dimensional representation, the difference between acoustic channels can be attributed mainly to changes in the source position. Thus, the true intrinsic dimensionality of the variations of the acoustic channels is significantly smaller than the number of variables commonly used to represent them; that is, the acoustic channels pertain to a low-dimensional manifold that can be inferred from data using nonlinear dimensionality reduction techniques. A comprehensive experimental study carried out in a real-life acoustic environment demonstrates the validity of the proposed manifold-based paradigm.
Motivated by this result, several high-performance localization and tracking methods were developed by harnessing novel mathematical tools for learning over manifolds, including diffusion maps, semi-supervised learning, optimization in reproducing kernel Hilbert spaces, and Gaussian process inference. We present two localization algorithms that were designed for a single array of two microphones. These algorithms were extended to several distributed arrays by merging the information of the different manifolds associated with each array. Tracking a moving source was also addressed by a data-driven propagation model relating movements on the abstract manifold to the actual source displacements. This data-driven propagation model was combined with a classical localization approach in a hybrid algorithm that ties together the two worlds of classical and data-driven localization while gaining the benefits of both. We show that the proposed algorithms outperform state-of-the-art localization methods and obtain high accuracy in challenging noisy and reverberant environments.
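As a rough illustration of the manifold idea, the sketch below applies a basic diffusion map to synthetic data. It is not one of the algorithms presented in this work. The 200-dimensional feature vectors are a hypothetical stand-in for acoustic-channel features: each is generated from a single latent parameter `theta`, which plays the role of source position. The function name `diffusion_map`, the median-based kernel scale, and the cosine feature model are all assumptions made for the example.

```python
import numpy as np


def diffusion_map(X, n_components=2, epsilon=None):
    """Basic diffusion-map embedding of the rows of X (illustrative sketch)."""
    # Pairwise squared Euclidean distances between feature vectors.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    if epsilon is None:
        # Heuristic kernel scale (an assumption, not taken from the source).
        epsilon = np.median(d2)
    K = np.exp(-d2 / epsilon)

    # Symmetric normalization S = D^{-1/2} K D^{-1/2}; it has the same
    # eigenvalues as the row-stochastic Markov matrix P = D^{-1} K.
    deg = K.sum(axis=1)
    S = K / np.sqrt(np.outer(deg, deg))
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Right eigenvectors of P; drop the first one, which is constant.
    psi = eigvecs / np.sqrt(deg)[:, None]
    keep = slice(1, n_components + 1)
    return psi[:, keep] * eigvals[keep], eigvals[keep]


# Synthetic data: one latent degree of freedom, 200 observed coordinates.
rng = np.random.default_rng(0)
n_points, n_features = 150, 200
theta = np.sort(rng.uniform(0.0, 1.0, n_points))      # hypothetical "position"
freqs = rng.uniform(0.5, 2.0, n_features)
phases = rng.uniform(0.0, 2.0 * np.pi, n_features)
X = np.cos(np.outer(theta, freqs) + phases)           # shape (150, 200)

embedding, eigvals = diffusion_map(X, n_components=2)
corr = np.corrcoef(embedding[:, 0], theta)[0, 1]
print(f"|correlation| with latent parameter: {abs(corr):.3f}")
```

In this toy setting the leading diffusion coordinate varies almost monotonically with `theta`, so a single coordinate summarizes the 200 observed ones. Real acoustic features would need their own feature extraction and kernel-scale tuning.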
Acoustic source localization is an essential component in many modern-day audio applications. For example, smart speakers require localization capabilities in order to determine the speakers in the scene and their roles. Based on the location information, they can enhance a speaker or carry out location-specific tasks, such as switching the lights on and off or steering a camera. Localization has often been based on physical models that become extremely intricate in real-world applications. Recently, researchers have started using learning techniques to address localization problems.
This monograph introduces the reader to the research and practical aspects behind the approach of learning the characteristics of the acoustic environment directly from the data rather than using a predefined physical model. Written by experts in the field who have developed many of these techniques, it provides a comprehensive overview of, and insights into, this burgeoning area of acoustics. The reader is first introduced to the underlying mathematics and then to the localization problem in depth. The core paradigm of manifold learning, based on diffusion maps and diffusion distance, is then described. Building on these concepts, the authors address both single and multiple manifold localization. Finally, manifold-based tracking is covered.
Data-Driven Multi-Microphone Speaker Localization on Manifolds is an illuminating introduction to designing and building acoustic systems in which multi-microphone speaker localization forms an essential part.