Keyword spotting

Keyword spotting refers to the process of detecting a specific set of pre-defined keywords in a continuous stream of audio, with the goal of identifying their presence and sometimes their time location. This technique is commonly used in speech recognition and natural language processing applications. From: Pattern Recognition in Speech and Language Processing [2019], Efficient Keyword Spotting System Using Deformable Convolutional Network [2021]

Applications of Artificial Neural Networks (ANNs) to Speech Processing

Published in Yu Hen Hu, Jenq-Neng Hwang, Handbook of Neural Network Signal Processing, 2018

Shigeru Katagiri

Keyword spotting (for some particular words) is usually formalized using a keyword model λ and its corresponding threshold h. A spotting decision is fundamentally made at every time index t over an input speech pattern X. A discriminant function $g_t(X, S_t; \lambda)$ is defined to measure the probability of observing a selected speech segment $S_t$ of the input utterance X, and the spotting decision rule is formulated as: if the discriminant function satisfies $g_t(X, S_t; \lambda) > h$, a keyword is spotted at time index t.
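
The threshold rule can be illustrated with a minimal Python sketch. It assumes the discriminant function has already been evaluated at each time index, so the input is simply a list of scores; the function name `spot_keyword` and the example values are hypothetical and are not taken from the chapter.

```python
from typing import List, Sequence


def spot_keyword(scores: Sequence[float], threshold: float) -> List[int]:
    """Return the time indices t whose score exceeds the threshold h.

    scores[t] stands in for the discriminant value g_t(X, S_t; lambda);
    how that value is computed is outside the scope of this sketch.
    """
    return [t for t, g in enumerate(scores) if g > threshold]


# Hypothetical discriminant scores, one per time index.
scores = [0.10, 0.35, 0.92, 0.88, 0.20]
hits = spot_keyword(scores, threshold=0.8)
print(hits)  # [2, 3]
```

Raising the threshold trades missed detections for fewer false alarms, which is why the threshold is treated as part of the model specification alongside λ.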

Practical Techniques for Improving Speech Recognition Performance

Published in John Holmes, Wendy Holmes, Speech Synthesis and Recognition, 2002

John Holmes, Wendy Holmes

A common approach to keyword spotting uses continuous speech recognition that incorporates additional models to represent the acoustic background (often called “filler” or “garbage” models). The structure of the background models can vary from a simple one-state HMM with a Gaussian mixture output distribution trained on a suitable range of speech material, to networks of phonetic models or even networks representing word sequences that are typical of the application in which the system is being used. The recognition process produces a continuous stream of keywords and fillers, from which the keywords can be extracted to provide the recognizer output.
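
The final extraction step described above can be sketched in Python. This assumes the recognizer has already produced a time-aligned sequence of labelled segments; the `Segment` type, the `<filler>` label, and the example timings are hypothetical, and the decoding with keyword and filler HMMs is not shown.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    label: str    # keyword text, or the filler label
    start: float  # start time in seconds
    end: float    # end time in seconds


FILLER = "<filler>"  # hypothetical label for background/garbage segments


def extract_keywords(decoded: List[Segment], filler: str = FILLER) -> List[Segment]:
    """Keep only the keyword segments from a keyword/filler stream."""
    return [seg for seg in decoded if seg.label != filler]


# Hypothetical recognizer output: a continuous stream of keywords and fillers.
decoded = [
    Segment(FILLER, 0.0, 1.2),
    Segment("alexa", 1.2, 1.8),
    Segment(FILLER, 1.8, 3.0),
]
keywords = extract_keywords(decoded)
print([seg.label for seg in keywords])  # ['alexa']
```

Keeping the start and end times on each segment preserves the time location of every detected keyword, not just its presence.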

Efficient Keyword Spotting System Using Deformable Convolutional Network

Published in IETE Journal of Research, 2021

Huu Binh Nguyen, Van Hai Duong, Anh Xuan Tran Thi, Quoc Cuong Nguyen

Keyword Spotting (KWS) aims to detect a pre-defined keyword or a set of keywords in a continuous stream of audio. In recent years, with the development of voice assistants, keyword spotting has become a common way to begin an interaction with a voice interface (e.g. “Ok Google”, “Alexa”, or “Hey Siri”). In practice, such systems listen continuously for a specific pre-defined wake word and run on embedded devices, such as smartphones or smart-home controllers, with limited memory and computational resources. An effective on-device KWS system therefore requires real-time response and high detection accuracy at a low false alarm (FA) rate, while limiting footprint size and computational cost.
