Explore chapters and articles related to this topic
Applications of Artificial Neural Networks (ANNs) to Speech Processing
Published in Yu Hen Hu, Jenq-Neng Hwang, Handbook of Neural Network Signal Processing, 2018
Keyword spotting (for some particular words) is usually formalized using a keyword model λ and its corresponding threshold h. A spotting decision is fundamentally made at every time index t over an input speech pattern X. A discriminant function g_t(X, S_t; λ) is then defined to measure the probability of observing a selected speech segment S_t of the input utterance X, and the spotting decision rule is formulated as, “If the discriminant function meets g_t(X, S_t; λ) > h …
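The thresholded decision described in this excerpt can be illustrated with a minimal sketch. It assumes, for illustration only, that the segment S_t is a fixed-length window ending at time t and that the discriminant function is supplied by the caller. The names `spot_keyword`, `score_segment`, and `mean_score` are hypothetical, and the toy averaging score merely stands in for a trained keyword model λ.

```python
from typing import Callable, List, Sequence


def spot_keyword(
    frames: Sequence[float],
    score_segment: Callable[[Sequence[float]], float],
    threshold: float,
    segment_len: int,
) -> List[int]:
    """Return every time index t whose segment score exceeds the threshold."""
    hits = []
    for t in range(segment_len, len(frames) + 1):
        # S_t: assumed here to be a fixed-length window ending at time t.
        segment = frames[t - segment_len:t]
        # Decision rule: g_t(X, S_t; lambda) > h
        if score_segment(segment) > threshold:
            hits.append(t)
    return hits


def mean_score(segment: Sequence[float]) -> float:
    """Toy discriminant function (an assumption): the mean frame value."""
    return sum(segment) / len(segment)


# Hypothetical per-frame keyword evidence for a short utterance.
frames = [0.1, 0.2, 0.9, 0.8, 0.1]
hits = spot_keyword(frames, mean_score, threshold=0.5, segment_len=2)
print(hits)
```

In a real system the score would come from the keyword model itself, and the threshold h would be tuned to trade missed detections against false alarms.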
Practical Techniques for Improving Speech Recognition Performance
Published in John Holmes, Wendy Holmes, Speech Synthesis and Recognition, 2002
A common approach to keyword spotting uses continuous speech recognition that incorporates additional models to represent the acoustic background (often called “filler” or “garbage” models). The structure of the background models can vary from a simple one-state HMM with a Gaussian mixture output distribution trained on a suitable range of speech material, to networks of phonetic models or even networks representing word sequences that are typical of the application in which the system is being used. The recognition process produces a continuous stream of keywords and fillers, from which the keywords can be extracted to provide the recognizer output.
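The final step of this approach, pulling keywords out of the decoded stream of keywords and fillers, might look like the sketch below. The `Token` type, the `<filler>` label, and the example stream are all hypothetical; an actual recognizer would define its own output format and filler symbols.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """One decoded unit from the recognizer (hypothetical format)."""
    label: str
    start: float  # start time in seconds
    end: float    # end time in seconds


FILLER_LABEL = "<filler>"  # assumed name for the background/garbage model


def extract_keywords(stream: List[Token], filler_label: str = FILLER_LABEL) -> List[Token]:
    """Keep only the tokens that were not decoded by the filler model."""
    return [tok for tok in stream if tok.label != filler_label]


# Hypothetical decoder output: fillers surrounding a single keyword.
decoded = [
    Token("<filler>", 0.0, 1.2),
    Token("help", 1.2, 1.7),
    Token("<filler>", 1.7, 3.0),
]
keywords = extract_keywords(decoded)
print([tok.label for tok in keywords])
```

Keeping the start and end times on each token lets a downstream component locate the keyword in the audio, not just report that it occurred.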
Efficient Keyword Spotting System Using Deformable Convolutional Network
Published in IETE Journal of Research, 2021
Huu Binh Nguyen, Van Hai Duong, Anh Xuan Tran Thi, Quoc Cuong Nguyen
Keyword Spotting (KWS) aims to detect a pre-defined keyword or set of keywords in a continuous stream of audio. In recent years, with the development of voice assistants, keyword spotting has become a common way to begin an interaction with a voice interface (e.g. “Ok Google”, “Alexa”, or “Hey Siri”). In practice, such systems listen continuously for a specific pre-defined wake word and run on embedded devices, such as smartphones or smart-home controllers, with limited memory and computational resources. Therefore, an effective on-device KWS system requires real-time response and high detection accuracy at a low false alarm (FA) rate, while limiting footprint size and computational cost.