HTK – Knowledge and References

Quadratic Time–Frequency Features for Speech Recognition

Published in Antonia Papandreou-Suppappola, Applications in Time-Frequency Signal Processing, 2018

A conventional speaker-independent speech recognition system was constructed with the HTK modeling toolkit and served as the baseline for all improvements and modifications. The baseline used standard mel-frequency cepstral coefficients (MFCCs), generated with a window length of 25 msec and a window skip of 5 msec. The static MFCC and energy terms were supplemented with standard delta (δ) and delta-delta (δ–δ) coefficients, for a total of 36 coefficients per frame. The acoustic models consisted of 46 context-independent phone models.
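The framing and dynamic-feature arithmetic described above can be illustrated with a minimal Python sketch. It is not the authors' code and not HTK's implementation. The function names are hypothetical, and the 16 kHz sample rate, the 12 static values per frame (inferred from 36 / 3), and the regression window of ±2 frames are assumptions not stated in the excerpt. The delta computation uses the regression formula commonly applied to cepstral features, with edge frames replicated.

```python
def frame_starts(num_samples, sample_rate, win_ms=25.0, skip_ms=5.0):
    """Return the start sample index of every full analysis window."""
    win = int(round(sample_rate * win_ms / 1000.0))    # window length in samples
    skip = int(round(sample_rate * skip_ms / 1000.0))  # window skip in samples
    if num_samples < win:
        return []
    return list(range(0, num_samples - win + 1, skip))


def deltas(frames, theta=2):
    """Regression-based deltas over +/- theta frames (theta is an assumed value).

    d_t = sum_k k * (c_{t+k} - c_{t-k}) / (2 * sum_k k^2), for k = 1..theta.
    Frames beyond either end of the utterance are replaced by the edge frame.
    """
    n = len(frames)
    dim = len(frames[0])
    denom = 2.0 * sum(k * k for k in range(1, theta + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(dim):
            acc = 0.0
            for k in range(1, theta + 1):
                nxt = frames[min(t + k, n - 1)][d]
                prv = frames[max(t - k, 0)][d]
                acc += k * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def add_dynamic_features(static):
    """Append delta and delta-delta coefficients to each static frame."""
    d1 = deltas(static)  # delta
    d2 = deltas(d1)      # delta-delta (delta of the deltas)
    return [s + a + b for s, a, b in zip(static, d1, d2)]
```

Under these assumptions, one second of 16 kHz audio yields 196 full windows from `frame_starts(16000, 16000)`, and passing 12 static values per frame to `add_dynamic_features` gives 36 values per frame, matching the total quoted in the excerpt.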

Robust automatic speech recognition based on neural network in reverberant environments

Published in Jimmy C.M. Kao, Wen-Pei Sung, Civil, Architecture and Environmental Engineering, 2017

L. Bai, H.L. Li, Y.Y. He

Within the framework of an HTK-based recognizer, we retrain the “multi-condition” acoustic model using an HMM structure and a DNN, respectively. The starting point is that the artificially distorted training signals are mismatched with the enhanced ones. The five possible cases are then:

Automatic Speech Recognition Using Limited Vocabulary: A Survey

Published in Applied Artificial Intelligence, 2022

Jean Louis K. E Fendji, Diane C. M. Tala, Blaise O. Yenke, Marcellin Atemkeng

A plethora of open-source frameworks, engines, and toolkits for ASR systems have been proposed in the literature. The following is a non-exhaustive list of the main works and projects, most of which can easily handle both small and large vocabularies. HTK was first implemented in the late 1980s and is maintained by the Speech Vision and Robotics Group of the Cambridge University Engineering Department (CUED) (Young 2002); it has been available to the research community since early 2000. It provides recipes for building baseline systems with HMMs. HTK is considered a very simple and effective tool for research (Qiao, Sherwani, and Rosenfeld 2010; Supriya and Handore 2017). It can be used to build a noise-robust ASR system for moderately noisy environments, especially for small-vocabulary systems, and it is a practical solution for developing fast and accurate Small Vocabulary Automatic Speech Recognition (SVASR) (Hatala 2019). One of the most popular toolkits is CMU Sphinx, designed for both mobile and server applications. CMU Sphinx is in fact a set of libraries and tools that can be used to develop speech-enabled applications. It was developed at Carnegie Mellon University in the late 1980s (Lee 1988). Several versions have been released, including Sphinx 1 to 4 and PocketSphinx for hand-held devices (Huggins-Daines et al. 2006). CMU Sphinx continues to attract the attention of the research community. It also offers the possibility of building new language models (LMs) with its language modeling tool.