An Overview of the Concept of Speaker Recognition
Published in Chiranji Lal Chowdhary, Intelligent Systems, 2019
Sindhu Rajendran, Meghamadhuri Vakil, Praveen Kumar Gupta, Lingayya Hiremath, S. Narendra Kumar, Ajeet Kumar Srivastava
The RASTA method and the PLPCC method are usually combined to obtain the low-pass transfer function. In general, the rate of change of non-linguistic components in speech often lies outside the typical rate of change of the vocal tract [12]. RASTA filtering suppresses any component that changes much more slowly or much more rapidly than the typical rate of change of speech. As a result, RASTA filtering gives better accuracy in the presence of noise.
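The suppression of very slow and very fast changes can be illustrated with a small sketch. It assumes the filter coefficients commonly quoted from Hermansky and Morgan (1994), a numerator of 0.1 × (2, 1, 0, −1, −2) and a single pole at 0.98, applied causally to the time trajectory of one log-spectral band. The function name `rasta_filter`, the frame rate of 100 frames per second, and the test signals are illustrative assumptions, not part of the excerpt.

```python
import math

def rasta_filter(trajectory, pole=0.98):
    """Filter one log-spectral band trajectory (one value per frame).

    Assumed band-pass form: y[n] = sum_k b[k] * x[n-k] + pole * y[n-1],
    with b = 0.1 * (2, 1, 0, -1, -2). The coefficients sum to zero, so a
    constant (very slowly changing) input is driven towards zero.
    """
    numer = [0.2, 0.1, 0.0, -0.1, -0.2]
    out = []
    for n in range(len(trajectory)):
        acc = 0.0
        for k, b in enumerate(numer):
            if n - k >= 0:
                acc += b * trajectory[n - k]
        if n >= 1:
            acc += pole * out[n - 1]
        out.append(acc)
    return out

FRAME_RATE = 100  # frames per second (assumption: 10 ms frame shift)
N_FRAMES = 2000

# A constant offset, e.g. a fixed channel coloration in the log spectrum.
constant_out = rasta_filter([1.0] * N_FRAMES)

# The fastest possible change: the value flips sign on every frame.
alternating_out = rasta_filter([(-1.0) ** n for n in range(N_FRAMES)])

# A 4 Hz modulation, roughly the syllable rate of speech.
speech_rate_out = rasta_filter(
    [math.sin(2 * math.pi * 4 * n / FRAME_RATE) for n in range(N_FRAMES)]
)

print(abs(constant_out[-1]), abs(alternating_out[-1]))
print(max(abs(v) for v in speech_rate_out[-100:]))
```

Under these assumptions, the constant and frame-by-frame alternating inputs decay towards zero, while the 4 Hz modulation passes with close to unit gain. Published implementations differ in the pole value and in whether the filter is applied to log or compressed spectra, so the numbers here should be read as indicative only.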
Automatic Speech Recognition Using Limited Vocabulary: A Survey
Published in Applied Artificial Intelligence, 2022
Jean Louis K. E Fendji, Diane C. M. Tala, Blaise O. Yenke, Marcellin Atemkeng
Feature extraction is the first step of an ASR system. It converts the speech waveform into a set of feature vectors, with the aim of achieving high discrimination between phonemes (Lokhande 2015). Feature extraction performs all the required measurements on the selected segment that will be used to make a decision (Doukas, Bardis, and Markovskyi 2017). The measured features may be used to update long-term statistical measures to facilitate the adaptation of the process to varying environmental conditions (mainly the background) (Doukas and Bardis 2017). Feature extraction determines the voice regions of the recording that are to be transcribed and extracts sequences of acoustic parameters from them. There are many techniques for feature extraction, as reported in Narang and Gupta (2015), including:
Linear Predictive Coding (LPC) (O’Shaughnessy 1988): LPC is a technique for signal source modeling in speech signal processing.
RelAtive SpecTral (RASTA) filtering (Hermansky and Morgan 1994): RASTA is designed to decrease the impact of noise and to enhance speech. This technique is widely used for noisy speech.
Linear Discriminant Analysis (LDA) and Probabilistic LDA (Ioffe 2006): This technique applies the state-dependent variables of a Hidden Markov Model (HMM) to i-vector extraction. The i-vector is a low-dimensional, fixed-length vector that contains the relevant information.
Mel-frequency cepstral coefficients (MFCCs): This is the most commonly used technique, with a frame shift and frame length usually between 20 and 32 ms, using 1024 frequency bins, 26 mel channels, and between 10 and 40 cepstral coefficients with cepstral mean normalization (Murphy 2012; Murshed et al. 2020; Padmanabhan and Premkumar 2015; Renals and Grefenstette 2000). This technique has low complexity and high recognition accuracy.
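To make the MFCC parameters quoted above more concrete, the following sketch works through the arithmetic they imply. It assumes a 16 kHz sampling rate, a 32 ms frame length, and a 20 ms frame shift (both within the quoted range), and it uses the widely used mel formula 2595 · log10(1 + f/700). The sampling rate, the mel formula, and all function names are assumptions made for illustration; they are not taken from the survey.

```python
import math

def hz_to_mel(f_hz):
    """Convert Hz to mel (assumed convention: 2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_center_frequencies(n_channels, f_min_hz, f_max_hz):
    """Centre frequencies (Hz) of filters equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_channels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_channels)]

def frame_count(n_samples, frame_len, frame_shift):
    """Number of complete frames that fit in a signal of n_samples."""
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // frame_shift

def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean, taken over all frames."""
    n, dims = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    return [[f[d] - means[d] for d in range(dims)] for f in frames]

SAMPLE_RATE = 16000                    # Hz (assumption)
frame_len = SAMPLE_RATE * 32 // 1000   # 32 ms frame length in samples
frame_shift = SAMPLE_RATE * 20 // 1000 # 20 ms frame shift in samples
n_frames = frame_count(SAMPLE_RATE, frame_len, frame_shift)  # 1 s of audio

# 26 mel channels spanning 0 Hz to the Nyquist frequency.
centers = mel_center_frequencies(26, 0.0, SAMPLE_RATE / 2)

# Toy cepstral vectors (2 frames, 2 coefficients) to show normalization.
normalized = cepstral_mean_normalize([[1.0, 2.0], [3.0, 6.0]])

print(frame_len, frame_shift, n_frames)
print(normalized)
```

With these assumed values, a 32 ms frame holds 512 samples, which fits within a 1024-point transform after zero padding. Cepstral mean normalization removes the average of each coefficient across frames, so a constant offset in the cepstral domain does not affect the normalized features.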
Mel-Frequency Cepstrum Coefficient (MFCC) extraction is the usual method for feature extraction in most papers tackling the design of speech recognition systems for limited vocabulary (Doukas and Bardis 2017; Gerazov and Ivanovski 2013; F. Huang 2011). The public sphinxbase library provides an implementation of this method that can be used directly, as was done in X. Liu and Zhou (2014). Figure 2 provides a brief description of the MFCC method, encompassing six steps as described below.