Neural Fuzzy Based Intelligent Systems and Applications
Published in Lakhmi C. Jain, N.M. Martin, Fusion of Neural Networks, Fuzzy Systems, and Genetic Algorithms, 2020
Figure 13 shows an isolated word recognition system. The recurrent neural fuzzy technique is used for the recognition step. Speech signals are coded using LPC cepstrum coefficients, and vector quantization (VQ) is applied. A VQ codebook size of 256, 9-pole LPC, a 16 kHz sampling rate (with 16-bit speech amplitudes), and 300 samples per frame are used. The TIMIT [3] database (a speech database developed by Texas Instruments and the Massachusetts Institute of Technology, and sponsored by the United States Defense Advanced Research Projects Agency) is used to develop the codebook as well as to train the RNFS. The recurrent neural net is trained with about 200 speakers from different U.S. regions using 11 words from SA (dialect) sentences. Testing is also done using the TIMIT database (using speakers from both the test and train directories). The recognition accuracy is 90%, comparable to HMM-based recognition.
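The front end described here (LPC analysis, LPC-to-cepstrum conversion, and nearest-codeword VQ) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the chapter's implementation: the frame length (300 samples), LPC order (9), and codebook size (256) follow the text, but the codebook below is random instead of being trained on TIMIT, and the input signal is white noise used purely as a stand-in.

```python
import numpy as np

def lpc_coeffs(frame, order=9):
    """Levinson-Durbin recursion: predictor coefficients a[1..order]
    for the model s[n] ~ sum_k a[k] * s[n-k]."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = np.zeros(order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a[i] = k
        err *= (1.0 - k * k)
    return a[1:]                      # a_1 .. a_p

def lpc_to_cepstrum(a, n_ceps=12):
    """Convert LPC coefficients to LPC cepstrum via the standard recursion."""
    p = len(a)
    c = np.zeros(n_ceps)
    for m in range(1, n_ceps + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(1, m):
            if m - k <= p:
                acc += (k / m) * c[k - 1] * a[m - k - 1]
        c[m - 1] = acc
    return c

def vq_encode(cepstra, codebook):
    """Map each cepstral vector to the index of its nearest codeword."""
    d = np.linalg.norm(cepstra[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1)

# Toy usage: a 16 kHz signal split into 300-sample frames, as in the chapter.
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)
frames = signal[: len(signal) // 300 * 300].reshape(-1, 300)
cepstra = np.array([lpc_to_cepstrum(lpc_coeffs(f * np.hamming(300))) for f in frames])
codebook = rng.standard_normal((256, cepstra.shape[1]))   # stand-in for a trained VQ codebook
symbols = vq_encode(cepstra, codebook)                    # discrete observation sequence
```

The resulting codeword indices form the discrete observation sequence that the recognizer (here the RNFS, in the baseline an HMM) consumes.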
Voiceprint-Based Biometric Template Identifications
Published in Karm Veer Arya, Robin Singh Bhadoria, The Biometric Computing, 2019
Akella Amarendra Babu, Sridevi Tumula, Yellasiri Ramadevi
The NTIMIT database consists of the same utterances as TIMIT but transmitted through an actual telephone channel, which limits the bandwidth to 3.3 kHz and introduces additive noise. The TIMIT speech corpus consists of connected-word speech utterances and is widely used in speech recognition research. The distribution of speakers is shown in Table 3.5.
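Since NTIMIT was produced by replaying TIMIT over real telephone lines, the degradation cannot be reproduced exactly in software. The sketch below only approximates the two effects named above, a roughly 3.3 kHz bandwidth limit and additive noise, using a Butterworth low-pass filter and white noise at a chosen SNR; the filter order and SNR value are illustrative assumptions, not NTIMIT specifications.

```python
import numpy as np
from scipy.signal import butter, lfilter

def telephone_degrade(x, fs=16000, cutoff=3300.0, snr_db=30.0, rng=None):
    """Crude telephone-channel stand-in: low-pass to ~3.3 kHz and add
    white noise at the requested SNR."""
    rng = np.random.default_rng() if rng is None else rng
    b, a = butter(6, cutoff / (fs / 2), btype="low")      # 6th-order low-pass
    y = lfilter(b, a, x)
    noise = rng.standard_normal(len(y))
    noise *= np.sqrt(np.mean(y ** 2) / (10 ** (snr_db / 10))) / np.sqrt(np.mean(noise ** 2))
    return y + noise
```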
Quadratic Time–Frequency Features for Speech Recognition
Published in Antonia Papandreou-Suppappola, Applications in Time-Frequency Signal Processing, 2018
In the TIMIT corpus, the core test set consists of 24 speakers: 2 male and 1 female from each of the 8 dialect regions. The full test set contains a total of 168 speakers, and the training set contains a total of 462 speakers.
Fast Robust Location and Scatter Estimation: A Depth-based Method
Published in Technometrics, 2023
Maoyu Zhang, Yan Song, Wenlin Dai
In the second example, we detect outliers for a phoneme dataset, which comes from the speech recognition database TIMIT and has been discussed in Hastie, Tibshirani, and Friedman (2009). The data includes 1050 speech frames, 1000 of which are "ao" and 50 of which are "iy". Each data frame has been transformed to a log-periodogram of length 256. First, we reduce the dimensions by smoothing splines. For each sample, we replace the original variables x_i with the 50-dimensional variables Hᵀx_i, where H is the basis matrix of natural cubic splines. We use 50 basis functions with knots uniformly placed over the frequency range. In the end, we are dealing with data of n = 1050 and p = 50.
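A compact sketch of this dimension-reduction step, under stated assumptions: a plain truncated-power cubic spline basis with uniformly spaced knots stands in for the natural cubic splines used in the paper, and random data replaces the 1050 log-periodograms purely for illustration.

```python
import numpy as np

def cubic_spline_basis(n_points=256, n_basis=50):
    """Truncated-power cubic spline basis evaluated on a grid of n_points
    frequencies (a self-contained stand-in for a natural cubic spline basis
    with the same uniform-knot idea)."""
    f = np.linspace(0.0, 1.0, n_points)
    knots = np.linspace(0.0, 1.0, n_basis - 4 + 2)[1:-1]    # interior knots
    cols = [np.ones_like(f), f, f ** 2, f ** 3]
    cols += [np.clip(f - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)                            # (n_points, n_basis)

def reduce_frames(X, n_basis=50):
    """Project each length-256 log-periodogram onto the spline basis by
    least squares, yielding an n x n_basis coefficient matrix."""
    H = cubic_spline_basis(X.shape[1], n_basis)
    theta, *_ = np.linalg.lstsq(H, X.T, rcond=None)
    return theta.T                                          # (n, n_basis)

# Toy usage with random stand-ins for the 1050 log-periodograms.
X = np.random.default_rng(1).standard_normal((1050, 256))
Z = reduce_frames(X)        # n = 1050, p = 50, ready for outlier detection
```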
VEP Detection for Read, Extempore and Conversation Speech
Published in IETE Journal of Research, 2022
Kumud Tripathi, K. Sreenivasa Rao
In this work, we have proposed a novel method to accurately detect the VEPs in any mode of a speech signal. The proposed method operates in two stages for robust detection of VEPs. In the first stage, VOPs are detected using our recently proposed method [16]. In the second stage, phone boundaries are marked, and the closest succeeding phone boundary for each detected VOP is taken as the detected VEP. The proposed method is effective for segregating the vowel end points from the remaining speech regions. To validate this, the proposed method is compared with the signal processing techniques reported in [6,7] using the TIMIT speech corpus and the read, conversation, and extempore modes of the Bengali speech corpus. In addition, the importance of the proposed method is shown by detecting vowel regions for the considered modes of speech.
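The second-stage pairing rule, taking the closest succeeding phone boundary for each detected VOP, can be sketched as follows. The input arrays (detected VOP instants and marked phone boundaries) are assumed to come from the first stage and the boundary-marking step; their names and the toy values are illustrative, not part of the paper.

```python
import numpy as np

def veps_from_vops(vop_times, phone_boundaries):
    """For each detected VOP, return the closest *succeeding* phone boundary
    as the detected VEP. Inputs are time instants (e.g. in seconds)."""
    boundaries = np.sort(np.asarray(phone_boundaries, dtype=float))
    veps = []
    for t in np.sort(np.asarray(vop_times, dtype=float)):
        idx = np.searchsorted(boundaries, t, side="right")
        if idx < len(boundaries):        # a boundary exists after this VOP
            veps.append(boundaries[idx])
    return np.array(veps)

# Toy usage: VOPs at 0.32 s and 0.71 s, phone boundaries roughly every 0.1 s.
print(veps_from_vops([0.32, 0.71], [0.10, 0.25, 0.40, 0.55, 0.68, 0.80]))
# -> [0.4  0.8]
```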
Refinement of HMM Model Parameters for Punjabi Automatic Speech Recognition (PASR) System
Published in IETE Journal of Research, 2018
Virender Kadyan, Archana Mantri, R. K. Aggarwal
A set of the 5000 most frequent words, recorded by 1 male and 1 female speaker, is considered as a benchmark corpus for our research [29]. The repetition of these 5000 words is done by four sets of speakers, i.e. SET1 speakers from the Malwa dialect, SET2 speakers from the Majha dialect, SET3 speakers from the Doaba dialect, and SET4 speakers from the Powadh dialect (a total of 15 male and 9 female speakers of different dialects of Punjabi, in the age group of 17–30 years). For other languages, such as European languages, there exist standard speech corpora like the TIMIT acoustic-phonetic continuous speech corpus, the Air Travel Information System (ATIS) pilot corpus, the Linguistic Data Consortium (LDC) corpora, and the Nationwide Speech Project (NSP) [30], but the major problem for our Indian languages is the unavailability of a standard speech and text corpus. The corpus collection in our work is done in an in-house radio studio with the help of a microphone kept at a minimum distance of 10 cm. The speech corpus is collected as per the standards laid down in the manual of the Enabling Minority Language Engineering (EMILLE) text corpus [30]. Verification of the self-created corpus is done by TDIL. The speech signal collected from the different dialects of the Punjabi language is read speech, that is, a text transcription in the Punjabi language is provided to each speaker. Providing the text avoids misspelling errors that occur due to the pronunciation effects of the same word in each dialect of Punjabi.