Explore chapters and articles related to this topic
An Introduction to Biometrics
Published in Richard C. Dorf, Circuits, Signals, and Speech and Image Processing, 2018
Robert W. Ives, Delores M. Etter
Speech is produced via the vocal tract. It is the shape of the vocal tract that makes the voice unique and suitable for use in speaker identification. The vibration of the vocal cords, as well as the positions, shapes, and sizes of the various articulators (lips, tongue, etc.) change over time to produce the sound. The characteristics of the sound vary from person to person, and can be used to identify an individual. For example, Figure 24.5 shows the speech waveform from three different people speaking the word “Honolulu.” A person’s voice is not necessarily stable over a lifespan, varying with age and in the presence of disease. It can also vary over the short term, in the presence of stress, colds, and allergies. Voice recognition is occasionally confused with the technology of speech recognition. In the latter, an algorithm translates what a user is saying, whereas voice recognition technology verifies the identity of the individual who is speaking.
Speech Production and Perception
Published in Philipos C. Loizou, Speech Enhancement, 2013
We mentioned earlier that the vocal tract can be viewed as a filter that spectrally shapes the flow wave coming from the vocal folds to produce various sounds. The vocal folds provide the excitation to the vocal tract, and that excitation can be periodic or aperiodic, depending on the state of the vocal folds. Voiced sounds (e.g., vowels) are produced when the vocal folds are in the voicing state (and vibrate), whereas unvoiced sounds (e.g., consonants) are produced when the vocal folds are in the unvoicing state. These facts about the roles of the vocal tract and vocal folds led researchers to develop an engineering model of speech production (see Figure 3.8). In this model, the vocal tract is represented by a quasi-linear system that is excited by either a periodic or aperiodic source, depending on the state of the vocal folds. The output of this model is the speech signal, which is the only signal that we can measure accurately.
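The engineering model described above can be sketched in a few lines of Python. This is a minimal illustration, not the model from Figure 3.8: the sampling rate, the 100 Hz pitch, the formant frequencies and bandwidths, and all function names are assumptions chosen for the example. The vocal tract is approximated as a cascade of two-pole resonators, excited by either an impulse train (voiced) or white noise (unvoiced).

```python
import numpy as np

def resonator_coeffs(freq_hz, bandwidth_hz, fs):
    """Denominator [1, a1, a2] of a two-pole resonator (one formant)."""
    r = np.exp(-np.pi * bandwidth_hz / fs)      # pole radius, always < 1
    theta = 2.0 * np.pi * freq_hz / fs          # pole angle
    return np.array([1.0, -2.0 * r * np.cos(theta), r * r])

def vocal_tract_coeffs(formants, fs):
    """Cascade the resonators by multiplying their denominator polynomials."""
    a = np.array([1.0])
    for freq_hz, bandwidth_hz in formants:
        a = np.convolve(a, resonator_coeffs(freq_hz, bandwidth_hz, fs))
    return a

def all_pole_filter(x, a):
    """Direct recursion: y[n] = x[n] - a[1]*y[n-1] - a[2]*y[n-2] - ..."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y

def synthesize(voiced, fs=8000, duration_s=0.1, f0_hz=100.0,
               formants=((500, 80), (1500, 100), (2500, 120)), seed=0):
    """Excite the vocal tract filter with a periodic or aperiodic source."""
    n = int(fs * duration_s)
    if voiced:
        # Periodic source: one impulse per pitch period (assumed f0).
        excitation = np.zeros(n)
        excitation[::int(round(fs / f0_hz))] = 1.0
    else:
        # Aperiodic source: white noise.
        excitation = np.random.default_rng(seed).standard_normal(n)
    return all_pole_filter(excitation, vocal_tract_coeffs(formants, fs))

voiced_signal = synthesize(voiced=True)
unvoiced_signal = synthesize(voiced=False)
```

Only the excitation changes between the two calls; the filter is the same, which mirrors the separation of source and vocal tract in the model. A time-invariant filter is used here for simplicity, whereas the quasi-linear system in the text changes slowly as the articulators move.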
Audio compression
Published in David Austerberry, The Technology of Video and Audio Streaming, 2013
The vocal cords produce a buzzing noise. The fundamental frequency is about 100 Hz for adult men and 200 Hz for women. The tone is rich in harmonics (overtones). The vocal tract then acts as a filter on this tone, producing vowel sounds (consonants or ‘plosives’ are formed by controlled interruptions of exhaled air). The vocal tract has three cavities: the pharynx, or back of the throat; the nasal cavity; and the oral cavity (mouth). The resonant frequencies, or formants, are changed by muscle actions in the jaw, tongue, lips, and soft palate. This gives voicing to the basic tones from the vocal cords. The nasal cavity acts as a parallel filter.
On the variation of fricative airflow dynamics with vocal tract geometry and speech loudness
Published in Aerosol Science and Technology, 2022
Amir A. Mofakham, Brian T. Helenbrook, Byron D. Erath, Andrea R. Ferro, Tanvir Ahmed, Deborah M. Brown, Goodarz Ahmadi
Voiced speech is produced as lung pressure drives airflow through the vocal folds, which are located in the larynx, inciting self-sustained oscillations. The resultant unsteady modulation of the flow creates an oscillatory pressure field that acts as an acoustic source. Different sounds are produced by altering the vocal tract shape, thereby changing the resonances. Unvoiced speech sounds, classified as such because the vocal folds do not oscillate, are produced by purely aerodynamic sound sources within the vocal tract. For example, fricatives are consonant sounds that are produced by passing air through a partial restriction in the vocal tract made by placing two articulators close together (Stevens 1999), as occurs during the pronunciation of [f], a labiodental fricative whose constriction is generated as the lower lip is located against the upper teeth. Similarly, [θ] is a dental fricative generated by locating the tip of the tongue against the upper teeth (Isshiki and Ringel 1964). With respect to speech as a modality for transport of infectious disease, fricatives are of interest because, during pronunciation, the constrictions created at the mouth result in increased airflow velocities (Yoshinaga, Nozaki, and Wada 2019b; Pont et al. 2019).
Acoustic–Phonetic Analysis for Speech Recognition: A Review
Published in IETE Technical Review, 2018
Biswajit Dev Sarma, S. R. Mahadeva Prasanna
Speech sounds have acoustic resonant frequencies that depend on the shape of the vocal tract. These resonant frequencies are called formants. The vocal tract resonator can be approximated as a uniform tube when producing the vowel schwa. For an adult male with a vocal tract length of 17 cm, this vowel has its first (F1), second (F2), and third (F3) formants around 500, 1500, and 2500 Hz, respectively. Any constriction or opening in the vocal tract changes the formant frequencies for the same speaker. Different vowels are produced by forming different shapes in the vocal tract, and hence the formant structure differs from one vowel to another. Formant transitions are present in the transition region between a consonant and a vowel, and also between vowels in diphthongs and triphthongs. Consonants are produced at different places of articulation with different degrees of constriction. The formant contour from the consonant to the vowel differs depending on the nature of the consonant [8].
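The schwa figures quoted above are consistent with the standard quarter-wavelength resonances of a uniform tube that is closed at one end (the glottis) and open at the other (the lips), F_n = (2n - 1) c / (4 L). The sketch below evaluates that formula; the speed of sound of 340 m/s is an assumed value not given in the excerpt, and the function name is hypothetical.

```python
def uniform_tube_formants(length_m=0.17, speed_of_sound_m_s=340.0, count=3):
    """Resonances of a uniform tube closed at one end and open at the other.

    F_n = (2n - 1) * c / (4 * L) for n = 1, 2, ..., count.
    The default speed of sound is an assumption for this example.
    """
    return [(2 * n - 1) * speed_of_sound_m_s / (4.0 * length_m)
            for n in range(1, count + 1)]

# For a 17 cm tract this gives roughly 500, 1500, and 2500 Hz.
formants_hz = uniform_tube_formants()
```

Because the formula scales inversely with tube length, a shorter vocal tract shifts all formants upward, which is one reason formant values differ between speakers.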
Impulsive Behavior Detection System Using Machine Learning and IoT
Published in IETE Journal of Research, 2021
Soumya Jyoti Raychaudhuri, Soumya Manjunath, Chithra Priya Srinivasan, N. Swathi, S. Sushma, Nitin Bhushan K. N., C. Narendra Babu
Researchers have designed various algorithms that use variations in facial muscles, gestures, and speech and voice signals to recognize the speaker, emotions, and content. In Reference [14], stress levels were extracted from videos and images using facial expressions, gestures, voluntary and involuntary changes in the eyes and mouth, and head movement. Sound is produced by pockets of air released through the vibration of the vocal cords and is transformed into speech by the vocal tract. Changing stress levels alter respiration patterns, which in turn cause changes in pitch level, pronunciation, and expression patterns in speech [15].