Explore chapters and articles related to this topic
Speech Recognition Fundamentals and Features
Published in Vishal Jain, Akash Tayal, Jaspreet Singh, Arun Solanki, Cognitive Computing Systems, 2021
Gurpreet Kaur, Mohit Srivastava, Amod Kumar
Knowing how an individual produces speech is essential for building a speech model, because it allows speech features to be extracted more accurately. A physiological model of human speech is shown in Figure 13.6. The glottis, the opening between the vocal cords, is the entrance to the vocal tract. The vocal tract is a cavity with two openings: the nostrils and the lips. The velum is an articulator that moves up and down and determines whether sound is radiated through the nose or through the lips. The vocal cords control the flow of air by opening and closing, so the excitation delivered to the vocal tract is effectively pulse-like. The vocal tract acts as a filter that spectrally shapes whatever pulses are fed to it, and this spectral shaping changes over time with the words being uttered. The speech signal is quasi-stationary, that is, its statistical parameters remain approximately constant within a short time interval of the order of 10 to 20 ms. Therefore, short-time analysis of speech signals is required. Different sounds are produced with the help of articulators such as the velum. When the velum moves downward, the air passage from the vocal tract to the lips is blocked, and hence nasal sounds are emitted. When the velum moves upward, the nasal passage is closed, and oral sounds are produced through the lips. The tongue is another organ responsible for producing different sounds [10].
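The short-time analysis described above can be sketched as follows. This is a minimal illustration, not the chapter's own method: the 20 ms frame length comes from the interval quoted in the text, while the 10 ms hop, the Hamming window, the 8 kHz sample rate, the synthetic test tone, and the function names `frame_signal` and `short_time_energy` are all assumptions made for the example.

```python
import numpy as np


def frame_signal(x, sample_rate, frame_ms=20.0, hop_ms=10.0):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop_len
    # Row i holds samples [i * hop_len, i * hop_len + frame_len).
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return x[idx]


def short_time_energy(frames):
    """Energy of each frame after applying a Hamming window (assumed choice)."""
    window = np.hamming(frames.shape[1])
    return np.sum((frames * window) ** 2, axis=1)


# Synthetic stand-in for speech (assumption): a 100 Hz tone that is quiet
# for the first half second and ten times louder for the second half.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
amplitude = np.where(t < 0.5, 0.1, 1.0)
signal = amplitude * np.sin(2 * np.pi * 100 * t)

frames = frame_signal(signal, sample_rate)
energy = short_time_energy(frames)
```

Within each 20 ms frame the signal statistics are treated as fixed, so a per-frame quantity such as `energy` tracks how the signal changes from frame to frame.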
Introduction to Anatomy and Physiology
Published in Reginald L. Campbell, Roland E. Langford, Terry L. McArthur, Fundamentals of Hazardous Materials Incidents, 2020
Reginald L. Campbell, Roland E. Langford, Terry L. McArthur
Air entering the nose is warmed and filtered by a series of bony shelves that are covered with mucous membranes. Larger foreign particles are filtered out by the nasal hairs and mucous plates. The air then passes through the pharynx and past the epiglottis, a flap that prevents food from the mouth from entering the respiratory system. The air then passes through the larynx, which contains the vocal cords and the glottis (the opening between them), into the largest of the respiratory passages, the trachea. The trachea is a large tube, about 2 cm (1 inch) in diameter, ringed with bands of cartilage that prevent it from collapsing. All these passages are lined with cilia, which are small hairlike projections that beat like small brooms to sweep foreign materials up and out of the respiratory tract.
Source Coding for Audio and Speech Signals
Published in Rajeshree Raut, Ranjit Sawant, Shriraghavan Madbushi, Cognitive Radio, 2020
Rajeshree Raut, Ranjit Sawant, Shriraghavan Madbushi
Linear predictive coding (LPC) is one of the most powerful and useful speech analysis techniques for encoding good-quality speech at a low bit rate. It provides extremely accurate estimates of speech parameters and is relatively efficient to compute. LPC starts with the assumption that the speech signal is produced by a buzzer at the end of a tube. The glottis (the space between the vocal cords) produces the buzz, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract (throat and mouth) forms the tube, which is characterized by its resonances, called formants. LPC analyzes the speech signal by estimating the formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the remaining signal is called the residue. Because speech signals vary with time, this process is carried out on short chunks of the speech signal, called frames. The basic problem of an LPC system is to determine the formants from the speech signal. The solution is a difference equation that expresses each sample of the signal as a linear combination of previous samples. Such an equation is called a linear predictor, hence the name linear predictive coding.
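The analysis steps above (fit a linear predictor to a frame, then inverse filter to obtain the residue) can be sketched as follows. The excerpt does not say how the predictor coefficients are found; this example assumes the autocorrelation method with the Levinson-Durbin recursion, one common choice. The function name `lpc_autocorrelation`, the predictor order, and the synthetic second-order test signal are hypothetical and chosen only for illustration.

```python
import numpy as np


def lpc_autocorrelation(frame, order):
    """Estimate LPC coefficients with the Levinson-Durbin recursion.

    Returns (a, err) with a[0] == 1. The predictor is
    s[n] ~ -(a[1]*s[n-1] + ... + a[order]*s[n-order]),
    and err is the final prediction-error energy.
    """
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient for this step
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


# Synthetic "tube" (assumption): white-noise excitation shaped by one
# resonance, s[n] = 1.3*s[n-1] - 0.8*s[n-2] + e[n].
rng = np.random.default_rng(0)
excitation = rng.standard_normal(4000)
signal = np.zeros_like(excitation)
for n in range(len(signal)):
    signal[n] = excitation[n]
    if n >= 1:
        signal[n] += 1.3 * signal[n - 1]
    if n >= 2:
        signal[n] -= 0.8 * signal[n - 2]

a, err = lpc_autocorrelation(signal, order=2)

# Inverse filtering: applying A(z) = 1 + a[1] z^-1 + a[2] z^-2 removes the
# resonance and leaves the residue, which approximates the excitation.
residue = np.convolve(signal, a)[: len(signal)]
```

With this sign convention the estimated `a` should come out close to `[1, -1.3, 0.8]`, and the residue carries much less energy than the signal. Speech frames are usually windowed before the autocorrelation is computed; that step is omitted here to keep the sketch short.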
Investigation of airflow at different activity conditions in a realistic model of human upper respiratory tract
Published in Computer Methods in Biomechanics and Biomedical Engineering, 2021
Reza Tabe, Roohollah Rafee, Mohammad Sadegh Valipour, Goodarz Ahmadi
The pressure difference causes the movement of air into and out of the respiratory system. During inhalation, the glottis at the larynx opens, permitting air to flow from the upper airways toward the lungs. Figure 8 compares the pressure drops of the present model with five other tracheal geometries in the study of Bates et al. (2016). Case A is normal; B and C include both constriction and curvature, while D and E show increased curvature from the normal model, but without stenosis. Here, the pressure drop is defined as the pressure difference between the glottis and a point above the carina and is presented in a logarithmic plot. As can be seen from Figure 8, the pressure drop of the present model lies between those of the curved and constricted cases (B and C) and the curved case (E) of Bates et al. (2016). It should be noted that the simplified models (A and D) do not have the geometric complexity of the airflow paths. Therefore, they do not provide a good estimation of flow parameters. The oral-trachea model has a comparatively high pressure drop at low flow rates, but as the airflow increases, the resistance increases more gradually. Bates et al. (2016) reported that the value of the exponent b for geometries B, C, and E is similar (1.85), which corresponds with the value obtained for b in Equation (15).
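Equation (15) is not reproduced in this excerpt. Assuming it has the power-law form ΔP = a·Q^b suggested by the logarithmic plot and the exponent b, the exponent can be estimated with a straight-line fit in log-log space, as in the sketch below. The flow rates and the coefficient 0.02 are made-up illustrative values, not data from the paper; only the exponent 1.85 is taken from the text, and `fit_power_law` is a hypothetical helper name.

```python
import numpy as np


def fit_power_law(flow_rate, pressure_drop):
    """Fit pressure_drop = a * flow_rate**b by least squares on log-log data."""
    slope, intercept = np.polyfit(np.log(flow_rate), np.log(pressure_drop), 1)
    return np.exp(intercept), slope  # (a, b)


# Synthetic data (assumption): generated from the assumed power law itself,
# so the fit is expected to recover the exponent quoted in the text.
flow = np.array([15.0, 30.0, 60.0, 90.0])  # illustrative flow rates
pressure = 0.02 * flow ** 1.85             # illustrative pressure drops

a_fit, b_fit = fit_power_law(flow, pressure)
```

On a log-log plot a power law appears as a straight line whose slope is b, which is why geometries with similar exponents produce nearly parallel curves.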
Vocal cord abnormal voice flow field study by modeling a bionic vocal system
Published in Advanced Robotics, 2020
Xiaojun Zhang, Yan Wang, Wei Zhao, Wei Wei, Zhi Tao, Heming Zhao
The sound source at the glottis is the result of the interaction between the airflow and the vibration of the vocal cords. Titze [7] first proposed that the threshold pressure of the vocal cords is an important aerodynamic parameter, and then analyzed the small-amplitude vibration of the vocal cords. The expression for the threshold pressure was later modified [8]. The main energy transfer mechanism for maintaining self-excited vibration induced by airflow is the action of the mucosal wave and the inertial impedance of the vocal tract. Lucero [9] considered the effect of increased viscosity when the glottal diameter is small, and extended the analysis of small-amplitude vibration [10]. A pressure equation for the vocal cords was proposed as the basis for their biomechanical parameters, and the frequency of vocal cord vibration was taken as a feature for predicting changes in the pressure threshold at which vocal cord vibration occurs.
Simplified cerebellum-like spiking neural network as short-range timing function for the talking robot
Published in Connection Science, 2018
The talking robot consists of an air pump, an artificial vocal cord, a resonance tube, an artificial nasal cavity, and a microphone connected to a sound analyser, which represent the lung, vocal cord, vocal tract, nasal cavity and auditory feedback of a human, respectively. An air compressor provides the airflow for the talking robot. The airflow is directed to the vocal cords via a pressure control valve and two control valves. These work as controllers for the volume of voiced sound and unvoiced sound. The resonance tube works as a vocal tract and is attached to the artificial vocal cord for the manipulation of resonance characteristics. The nasal cavity is connected to the resonance tube with a rotary valve between them. The microphone and amplifier act as an auditory system for feedback during speech. The relation between the voice characteristics and the motor control commands is stored in the system controller, which is referred to when generating speech articulatory motion. The overview of the system is shown in Figure 1. The characteristics of a glottal wave, which determine the pitch and the volume of the human voice, are governed by the complex behaviour of the vocal cord, the oscillatory mechanism of human organs consisting of a mucous membrane and muscles excited by airflow from the lung. In the artificial vocal cord, sound is created by the vibration of a 5 mm wide rubber band attached to a plastic body.