Explore chapters and articles related to this topic
Edge Analytics
Published in Chandrasekar Vuppalapati, Building Enterprise IoT Applications, 2019
The voice samples were obtained from a voice clinic in a tertiary teaching hospital (Far Eastern Memorial Hospital, FEMH). They included 50 normal voice samples and 150 samples of common voice disorders: vocal nodules, polyps, and cysts (collectively referred to as phonotrauma); glottis neoplasm; and unilateral vocal paralysis. Voice samples of a 3-second sustained vowel sound /a:/ were recorded at a comfortable level of loudness, with a microphone-to-mouth distance of approximately 15–20 cm, using a high-quality microphone (Model: SM58, SHURE, IL) with a digital amplifier (Model: X2u, SHURE) under a background noise level between 40 and 45 dBA. The sampling rate was 44,100 Hz with 16-bit resolution, and data were saved in uncompressed .wav format (see Figure 43).
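As a rough illustration of what these recording parameters imply, the sketch below uses Python's standard `wave` module to write and re-read a 3-second, 44,100 Hz, 16-bit file and to compute its raw PCM size. The mono channel count, the 220 Hz placeholder tone, and the helper names are assumptions made for the example; they are not taken from the chapter, and the tone merely stands in for a real voice recording.

```python
import io
import math
import struct
import wave

SAMPLE_RATE_HZ = 44_100   # stated in the text
SAMPLE_WIDTH_BYTES = 2    # 16-bit resolution, stated in the text
DURATION_S = 3            # 3-second sustained vowel, stated in the text
N_CHANNELS = 1            # assumption: mono (not stated in the text)


def expected_pcm_bytes(rate=SAMPLE_RATE_HZ, width=SAMPLE_WIDTH_BYTES,
                       duration=DURATION_S, channels=N_CHANNELS):
    """Size of the uncompressed audio payload, excluding the WAV header."""
    return rate * width * duration * channels


def make_placeholder_wav(freq_hz=220.0):
    """Write a sine tone (a stand-in for a voice sample) to an in-memory WAV."""
    n_frames = SAMPLE_RATE_HZ * DURATION_S
    frames = b"".join(
        struct.pack(
            "<h",  # little-endian signed 16-bit sample
            int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ)),
        )
        for i in range(n_frames)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(N_CHANNELS)
        w.setsampwidth(SAMPLE_WIDTH_BYTES)
        w.setframerate(SAMPLE_RATE_HZ)
        w.writeframes(frames)
    buf.seek(0)
    return buf


def describe_wav(fileobj):
    """Read back the header fields that correspond to the stated parameters."""
    with wave.open(fileobj, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "bits_per_sample": 8 * w.getsampwidth(),
            "frame_rate_hz": w.getframerate(),
            "n_frames": w.getnframes(),
            "duration_s": w.getnframes() / w.getframerate(),
        }


if __name__ == "__main__":
    print(expected_pcm_bytes(), "bytes of PCM data")
    print(describe_wav(make_placeholder_wav()))
```

Under the mono assumption, each sample contributes 132,300 frames of 2 bytes each; a stereo recording would double the payload.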
Principal characteristics of speech
Published in Sadaoki Furui, Digital Speech Processing, Synthesis, and Recognition, 2018
The speech production process involves three subprocesses: source generation, articulation, and radiation. The human vocal organ complex consists of the lungs, trachea, larynx, pharynx, and nasal and oral cavities. Together these form a connected tube as indicated in Fig. 2.2. The upper portion beginning with the larynx is called the vocal tract, which is changeable into various shapes by moving the jaw, tongue, lips, and other internal parts. The nasal cavity is separated from the pharynx and oral cavity by raising the velum or soft palate.
Automatic Speech Recognition
Published in K. S. Fu, Pattern Recognition, 2019
A speech sound is the result of a signal generated by the vibration of the vocal cords or by some noise source and modified by its passage through the vocal tract and, for some sounds, the nostrils. The main components of the vocal tract are the lips, the tongue, the lower jaw, and the velum; the velum is the valve which closes off the nasal tract. All these components are movable, so the vocal tract can assume a variety of configurations, some of which correspond to speech sounds.
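The description above, an excitation signal shaped by the vocal tract, is commonly modelled as a source-filter system. The following is a minimal sketch of that idea, not a method taken from the chapter: a periodic impulse train stands in for vocal cord vibration, and a cascade of two-pole resonators stands in for vocal tract resonances. The function names, the 120 Hz pitch, and the formant frequencies and bandwidths are illustrative assumptions only.

```python
import math


def impulse_train(f0_hz, duration_s, sample_rate_hz):
    """Idealised voiced source: one unit impulse per pitch period."""
    period = round(sample_rate_hz / f0_hz)
    n_samples = round(duration_s * sample_rate_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def resonator(signal, centre_hz, bandwidth_hz, sample_rate_hz):
    """Two-pole digital resonator: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate_hz)  # pole radius, < 1
    theta = 2 * math.pi * centre_hz / sample_rate_hz        # pole angle
    a1 = 2 * r * math.cos(theta)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize(f0_hz=120.0,
               formants=((700.0, 130.0), (1200.0, 70.0), (2600.0, 160.0)),
               duration_s=0.5,
               sample_rate_hz=16_000):
    """Pass the source through one resonator per (centre, bandwidth) pair.

    The default formant values are illustrative assumptions for an
    open-vowel-like sound, not measurements from the text.
    """
    signal = impulse_train(f0_hz, duration_s, sample_rate_hz)
    for centre_hz, bandwidth_hz in formants:
        signal = resonator(signal, centre_hz, bandwidth_hz, sample_rate_hz)
    return signal


if __name__ == "__main__":
    samples = synthesize()
    print(len(samples), "samples, peak amplitude", max(abs(s) for s in samples))
```

Changing the formant list while keeping the same source mimics how moving the lips, tongue, jaw, and velum reshapes the tract without changing how the sound is excited. Replacing the impulse train with random noise would correspond to the "noise source" case mentioned above.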
On the variation of fricative airflow dynamics with vocal tract geometry and speech loudness
Published in Aerosol Science and Technology, 2022
Amir A. Mofakham, Brian T. Helenbrook, Byron D. Erath, Andrea R. Ferro, Tanvir Ahmed, Deborah M. Brown, Goodarz Ahmadi
Voiced speech is produced as lung pressure drives airflow through the vocal folds, which are located in the larynx, inciting self-sustained oscillations. The resultant unsteady modulation of the flow creates an oscillatory pressure field that acts as an acoustic source. Different sounds are produced by altering the vocal tract shape, thereby changing the resonances. Unvoiced speech sounds, classified as such because the vocal folds do not oscillate, are produced by purely aerodynamic sound sources within the vocal tract. For example, fricatives are consonant sounds that are produced by passing air through a partial restriction in the vocal tract made by placing two articulators close together (Stevens 1999), as occurs during the pronunciation of [f], a labiodental fricative whose constriction is generated as the lower lip is placed against the upper teeth. Similarly, [θ] is a dental fricative generated by placing the tip of the tongue against the upper teeth (Isshiki and Ringel 1964). With respect to speech as a modality for transport of infectious disease, fricatives are of interest because, during pronunciation, the constrictions created at the mouth result in increased airflow velocities (Yoshinaga, Nozaki, and Wada 2019b; Pont et al. 2019).
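The link between a constriction and higher airflow velocity can be illustrated with simple mass conservation: for incompressible, quasi-one-dimensional flow, the mean velocity equals the volumetric flow rate divided by the cross-sectional area. The sketch below applies that relation; it is a back-of-the-envelope illustration under those stated assumptions, not the analysis used in the article, and the flow rate and areas are hypothetical values chosen only to show the scaling.

```python
def mean_velocity_m_per_s(flow_rate_l_per_s, area_cm2):
    """Mean velocity v = Q / A for incompressible, quasi-1D flow.

    flow_rate_l_per_s: volumetric flow rate in litres per second
    area_cm2: cross-sectional area of the passage in square centimetres
    """
    if area_cm2 <= 0:
        raise ValueError("area must be positive")
    flow_m3_per_s = flow_rate_l_per_s * 1e-3  # litres -> cubic metres
    area_m2 = area_cm2 * 1e-4                 # cm^2 -> m^2
    return flow_m3_per_s / area_m2


if __name__ == "__main__":
    # Hypothetical values, chosen only to illustrate the scaling.
    flow = 0.5               # L/s
    open_area = 2.0          # cm^2, unconstricted passage
    constricted_area = 0.2   # cm^2, fricative-like constriction
    v_open = mean_velocity_m_per_s(flow, open_area)
    v_constricted = mean_velocity_m_per_s(flow, constricted_area)
    print(f"open: {v_open:.2f} m/s, constricted: {v_constricted:.2f} m/s")
```

At a fixed flow rate, shrinking the area by a factor of ten raises the mean velocity by the same factor, which is the basic reason constrictions at the mouth are relevant to how far exhaled air travels.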
Impulsive Behavior Detection System Using Machine Learning and IoT
Published in IETE Journal of Research, 2021
Soumya Jyoti Raychaudhuri, Soumya Manjunath, Chithra Priya Srinivasan, N. Swathi, S. Sushma, Nitin Bhushan K. N., C. Narendra Babu
Researchers have designed various algorithms that use variations in facial muscles, gestures, and speech and voice signals to recognize the speaker, emotions, and content. In Reference [14], stress levels were extracted from videos and images using facial expressions, gestures, voluntary and involuntary changes in the eyes and mouth, and head movement. Sound is produced by pockets of air set in motion by the resonance of the vocal cords and is transformed into speech by the vocal tract. Changing levels of stress alter the patterns of respiration, which in turn cause changes in pitch level, pronunciation, and expression patterns in speech [15].