Voice and Speech Production
Published in John C Watkinson, Raymond W Clarke, Terry M Jones, Vinidh Paleri, Nicholas White, Tim Woolford, Head & Neck Surgery Plastic Surgery, 2018
Paul Carding, Lesley Mathieson
Changing the shape of the vocal tract changes its resonating behaviour, with different shapes responding to different components of the harmonic structure of the glottal sound source. The resonance peaks of the vocal tract are called formants. These formant structures vary for each vowel and are easily identifiable on a sound spectrograph.5 Vowels are distinguished by varying (i) the height to which the tongue is raised in the mouth, (ii) the part of the tongue that is raised (front, centre, back) and (iii) the position of the lips (spread or rounded). For example, the vowel /i:/ as in ‘see’ is made with the front of the tongue raised and the lips spread. In contrast, the vowel /u:/ as in ‘sue’ is made with the back of the tongue raised and the lips rounded. Diphthongs (e.g. ‘beer’, ‘air’) start with one oral tract articulatory shape and glide to another.
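The simplest vocal-tract shape to reason about is a uniform tube closed at the glottis and open at the lips, which resonates at odd quarter-wavelength frequencies, F_n = (2n − 1)c / (4L). The sketch below assumes a tract length of L = 17.5 cm and a speed of sound of c = 350 m/s, rough textbook values that are not taken from this chapter. Vowels such as /i:/ and /u:/ depart from this neutral pattern precisely because tongue and lip positions make the tube non-uniform.

```python
def uniform_tube_formants(length_m: float = 0.175,
                          speed_of_sound: float = 350.0,
                          n_formants: int = 3) -> list[float]:
    """Resonances (Hz) of a uniform tube closed at one end (glottis) and open
    at the other (lips): F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * speed_of_sound / (4 * length_m)
            for n in range(1, n_formants + 1)]


# Assumed values: 17.5 cm tract, c = 350 m/s.
neutral = uniform_tube_formants()
# Shortening the tube raises every resonance in proportion to 1/L.
shorter = uniform_tube_formants(length_m=0.145)
```

With these assumptions, `neutral` comes out at approximately 500, 1500 and 2500 Hz, the formant pattern usually quoted for a central, schwa-like vowel.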
Speech and its perception
Published in Stanley A. Gelfand, Hearing, 2017
Our discussion of speech sounds will be facilitated by reference to the simplified schematic diagram of the vocal tract in Figure 14.1. The power source is the air in the lungs, which is directed up and out under the control of the respiratory musculature. Voiced sounds are produced when the vocal folds (vocal cords) are vibrated. The result of this vibration is a periodic complex waveform made up of a fundamental frequency on the order of 100 Hz in males and 200 Hz in females, with as many as 40 harmonics of the fundamental represented in the waveform (Flanagan, 1958) (Figure 14.2a). Voiceless (unvoiced) sounds are produced by opening the airway between the vocal folds so that they do not vibrate. Voiceless sounds are aperiodic and noise-like, being produced by turbulence resulting from partial or complete obstruction of the vocal tract. Regardless of the source, the sound is then modified by the resonance characteristics of the vocal tract. In other words, the vocal tract constitutes a group of filters that are added together, and whose effect is to shape the spectrum of the waveform from the larynx. The resonance characteristics of the vocal tract (Figure 14.2b) are thus reflected in the speech spectrum (Figure 14.2c). The vocal tract resonances are called formants, and are generally labeled starting from the lowest as the first formant (F1), second formant (F2), third formant (F3), and so on. This is the essence of the source-filter theory, or the acoustic theory of speech production (Fant, 1970).
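The source-filter idea can be made concrete by computing the amplitude of each harmonic of a voiced source after it passes through a set of formant resonances. In this sketch, f0 = 100 Hz and 40 harmonics follow the figures above; everything else is an illustrative assumption: a source whose harmonic amplitudes fall as 1/k² (about −12 dB per octave), and three formants at 500, 1500 and 2500 Hz with bandwidths of 60, 90 and 150 Hz, each modelled as a second-order resonance. The resonances are arranged in cascade, so their gains multiply (equivalently, their responses in dB add).

```python
import math

# Assumed (formant frequency, bandwidth) pairs in Hz, roughly a neutral vowel.
ASSUMED_FORMANTS = ((500.0, 60.0), (1500.0, 90.0), (2500.0, 150.0))


def resonance_gain(f: float, formant_hz: float, bandwidth_hz: float) -> float:
    """Magnitude at f (Hz) of a second-order resonance with poles at
    -B/2 +/- jF (in Hz units), normalised to unit gain at 0 Hz."""
    half_bw = bandwidth_hz / 2.0
    numerator = half_bw ** 2 + formant_hz ** 2
    denominator = (math.hypot(half_bw, f - formant_hz)
                   * math.hypot(half_bw, f + formant_hz))
    return numerator / denominator


def vocal_tract_gain(f: float, formants=ASSUMED_FORMANTS) -> float:
    """Cascade of resonances: individual gains multiply."""
    gain = 1.0
    for formant_hz, bandwidth_hz in formants:
        gain *= resonance_gain(f, formant_hz, bandwidth_hz)
    return gain


def voiced_spectrum(f0: float = 100.0, n_harmonics: int = 40,
                    formants=ASSUMED_FORMANTS) -> list[tuple[float, float]]:
    """(frequency, amplitude) of each harmonic after filtering.
    Source amplitude of harmonic k is assumed to be 1/k**2."""
    spectrum = []
    for k in range(1, n_harmonics + 1):
        f = k * f0
        source_amplitude = 1.0 / k ** 2
        spectrum.append((f, source_amplitude * vocal_tract_gain(f, formants)))
    return spectrum


def spectral_peaks(spectrum: list[tuple[float, float]]) -> list[float]:
    """Frequencies of harmonics that are louder than both neighbours."""
    return [spectrum[i][0] for i in range(1, len(spectrum) - 1)
            if spectrum[i][1] > spectrum[i - 1][1]
            and spectrum[i][1] > spectrum[i + 1][1]]
```

`spectral_peaks(voiced_spectrum())` finds local maxima at the harmonics that coincide with the assumed formants (500, 1500 and 2500 Hz), even though the single strongest harmonic is still the fundamental: the filter reshapes the envelope of the spectrum without adding new frequency components.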
The role of the speech and language therapist
Published in James Barrett, Transsexual and Other Disorders of Gender Identity, 2017
The resonant frequencies of vowels are known as formants and can be identified spectrographically as F1, F2, etc. In studies by Gunzburger, F3 increased in frequency in male-to-female transsexuals using their female voice.19,20 A previous study indicates that F3 appears to be an important factor in listener judgements of gender.21 Gelfer and Schofield found that subjects perceived as female had consistently ‘. . . higher vowel formant frequencies for isolated productions of /i/ and /a/’.17
Quantifying articulatory impairments in neurodegenerative motor diseases: A scoping review and meta-analysis of interpretable acoustic features
Published in International Journal of Speech-Language Pathology, 2023
Hannah P. Rowe, Sanjana Shellikeri, Yana Yunusova, Karen V. Chenausky, Jordan R. Green
Among the five components, precision exhibited the greatest range in effect size values across all acoustic features, in part because there were significantly more features representing precision compared to any other component. As previously mentioned, features of precision assess the distinctiveness of vowels, consonants, and consonant-vowel transitions. For vowels, reductions in vowel space (e.g. vowel articulation index or vowel space area) reflect the inability to achieve the full range of movement required for distinct vowel production. For consonants, reductions in duration (e.g. unvoiced fricative duration or unvoiced stop duration) or intensity (e.g. first spectral moment coefficient or spectral change range) of spectral energy reflect the inability to maintain sufficient articulatory closure and/or, in the case of spectral moments, inaccurate place and degree of constriction. For consonant-vowel transitions, reductions in formant “movement” (e.g. second formant extent or second formant interquartile range) reflect the inability to achieve appropriate vocal tract movement between sounds. The concept of “articulatory imprecision” is, therefore, employed broadly across different features and articulatory gestures. The overall findings suggest that features reflecting changes in vowel space and in duration of spectral energy are promising for detecting articulatory imprecision across disease types. Further research is, nevertheless, needed to validate acoustic features of precision using kinematic approaches and to determine whether our findings hold with a larger and more diverse sample.
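Two of the vowel-space features mentioned above can be computed directly from corner-vowel formants. This sketch assumes the common triangular vowel space area (VSA, the shoelace formula applied to /i/, /a/ and /u/ in the F1–F2 plane) and the vowel articulation index as usually defined, VAI = (F2/i/ + F1/a/) / (F1/i/ + F1/u/ + F2/u/ + F2/a/); the studies pooled in the review may use other variants, such as a quadrilateral VSA. The formant values and the `centralize` helper are hypothetical, chosen only to show that both measures shrink when vowels move toward the centre of the vowel space.

```python
from typing import NamedTuple


class Vowel(NamedTuple):
    f1: float  # first formant, Hz
    f2: float  # second formant, Hz


def triangular_vsa(i: Vowel, a: Vowel, u: Vowel) -> float:
    """Area (Hz^2) of the /i/-/a/-/u/ triangle in the F1-F2 plane (shoelace formula)."""
    return 0.5 * abs(i.f1 * (a.f2 - u.f2)
                     + a.f1 * (u.f2 - i.f2)
                     + u.f1 * (i.f2 - a.f2))


def vowel_articulation_index(i: Vowel, a: Vowel, u: Vowel) -> float:
    """VAI: formants that fall with centralisation over those that rise."""
    return (i.f2 + a.f1) / (i.f1 + u.f1 + u.f2 + a.f2)


def centralize(v: Vowel, centre: Vowel, amount: float) -> Vowel:
    """Hypothetical model of reduced range: move v a fraction `amount` toward centre."""
    return Vowel(v.f1 + amount * (centre.f1 - v.f1),
                 v.f2 + amount * (centre.f2 - v.f2))


# Hypothetical corner-vowel formants (Hz), for illustration only.
vowel_i, vowel_a, vowel_u = Vowel(300.0, 2300.0), Vowel(750.0, 1300.0), Vowel(350.0, 900.0)
centre = Vowel((vowel_i.f1 + vowel_a.f1 + vowel_u.f1) / 3,
               (vowel_i.f2 + vowel_a.f2 + vowel_u.f2) / 3)
reduced = [centralize(v, centre, 0.5) for v in (vowel_i, vowel_a, vowel_u)]
```

For these values the VSA is 290 000 Hz² and the VAI about 1.07; moving each vowel halfway toward the centroid cuts the area to a quarter (72 500 Hz²) and lowers the VAI.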
Voice source, formant frequencies and vocal tract shape in overtone singing. A case study
Published in Logopedics Phoniatrics Vocology, 2023
Johan Sundberg, Björn Lindblom, Anna-Maria Hefele
Inverse filtering is a classical method for analyzing the glottal sound source, and it has been extensively used for more than half a century, see e.g. [11–13]. The method, which is reliable for the typical fo of adult male and female speech, is schematically illustrated in Figure 3. The basic idea is to remove the effect of the formants, so as to obtain a “residue” waveform representing the pulsating glottal airflow input signal of the vocal tract. By filtering the radiated vowel signal with filters representing the inverted (upside-down) version of the formant curves in Figure 3, the amplifying effect of the formant peaks is removed. Accurate tuning of each inverse filter ideally produces an output signal meeting two criteria: a waveform characterized by a ripple-free closed phase and a spectrum with an envelope void of troughs and peaks near the formant frequencies. In the waveform, referred to as the flow glottogram, the closed phase of the vocal fold vibration appears as a flat or sometimes somewhat tilted segment, surrounding quasi-triangular pulses that correspond to the open phase. A tilting closed phase can result from a piston motion of the glottal plane [14]. The steepness of the trailing end of the pulses, i.e. rate of flow decrease during the closing phase, varies systematically with vocal loudness. The peak-to-peak amplitude of the pulses varies with phonation type.
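The core of inverse filtering can be illustrated on synthetic data where the formants are known in advance. Below, each formant is modelled as a two-pole digital resonator, and its inverse is the matching two-zero FIR filter whose zeros sit on the resonator's poles, i.e. the "upside-down" formant curve. All parameters are hypothetical: 10 kHz sampling, f0 = 100 Hz, half-sine glottal pulses open for 60% of each period, and formants at 500, 1500 and 2500 Hz with bandwidths of 60, 90 and 150 Hz. The sketch omits the lip-radiation characteristic that practical inverse filtering must also compensate, and with real recordings the filter frequencies and bandwidths must be tuned until the two criteria above are met rather than being known.

```python
import math

FS = 10_000  # sampling rate in Hz (assumed)
FORMANTS = ((500.0, 60.0), (1500.0, 90.0), (2500.0, 150.0))  # (F, B) in Hz, assumed


def pole_coefficients(freq_hz, bw_hz, fs=FS):
    """Feedback coefficients of a resonator with pole radius exp(-pi*B/fs)."""
    r = math.exp(-math.pi * bw_hz / fs)
    theta = 2 * math.pi * freq_hz / fs
    return 2 * r * math.cos(theta), -r * r


def resonator(x, freq_hz, bw_hz, fs=FS):
    """All-pole formant section: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    a1, a2 = pole_coefficients(freq_hz, bw_hz, fs)
    y, y1, y2 = [], 0.0, 0.0
    for sample in x:
        out = sample + a1 * y1 + a2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y


def inverse_resonator(y, freq_hz, bw_hz, fs=FS):
    """Two-zero FIR section that undoes resonator(): x[n] = y[n] - a1*y[n-1] - a2*y[n-2]."""
    a1, a2 = pole_coefficients(freq_hz, bw_hz, fs)
    x, y1, y2 = [], 0.0, 0.0
    for sample in y:
        x.append(sample - a1 * y1 - a2 * y2)
        y2, y1 = y1, sample
    return x


def glottal_source(n_periods=5, f0=100.0, open_quotient=0.6, fs=FS):
    """Hypothetical flow pulses: half-sine open phase, flat (zero) closed phase."""
    period = int(round(fs / f0))
    open_len = int(round(open_quotient * period))
    pulse = [math.sin(math.pi * n / open_len) for n in range(open_len)]
    return (pulse + [0.0] * (period - open_len)) * n_periods


source = glottal_source()
radiated = source
for f, b in FORMANTS:          # simulate the vocal-tract filter
    radiated = resonator(radiated, f, b)
recovered = radiated
for f, b in FORMANTS:          # inverse filtering removes each formant
    recovered = inverse_resonator(recovered, f, b)
```

In `radiated`, formant ringing fills each closed phase (for example samples 160 to 199); in `recovered` that segment is flat again and the pulses match the source to within rounding error.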
Relationship between epilarynx tube shape and the radiated sound pressure level during phonation is gender specific
Published in Logopedics Phoniatrics Vocology, 2023
Alexander Mainka, Ivan Platzek, Anna Klimova, Willy Mattheus, Mario Fleischer, Dirk Mürbe
Another aspect of the perceived loudness of singing is the influence of formants. Bartholomew observed that opera singers produced vowel spectra containing a marked envelope peak near 2.8 kHz, which he ascribed to the larynx tube [7]. This peak is usually referred to as the singer’s formant cluster. Male singers use articulatory strategies to boost the energy in that frequency region; lowering the larynx and widening the pharynx are considered crucial mechanisms. In contrast, female singers adjust the first two formants when increasing loudness [8]. Modelling approaches suggest that the vocal tract (VT) can contribute an acoustic gain of 10–15 dB [2]. An increase in vocal intensity is typically associated with an increase in formant levels; in adult speakers, however, this holds only up to 6–8 dB below the personal maximum SPL [9]. According to Cleveland and Sundberg [10] and Sjölander et al. [11], the increase in the level of the third formant (LR3) is roughly 1.4–1.6 times greater than the increase in the level of the first formant (LR1). In these articles, the peaks of the envelope of the sound spectra are defined as formants. For the definition and nomenclature of formants, see [12] and the citations therein.
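The decibel figures in this paragraph can be translated into more tangible quantities. The sketch below converts a level gain into a sound-pressure ratio (20 log10 convention) and applies the reported 1.4–1.6 factor linearly to a rise in LR1, assuming both rises are expressed in dB. The function names are hypothetical, and the linear scaling is only an illustration of the empirical relation, which the cited studies observed over the range they measured rather than as a general law.

```python
def db_to_pressure_ratio(gain_db: float) -> float:
    """Sound-pressure ratio corresponding to a level change in dB (20*log10 convention)."""
    return 10.0 ** (gain_db / 20.0)


def lr3_increase_db(lr1_increase_db: float, factor: float) -> float:
    """Predicted rise in third-formant level (dB), assuming it is `factor`
    times the rise in first-formant level."""
    return factor * lr1_increase_db
```

For example, `db_to_pressure_ratio(10)` is about 3.16 and `db_to_pressure_ratio(15)` about 5.62, so a 10–15 dB vocal-tract gain multiplies sound pressure roughly three- to five-and-a-half-fold; with a 10 dB rise in LR1, factors of 1.4 and 1.6 predict an LR3 rise of about 14 and 16 dB.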