Speech and its perception
Stanley A. Gelfand in Hearing, 2017
The six stops are produced at three locations. The bilabials (/p, b/) are produced by an obstruction at the lips, the alveolars (/t, d/) by the tongue tip against the upper gum ridge, and the velars (/k, g/) by the tongue dorsum against the soft palate. Whether the sound is heard as voiced or voiceless is, of course, ultimately due to whether there is vocal cord vibration; however, cues differ according to the location of the stop in an utterance. The essential voicing cue for initial stops is voice onset time (VOT), which is simply the time delay between the onset of the stop burst and commencement of vocal cord vibration (Lisker and Abramson, 1964, 1967). In general, voicing onset precedes or accompanies stop burst onset for voiced stops but lags behind the stop burst for voiceless stops. For final stops and those which occur medially within an utterance, the essential voicing cue appears to be the duration of the preceding vowel (Raphael, 1972). Longer vowel durations are associated with the perception that the following stop is voiced. Voiceless stops are also associated with longer closure durations (Lisker, 1957a), faster formant transitions (Slis, 1970), greater burst intensities (Halle et al., 1957), and somewhat higher fundamental frequencies (Haggard et al., 1970) than voiced stops.
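The VOT definition above can be expressed as a small calculation. What follows is a minimal sketch, not Gelfand's procedure. It assumes the burst onset and the first glottal pulse have already been marked by hand, in milliseconds. The `StopToken` type and the helper names are hypothetical. The 25 ms category boundary is an assumed illustrative value for English word-initial stops and does not come from the text.

```python
from dataclasses import dataclass


@dataclass
class StopToken:
    """Hypothetical landmarks for one word-initial stop, in ms from an arbitrary reference."""
    burst_onset_ms: float    # onset of the stop release burst
    voicing_onset_ms: float  # commencement of vocal cord vibration (first glottal pulse)


def vot_ms(token: StopToken) -> float:
    """VOT = voicing onset minus burst onset; a negative value means voicing leads the burst."""
    return token.voicing_onset_ms - token.burst_onset_ms


def classify_voicing(token: StopToken, boundary_ms: float = 25.0) -> str:
    """Label an initial stop 'voiced' or 'voiceless' from VOT alone.

    boundary_ms is an ASSUMED illustrative boundary; real boundaries vary with
    language, place of articulation, and speaking rate.
    """
    # Voicing that precedes, accompanies, or only briefly lags the burst -> voiced;
    # voicing that lags well behind the burst -> voiceless.
    return "voiced" if vot_ms(token) < boundary_ms else "voiceless"


if __name__ == "__main__":
    examples = {
        "voicing lead": StopToken(burst_onset_ms=100.0, voicing_onset_ms=20.0),
        "short lag": StopToken(burst_onset_ms=100.0, voicing_onset_ms=110.0),
        "long lag": StopToken(burst_onset_ms=100.0, voicing_onset_ms=170.0),
    }
    for name, tok in examples.items():
        print(f"{name}: VOT = {vot_ms(tok):+.0f} ms -> {classify_voicing(tok)}")
```

This sketch covers only the initial-stop cue. For final and medial stops, the text identifies preceding vowel duration as the essential cue, and that would need a separate measurement.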
All about Foreign Accent Syndrome
Jack Ryalls, Nick Miller in Foreign Accent Syndromes, 2014
The initial speech motor examination assesses respiratory control for speech. Voice production is rated with perceptual scales and may be supplemented by acoustic measures, using prolonged vowels and other speech tasks. Detailed assessment of prosody is likely to be of central interest. Articulation is evaluated with diadochokinetic (i.e., reiterative) speech tasks that examine variables such as the stability of voice onset time, the ability to sustain range and rate of movement, and whether weakness or incoordination affects different sites (for instance, the lips, tongue tip, and tongue back) differently.
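Two of the diadochokinetic variables mentioned above are rate and VOT stability. A minimal sketch of how they might be summarised follows. It assumes that syllable onsets (in seconds) and per-repetition VOTs (in ms) have already been measured. Using the coefficient of variation as the stability index is an assumption made for illustration, not a clinical standard given in the text. The function names and the sample values are hypothetical.

```python
import statistics


def ddk_rate(syllable_onsets_s: list[float]) -> float:
    """Syllables per second over a diadochokinetic run, from syllable onset times in seconds.

    Uses the n - 1 inter-onset intervals, so the rate does not depend on where
    the recording was cut after the last syllable.
    """
    if len(syllable_onsets_s) < 2:
        raise ValueError("need at least two syllable onsets")
    span_s = syllable_onsets_s[-1] - syllable_onsets_s[0]
    return (len(syllable_onsets_s) - 1) / span_s


def vot_stability(vots_ms: list[float]) -> dict[str, float]:
    """Mean, sample SD, and coefficient of variation (SD / |mean|) of VOT across repetitions.

    The CV is an ASSUMED stability index; a larger CV means less stable VOT.
    """
    mean_ms = statistics.mean(vots_ms)
    sd_ms = statistics.stdev(vots_ms)  # sample SD; needs at least two values
    return {"mean_ms": mean_ms, "sd_ms": sd_ms, "cv": sd_ms / abs(mean_ms)}


if __name__ == "__main__":
    # Hypothetical /pa/ repetitions, not data from any patient.
    print(ddk_rate([0.0, 0.2, 0.4, 0.6, 0.8]))
    print(vot_stability([60.0, 65.0, 55.0, 70.0, 50.0]))
```

Comparing these summaries across /pa/, /ta/, and /ka/ runs is one way to look at site-specific effects of weakness or incoordination.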
Vocal Motor Disorders
Rolland S. Parker in Concussive Brain Trauma, 2016
A phoneme is a single distinct sound, the minimal sound unit that contrasts meaning and distinguishes one word from another in a language (e.g., /p/ and /b/). Phonemes consist of distinctive features of sound production (i.e., voicing, aspiration, roundedness, and the location and degree of maximal constriction of the vocal tract). In a given language, some sequences of phonemes are permitted while others are forbidden (Blumstein, 1991; Caplan et al., 1999). Air passing through the larynx produces either whispered or voiced sound. Voiced sounds are produced with vocal cord vibration (e.g., /b/), whereas voiceless sounds (e.g., /t/, /p/, /s/) are produced without vibration of the vocal cords. For voiceless stop consonants, vocal cord vibration begins about 30 msec after the stop is released. Phonemes are thus defined by the location and degree of maximal constriction of the vocal tract, as well as by voicing (glottal or laryngeal vibration), i.e., whether the sound is produced with vibration of the vocal cords, as in /b/ and /d/. A glottal stop is a speech sound made by closure and then abrupt release of the glottis. Opening and closing of the velopharyngeal port is required to produce appropriate nasal and oral resonance of speech and the intraoral pressures necessary for articulating phonemes. Velopharyngeal function also affects prosody and articulation in dysarthric speakers. Velopharyngeal dysfunction may result from lesions to the upper motor neurons (UMNs) that supply the bulbar region of the brainstem, to the lower motor neurons (LMNs) that supply the muscles of the soft palate and pharynx, or to subcortical structures such as the basal ganglia and cerebellum (Theodoros & Murdoch, 2001a). Voice onset time is the interval between the release of a stop consonant and the onset of glottal pulsing. Patients with anterior lesions have difficulty with phonetic dimensions that require timing two independent articulators, such as the onset of laryngeal voicing relative to the release of a stop (Blumstein, 1991).
The brain processes complex acoustic information and identifies a phoneme based on known categories of speech signals (Fitch et al., 1997). Contrast between related sounds involves both voicing and the place of articulation.
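The point that contrasts between related sounds involve voicing and place of articulation can be made concrete with a small feature table for the six stops. What follows is a sketch under a simplifying assumption: each stop is described by only two features, place and voicing, and it ignores aspiration and the other features listed above. The table layout and function names are hypothetical.

```python
# Simplified feature table: place and voicing only (an ASSUMED two-feature description).
STOP_FEATURES: dict[str, dict[str, str]] = {
    "p": {"place": "bilabial", "voicing": "voiceless"},
    "b": {"place": "bilabial", "voicing": "voiced"},
    "t": {"place": "alveolar", "voicing": "voiceless"},
    "d": {"place": "alveolar", "voicing": "voiced"},
    "k": {"place": "velar", "voicing": "voiceless"},
    "g": {"place": "velar", "voicing": "voiced"},
}


def contrasting_features(a: str, b: str) -> set[str]:
    """Names of the features on which stops a and b differ."""
    fa, fb = STOP_FEATURES[a], STOP_FEATURES[b]
    return {name for name in fa if fa[name] != fb[name]}


def minimal_pairs(feature: str) -> list[tuple[str, str]]:
    """All stop pairs that differ in exactly the given feature and in nothing else."""
    stops = sorted(STOP_FEATURES)
    return [
        (a, b)
        for i, a in enumerate(stops)
        for b in stops[i + 1:]
        if contrasting_features(a, b) == {feature}
    ]


if __name__ == "__main__":
    print(minimal_pairs("voicing"))       # pairs contrasting only in voicing
    print(contrasting_features("p", "d"))  # differs in both place and voicing
```

Pairs such as /p/ and /b/ differ only in voicing, which is the contrast that VOT carries. Pairs such as /p/ and /t/ differ only in place.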
Temporal characteristics of stop consonants in pediatric cochlear implant users
Published in Cochlear Implants International, 2019
Stops are the most common consonants and occur in all human languages (Ladefoged and Maddieson, 1996). They are produced by complete occlusion of the oral cavity by the articulators, followed by a release. The acoustic events of stops comprise frequency-related parameters, such as burst frequency and formant transitions, and temporal parameters, such as Voice Onset Time (VOT), Burst Duration (BD), and Closure Duration (CD). Among the temporal parameters, VOT has been widely studied in typically developing children (TDC) across languages (Lisker and Abramson, 1964; Savithri, 1996; Shukla, 1989; Sridevi, 1990). VOT is the time difference between the onset of articulatory release and the onset of voicing, and it is considered a major cue for differentiating prevocalic stops along the voicing dimension (Lisker and Abramson, 1964). Studies in English and in Dravidian languages such as Kannada, Malayalam, and Tamil have revealed that voiceless plosives have longer VOT than voiced plosives (Docherty, 1992; Klatt, 1975; Lisker and Abramson, 1964, 1967; Shukla, 1989; Savithri et al., 2001). VOT values also differ according to place of articulation. In English, among the three primary places of articulation (bilabial, alveolar, and velar), velar plosives exhibit the longest VOT and bilabials the shortest (Smith, 1978). In contrast, in Dravidian languages, i.e., languages predominantly spoken in southern India, VOT was longest for velars, followed by bilabials and then alveolars (Savithri et al., 2001).
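Place-of-articulation orderings like those reported above come from averaging VOT within each place and then sorting. A minimal sketch of that step follows. It assumes the tokens are (place, VOT in ms) pairs drawn from a single voicing category, because mixing voiced and voiceless tokens would confound place with voicing. The function names are hypothetical, and the values in the usage block are invented for illustration only. They are not measurements from any of the cited studies.

```python
from collections import defaultdict
from statistics import mean


def mean_vot_by_place(tokens: list[tuple[str, float]]) -> dict[str, float]:
    """Average VOT (ms) per place of articulation from (place, vot_ms) pairs."""
    groups: dict[str, list[float]] = defaultdict(list)
    for place, vot in tokens:
        groups[place].append(vot)
    return {place: mean(vots) for place, vots in groups.items()}


def rank_places(tokens: list[tuple[str, float]]) -> list[str]:
    """Places of articulation ordered from longest to shortest mean VOT."""
    means = mean_vot_by_place(tokens)
    return sorted(means, key=means.get, reverse=True)


if __name__ == "__main__":
    # Invented voiceless-stop values, for illustration only.
    demo = [
        ("bilabial", 55.0), ("bilabial", 60.0),
        ("alveolar", 65.0), ("alveolar", 70.0),
        ("velar", 75.0), ("velar", 80.0),
    ]
    print(mean_vot_by_place(demo))
    print(rank_places(demo))
```

In a real comparison, the per-place means would also need a variability measure and an appropriate statistical test before an ordering is reported.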
Assessing automatic VOT annotation using unimpaired and impaired speech
Published in International Journal of Speech-Language Pathology, 2018
Esteban Buz, Adam Buchwald, Tzeviya Fuchs, Joseph Keshet
In empirical studies of normal and impaired speech production, measuring aspects of the acoustic or articulatory signal can be critical to understanding a pattern, but it is also a bottleneck in completing the research. In this article, we consider measurement of voice onset time (VOT), one of the most highly studied speech measurements in both unimpaired and impaired speakers. VOT represents the duration between the release of a stop consonant and the onset of the voicing that follows, and it is the primary acoustic cue for encoding the voicing contrast cross-linguistically (Lisker & Abramson, 1964). This measure has been used for decades to study seminal issues such as cross-linguistic differences in speech (Lisker & Abramson, 1964) and how infants encode speech sound contrasts (Eimas, Siqueland, Jusczyk, & Vigorito, 1971). In addition to studies of the contrastive properties of VOT, the continuous nature of this measure has made it widely used to study the gradient nature of speech production (Baese-Berk & Goldrick, 2009; Buz, 2016). We highlight one specific annotation tool, AutoVOT (Sonderegger & Keshet, 2012), and examine how it compares with traditional manual annotation of VOT, based on a reanalysis of data from two articles: Buz (2016), who examined a large corpus of VOTs produced by unimpaired speakers, and Buchwald and Miozzo (2011), who studied systematic differences in how impaired speakers produced VOT in both accurate and error productions.
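Comparing automatic and manual VOT annotation means summarising how closely two sets of measurements on the same tokens agree. The following is a minimal sketch of generic agreement statistics: mean absolute difference, Pearson correlation, and the proportion of tokens within a tolerance. It does not call AutoVOT and does not reproduce the analysis in this article. The function name is hypothetical, and the default 10 ms tolerance is an assumed illustrative value.

```python
import math


def agreement(manual_ms: list[float], auto_ms: list[float],
              tolerance_ms: float = 10.0) -> dict[str, float]:
    """Summarise how closely automatic VOT values track manual ones for the same tokens.

    Returns the mean absolute difference (ms), the Pearson correlation, and the
    proportion of tokens whose automatic value lies within tolerance_ms of the
    manual value. tolerance_ms is an ASSUMED illustrative threshold.
    """
    if len(manual_ms) != len(auto_ms) or len(manual_ms) < 2:
        raise ValueError("need two equal-length lists with at least two tokens")
    n = len(manual_ms)
    diffs = [a - m for m, a in zip(manual_ms, auto_ms)]
    mean_abs_diff = sum(abs(d) for d in diffs) / n

    # Pearson r computed by hand; undefined (ZeroDivisionError) if either list is constant.
    mean_m = sum(manual_ms) / n
    mean_a = sum(auto_ms) / n
    cov = sum((m - mean_m) * (a - mean_a) for m, a in zip(manual_ms, auto_ms))
    var_m = sum((m - mean_m) ** 2 for m in manual_ms)
    var_a = sum((a - mean_a) ** 2 for a in auto_ms)
    r = cov / math.sqrt(var_m * var_a)

    within = sum(abs(d) <= tolerance_ms for d in diffs) / n
    return {"mean_abs_diff_ms": mean_abs_diff, "pearson_r": r, "prop_within_tol": within}


if __name__ == "__main__":
    # Hypothetical paired measurements, not data from Buz (2016) or Buchwald and Miozzo (2011).
    print(agreement([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 35.0, 40.0]))
```

A high correlation alone can hide a constant offset between methods, so reporting the mean absolute difference alongside it is a deliberate choice.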
Cortical Responses to Chinese Phonemes in Preschoolers Predict Their Literacy Skills at School Age
Published in Developmental Neuropsychology, 2018
Tian Hong, Lan Shuai, Stephen J Frost, Nicole Landi, Kenneth R Pugh, Hua Shu
Two pairs of Chinese monosyllables, read by a male native Chinese speaker, were used as stimuli: the lexical tone pair /ji1/ and /ji4/ (Tone 1, the high level tone, and Tone 4, the high-falling tone) and the consonant pair /ba1/ and /ta1/. The lexical tone pair differed in pitch contour (fundamental frequency, F0). F0 was 191.7 Hz for /i1/, whereas for /i4/ it fell from 203.6 Hz at onset to 135.6 Hz at the end point. The consonant pair differed in the voice onset time (VOT) of the initial consonant: 12.4 ms for /ba1/ and 42.4 ms for /ta1/. The first and second formants (F1/F2) were 340.2 Hz/1,907.4 Hz for /i/ and 808.9 Hz/1,122.3 Hz for /a/. These monosyllables were digitally edited using Sound Forge (Sound Forge 9; Sony Corporation, Tokyo, Japan) to a duration of 140 ms and an intensity of 70 dB.
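The sizes of the two stimulus contrasts can be derived directly from the reported values. The short calculation below does this. Expressing the Tone 4 fall in semitones uses the standard 12 * log2(f1 / f2) conversion, which is an addition for illustration because the study reports only Hz. The function name is hypothetical.

```python
import math


def semitones(f_start_hz: float, f_end_hz: float) -> float:
    """Size of an F0 change in semitones; positive when F0 falls from start to end."""
    return 12 * math.log2(f_start_hz / f_end_hz)


# Values reported for the stimuli above.
tone4_onset_hz, tone4_end_hz = 203.6, 135.6
vot_ba_ms, vot_ta_ms = 12.4, 42.4

f0_fall_hz = tone4_onset_hz - tone4_end_hz            # roughly 68 Hz
f0_fall_st = semitones(tone4_onset_hz, tone4_end_hz)  # roughly 7 semitones
vot_diff_ms = vot_ta_ms - vot_ba_ms                   # roughly 30 ms

print(f"Tone 4 F0 fall: {f0_fall_hz:.1f} Hz ({f0_fall_st:.2f} semitones)")
print(f"VOT difference /ta1/ - /ba1/: {vot_diff_ms:.1f} ms")
```

The semitone scale is used here because equal frequency ratios, rather than equal differences in Hz, correspond to roughly equal perceived pitch intervals.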