Explore chapters and articles related to this topic
Voice over Internet Protocol Networks
Published in Goff Hill, The Cable and Telecommunications Professionals' Reference, 2012
An additional feature that the statistical multiplexing of IP benefits from is that, if someone is not talking, in theory it is not necessary to transmit any packets. This is termed silence suppression. For typical speech communication this can give a saving of approximately 50 percent. To operate properly, silence suppression requires a good voice activity detector (VAD) and the injection of comfort noise so that a listener does not notice the change in background noise. VAD requires a complex algorithm to operate properly without clipping the end or beginning of speech. Good VAD comes with codecs such as G.729 and G.723.1; however, G.711 has no such facility and requires additional algorithmic support with reported variable success (James et al., 2004). Indeed, as a good VAD algorithm requires almost as much effort as one of the more complex speech codecs, the use of VAD with G.711 is questionable.
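As a rough illustration of the saving quoted above, the following sketch estimates the average one-way bandwidth of a G.711 stream with and without silence suppression. The 20 ms packet interval, the 40 bytes of IPv4, UDP and RTP headers, and the 50 percent activity factor are assumptions chosen for the example, and comfort-noise update packets are ignored, so the result is an optimistic bound.

```python
# Illustrative assumptions, not figures from the text above:
CODEC_RATE_BPS = 64_000      # G.711 payload bit rate
PACKET_INTERVAL_MS = 20      # one packet every 20 ms (assumed)
HEADER_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP headers, no options


def stream_bandwidth_bps(activity_factor):
    """Average one-way bandwidth when packets are sent only during speech.

    activity_factor is the fraction of time the talker is active (0..1).
    Comfort-noise packets are ignored in this sketch.
    """
    if not 0.0 <= activity_factor <= 1.0:
        raise ValueError("activity_factor must be between 0 and 1")
    payload_bytes = CODEC_RATE_BPS * PACKET_INTERVAL_MS // 8000
    packet_bits = (payload_bytes + HEADER_BYTES) * 8
    packets_per_second = 1000 / PACKET_INTERVAL_MS
    return packet_bits * packets_per_second * activity_factor


continuous = stream_bandwidth_bps(1.0)   # no silence suppression
suppressed = stream_bandwidth_bps(0.5)   # talker active half of the time
saving = 1 - suppressed / continuous

print(f"continuous: {continuous / 1000:.0f} kbit/s")
print(f"suppressed: {suppressed / 1000:.0f} kbit/s")
print(f"saving:     {saving:.0%}")
```

Under these assumptions the saving simply equals the fraction of time spent in silence; a real deployment would recover somewhat less because of comfort-noise packets and VAD hangover.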
Applications of spatial sound and related problems
Published in Bosun Xie, Spatial Sound, 2023
From the point of view of auditory perception, various spatial sound techniques are theoretically applicable to speech communication, depending on practical requirements and costs. For speech communication with headphones, a virtual auditory display (VAD, which here does not stand for voice activity detection) is advantageous because its hardware is simple and it requires only two independent signals (and therefore a low bandwidth for signal transmission). In addition, conventional headphone presentation tends to cause in-the-head localization and auditory fatigue over long listening periods. Incorporating VAD into speech communication can create natural auditory effects and ease auditory fatigue. For speech communication with loudspeakers, other spatial sound techniques may be needed.
Noise-Estimation Algorithms
Published in Philipos C. Loizou, Speech Enhancement, 2013
The process of discriminating between voice activity (i.e., speech presence) and silence (i.e., speech absence) is called voice activity detection (VAD). VAD algorithms typically extract some type of feature (e.g., short-time energy, zero-crossings) from the input signal, which is in turn compared against a threshold value, usually determined during speech-absent periods. VAD algorithms generally output a binary decision on a frame-by-frame basis, where a frame may last approximately 20–40 ms. A frame is declared to contain voice activity (VAD = 1) if the measured feature values exceed the threshold; otherwise it is declared as noise (VAD = 0).
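The frame-by-frame thresholding described above can be sketched as follows. This is a minimal illustration rather than any particular published algorithm: it uses short-time energy as the only feature, assumes the first few frames contain no speech, and sets the threshold to a hypothetical `margin` times the mean energy of those frames. The function names and parameter values are assumptions made for the example.

```python
import math
import random


def frame_energy(frame):
    """Short-time energy: mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def energy_vad(samples, sample_rate, frame_ms=20, noise_frames=5, margin=4.0):
    """Return one 0/1 decision per non-overlapping frame.

    The threshold is `margin` times the mean energy of the first
    `noise_frames` frames, which are assumed to be speech-absent.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    energies = [frame_energy(f) for f in frames]
    if len(energies) < noise_frames:
        raise ValueError("signal too short to estimate the noise floor")
    noise_floor = sum(energies[:noise_frames]) / noise_frames
    threshold = margin * noise_floor
    return [1 if e > threshold else 0 for e in energies]


# Synthetic test signal: 0.2 s of weak noise, 0.2 s of a 200 Hz tone
# standing in for speech, then 0.2 s of weak noise again.
rng = random.Random(0)
rate = 8000
samples = []
for n in range(3 * 1600):
    x = rng.uniform(-0.01, 0.01)          # background noise
    if 1600 <= n < 3200:                  # "speech" region
        x += 0.5 * math.sin(2 * math.pi * 200 * n / rate)
    samples.append(x)

decisions = energy_vad(samples, rate)
print(decisions)
```

A fixed threshold estimated once at the start is the simplest choice; it breaks down when the noise level drifts, which is why practical detectors usually keep updating the noise estimate during speech-absent frames.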
Speaker Verification from Short Utterance Perspective: A Review
Published in IETE Technical Review, 2018
Rohan Kumar Das, S. R. Mahadeva Prasanna
When it comes to feature selection, energy-based voice activity detection (VAD) [25] is normally performed to obtain features from the regions of interest that contain the speaker's speech, discarding silence regions. However, it performs poorly in noisy conditions, which degrades speaker verification (SV) performance. Therefore, to handle degraded speech, especially in noisy environments, several different feature selection techniques have been proposed. Robust VAD methods include periodicity-based VAD [26], statistical VAD [27], vowel-like and non-vowel-like region selection [28,29], and self-adaptive VAD [30]. After selecting features from the region of interest, it is necessary to normalize them to remove the common offset for channel/session compensation, which gives better speaker discrimination. With MFCC and LPCC features, this normalization is termed cepstral mean subtraction or cepstral mean normalization (CMN) [31]. Furthermore, cepstral variance normalization is performed on top of it so that the features fit a zero-mean, unit-variance distribution [32].
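The normalization step can be illustrated with a small sketch of per-utterance cepstral mean and variance normalization. It assumes the features are already extracted and stored as a list of frames, each a list of coefficients; the function name `cmvn`, the `eps` guard and the toy numbers are hypothetical and chosen only to show that a constant per-coefficient offset disappears after normalization.

```python
import math


def cmvn(features, eps=1e-10):
    """Normalise each coefficient to zero mean and unit variance over time.

    `features` is a list of frames; each frame is a list of cepstral
    coefficients. `eps` avoids division by zero for constant coefficients.
    """
    n_frames = len(features)
    n_coeffs = len(features[0])
    # Cepstral mean normalisation: per-coefficient mean over all frames.
    means = [sum(f[k] for f in features) / n_frames for k in range(n_coeffs)]
    # Cepstral variance normalisation: per-coefficient standard deviation.
    stds = [math.sqrt(sum((f[k] - means[k]) ** 2 for f in features) / n_frames)
            for k in range(n_coeffs)]
    return [[(f[k] - means[k]) / (stds[k] + eps) for k in range(n_coeffs)]
            for f in features]


# Toy "cepstra": three frames, two coefficients each.
clean = [[1.0, 10.0], [2.0, 14.0], [3.0, 18.0]]
# The same frames with a constant offset per coefficient, standing in
# for a stationary channel effect in the cepstral domain.
shifted = [[c0 + 5.0, c1 - 3.0] for c0, c1 in clean]

norm_clean = cmvn(clean)
norm_shifted = cmvn(shifted)
print(norm_clean)
print(norm_shifted)
```

Both inputs map to the same normalized features, which is the sense in which the common offset is removed.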
Identification and authentication of user voice using DNN features and i-vector
Published in Cogent Engineering, 2020
Kydyrbekova Aizat, Othman Mohamed, Mamyrbayev Orken, Akhmediyarova Ainur, Bagashar Zhumazhanov
Voice activity detection (VAD) is an important step in most speech processing applications, especially if background noise is present. VAD matters because it improves intelligibility and speech recognition. Since the speech utterances used in this work were recorded in a public place, they were subject to noise and other interference. As a result, a VAD algorithm is needed to remove background noise and silent segments from the utterances in order to prepare them for feature extraction.
Voice activity detection for audio signal of voyage data recorder using residue network and attention mechanism
Published in Ships and Offshore Structures, 2022
Weiwei Zhang, Xin Liu, Han Du, Qiaoling Zhang, Jiaxuan Yang
Voice activity detection identifies the presence of human voice in audio, which can be either clean or noisy. The VAD outputs one during time intervals in which human voice is present and zero otherwise.
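To relate the one/zero output to time intervals, the sketch below converts a sequence of per-frame decisions into start and end times. The 20 ms frame length and the function name `speech_intervals` are assumptions for illustration; the excerpt does not specify a frame size.

```python
def speech_intervals(decisions, frame_ms=20):
    """Convert per-frame 0/1 decisions into (start_ms, end_ms) intervals.

    Each interval covers a maximal run of consecutive frames labelled 1.
    """
    intervals = []
    start = None  # index of the first frame of the current speech run
    for i, d in enumerate(decisions):
        if d and start is None:
            start = i
        elif not d and start is not None:
            intervals.append((start * frame_ms, i * frame_ms))
            start = None
    if start is not None:  # speech run continues to the end of the signal
        intervals.append((start * frame_ms, len(decisions) * frame_ms))
    return intervals


example = [0, 1, 1, 0, 0, 1]
intervals = speech_intervals(example)
print(intervals)  # [(20, 60), (100, 120)]
```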