Audio Codecs
Published in Francis F. Li, Trevor J. Cox, Digital Signal Processing in Audio and Acoustical Engineering, 2019
Many audio codecs have been developed, and efforts to develop new and better ones continue. The performance of these codecs varies across application scenarios. An analogue to digital converter (ADC) and a digital to analogue converter (DAC) are always needed in any codec, since they link the analogue and digital signals. The simplest, essential codec is literally the ADC and DAC pair, and the corresponding encoding method is the well-established pulse-code modulation (PCM), in which the amplitudes of audio signals are sampled at uniform intervals and quantized linearly to the nearest available values determined by the number of bits. PCM encoding and decoding, used in many applications such as the CD format, can be viewed as the “raw” format of digital audio. The sampling rate of PCM encoding determines the frequency range, and the number of bits determines the quantization error, which manifests as quantization noise. More sophisticated audio codecs have compression schemes built in. Compression significantly reduces the amount of data needed to represent audio signals, and facilitates the storage and transmission of audio information.
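As an illustration of linear PCM, the following Python sketch (function and variable names are illustrative, not from the text) uniformly quantizes a sine wave to 16 bits, assuming samples scaled to [-1, 1), and measures the resulting quantization noise:

```python
import numpy as np

def pcm_quantize(x, n_bits):
    """Linear PCM: round samples in [-1, 1) to the nearest of 2**n_bits levels."""
    step = 2.0 / 2 ** n_bits                    # quantization step size
    return np.clip(np.round(x / step) * step, -1.0, 1.0 - step)

fs = 44100                                      # CD sampling rate
t = np.arange(fs) / fs                          # one second of time samples
x = 0.5 * np.sin(2 * np.pi * 1000 * t)          # 1-kHz tone at half scale
x16 = pcm_quantize(x, 16)
noise = x - x16                                 # quantization error
snr_db = 10 * np.log10(np.mean(x ** 2) / np.mean(noise ** 2))
```

With 16-bit quantization the measured SNR comes out above 90 dB for this half-scale tone, consistent with quantization noise shrinking as bits are added.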
D/A and A/D Converters
Published in Jerry C. Whitaker, Microelectronics, 2018
PCM is a technique in which an analog signal is sampled, quantized, and then encoded as a digital word. The PCM IC can include successive approximation techniques or other techniques to accomplish the PCM encoding. In addition, the PCM codec may employ nonlinear data compression techniques, such as companding, if it is necessary to minimize the number of bits in the output digital code. Companding is a logarithmic technique used to compress a code to fewer bits before transmission. The inverse logarithmic function is then used to expand the code to its original number of bits before converting it to the analog signal. Companding is typically used in telecommunications transmission systems to minimize data transmission rates without degrading the resolution of low-amplitude signals. Two standardized companding techniques are used extensively: A-law and μ-law. A-law companding is used in Europe, whereas μ-law is used predominantly in the United States and Japan. Linear PCM conversion is used in high-fidelity audio systems to preserve the integrity of the audio signal throughout the entire analog range.
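A minimal Python sketch of μ-law companding (using the standard μ = 255 of ITU-T G.711; function names are illustrative) shows the logarithmic compression, its inverse expansion, and the resolution gain for a low-amplitude signal quantized to 8 bits:

```python
import numpy as np

MU = 255.0  # standard mu-law parameter (ITU-T G.711)

def mu_law_compress(x):
    """Logarithmically compress samples in [-1, 1] (mu-law)."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mu_law_expand(y):
    """Inverse logarithmic function: expand back to the linear domain."""
    return np.sign(y) * ((1.0 + MU) ** np.abs(y) - 1.0) / MU

# Quantize a quiet signal to 8 bits, with and without companding
quiet = 0.01 * np.sin(np.linspace(0.0, 2.0 * np.pi, 1000))
step = 2.0 / 256                                    # 8-bit step over [-1, 1)
linear8 = np.round(quiet / step) * step             # plain linear 8-bit PCM
companded8 = mu_law_expand(np.round(mu_law_compress(quiet) / step) * step)
mse_linear = np.mean((quiet - linear8) ** 2)
mse_companded = np.mean((quiet - companded8) ** 2)  # much smaller error
```

Because the logarithm spends more of the 8-bit code range on small amplitudes, the companded path reproduces the quiet signal with far lower error than linear 8-bit PCM, which is exactly the low-amplitude resolution benefit described above.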
Speech Coding
Published in Sadaoki Furui, Digital Speech Processing, Synthesis, and Recognition, 2018
The simplest waveform coding method is linear pulse code modulation (PCM). In this method, analog signals are quantized in uniform steps, as in ordinary A/D conversion. This method does not compress the information rate, since it uses no speech-specific characteristics. When the quantization step size and the range of signal amplitude are denoted by Δ and L, respectively, the number of quantization bits B must satisfy Δ·2^B ≥ L, i.e., B ≥ log2(L/Δ) (see Sec. 4.1.2). Since the SNR of a PCM signal quantized with B bits is roughly 6B − 7.2 dB (Eq. (4.9)), B must be chosen so that the SNR of the quantized signal is larger than that of the signal before quantization. For example, a bit rate of roughly 100 kbps (8-kHz sampling with 13-bit quantization, i.e., 104 kbps) is necessary for quantizing 4-kHz-bandwidth telephone speech by linear PCM without producing detectable distortion from quantization noise.
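The bit count, SNR approximation, and bit-rate arithmetic above can be checked numerically; this Python sketch (function names are illustrative) simply follows the stated formulas:

```python
import math

def bits_required(L, delta):
    """Smallest B satisfying delta * 2**B >= L, i.e. B >= log2(L / delta)."""
    return math.ceil(math.log2(L / delta))

def pcm_snr_db(B):
    """Approximate SNR of B-bit linear PCM speech: 6B - 7.2 dB (Eq. (4.9))."""
    return 6 * B - 7.2

fs = 8000            # 8-kHz sampling covers 4-kHz telephone bandwidth
B = 13               # 13-bit linear quantization
bit_rate = fs * B    # 104,000 bps, i.e. roughly 100 kbps
```

For instance, an amplitude range 1024 times the step size requires B = 10 bits, and 13-bit PCM gives an approximate SNR of 70.8 dB at a 104-kbps rate.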
Automatic Speech Recognition Using Limited Vocabulary: A Survey
Published in Applied Artificial Intelligence, 2022
Jean Louis K. E Fendji, Diane C. M. Tala, Blaise O. Yenke, Marcellin Atemkeng
c.) Voice recording: Sounds are recorded using a microphone that matches the desired conditions, so as to minimize differences between training conditions and test conditions. Speech recordings are generally performed in an anechoic room and are usually digitized at 20 kHz with 16 bits (Alumae and Vohandu 2004; Glasser 2019), or at 8 kHz (Tamgno et al. 2012) or 16 kHz (Hofe et al. 2013; Warden 2018). The waveform audio file format container with file extension .wav is generally used (Glasser 2019; Warden 2018). WAV files encoded with Pulse-Code Modulation (PCM) provide uncompressed, high-fidelity digital sound. Since this format is easy to handle in the pre-processing phase of speech recognition and in further processing, audio files recorded in other formats (for instance OGG, WMA, or MID) should be converted to WAV.
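As an illustration of producing PCM-encoded WAV audio, the following Python sketch uses the standard-library `wave` module to write a 16-kHz, 16-bit mono file (the filename and test tone are hypothetical; decoding OGG or WMA input would additionally require a third-party decoder, which is not shown):

```python
import math
import struct
import wave

fs, n_bits = 16000, 16                  # 16-kHz sampling, 16-bit samples
# One second of a 440-Hz tone as floats in [-1, 1] (placeholder audio data)
samples = [0.3 * math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]

with wave.open("tone.wav", "wb") as wf:  # "tone.wav" is a hypothetical name
    wf.setnchannels(1)                   # mono
    wf.setsampwidth(n_bits // 8)         # 2 bytes per sample => 16-bit PCM
    wf.setframerate(fs)
    wf.writeframes(b"".join(
        struct.pack("<h", int(s * 32767)) for s in samples))
```

The resulting file is uncompressed linear PCM in a WAV container, matching the 16-kHz/16-bit recording setup cited above and directly readable by common speech pre-processing tools.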