Speak Out: Turns Your Smartphone into a Microphone
Published in P. C. Thomas, Vishal John Mathai, Geevarghese Titus, Emerging Technologies for Sustainability, 2020
Ajima Saseendran, Akshitha Lakshmi, Aleena Jose, Gouri Gopan, Shiney Thomas
The number of audio samples carried per second is called the sample rate; it is measured in Hz or kHz. The sample rate of this app is 16000 Hz. An audio file format is a file format for storing digital audio data on a computer system [10]. The audio format used in this app is WAVE (.wav). The uncompressed audio input from the user is sampled and encoded using 16-bit Pulse-Code Modulation (PCM), a method used to digitally represent sampled analog signals. In a PCM stream, the amplitude of the analog signal is sampled at regular, uniform intervals, and each sample is quantized to the nearest value within a range of digital steps. The sampled audio is then buffered and streamed to the connected phone.
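The sample-quantize-store pipeline described above can be sketched using only Python's standard library. The 16 kHz sample rate and 16-bit PCM depth come from the text; the 440 Hz test tone, its amplitude, duration, and the output filename are illustrative assumptions.

```python
import math
import struct
import wave

SAMPLE_RATE = 16000   # 16 kHz, the app's sample rate
BITS = 16             # 16-bit PCM
AMPLITUDE = 0.5       # analog amplitude, normalized to [-1, 1] (assumed)
FREQ = 440.0          # hypothetical test tone in Hz
DURATION = 0.1        # seconds (assumed)

# Sample the "analog" signal at uniform intervals of 1/SAMPLE_RATE seconds.
samples = [AMPLITUDE * math.sin(2 * math.pi * FREQ * n / SAMPLE_RATE)
           for n in range(int(SAMPLE_RATE * DURATION))]

# Quantize each sample to the nearest of the available digital steps
# (signed 16-bit integers span -32768..32767).
max_step = 2 ** (BITS - 1) - 1
pcm = [int(round(s * max_step)) for s in samples]

# Store the PCM stream in a WAVE (.wav) container.
with wave.open("tone.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(BITS // 8)
    f.setframerate(SAMPLE_RATE)
    f.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
```

In a real recorder the sample list would come from a microphone buffer rather than a synthesized sine, but the quantization and container steps are the same.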
Researching digital objects
Published in Catherine Dawson, A–Z of Digital Research Methods, 2019
Do you have a good understanding of digital object file types and formats? Examples include:

Still image files such as text documents and photographs from original copies: DOC (document file); EPS (Encapsulated PostScript); GIF (Graphics Interchange Format); JPEG (Joint Photographic Experts Group); PDF (Portable Document Format); PNG (Portable Network Graphics); TIFF (Tagged Image File Format).

Video files produced from either original analogue or digital video formats: ASF (Advanced Systems Format); AVI (Audio Video Interleaved); FLV (Flash Video); MOV (Apple QuickTime Movie); MP4 (MPEG-4 Part 14); MPEG-1 or MPEG-2 (Moving Picture Experts Group); WMV (Windows Media Video).

Audio files produced from original analogue or digital audio formats: AAC (Advanced Audio Coding); AIFF (Audio Interchange File Format); MP3 (MPEG-1 Audio Layer 3); WAV (Waveform Audio File Format); WMA (Windows Media Audio).
Automatic Speech Recognition Using Limited Vocabulary: A Survey
Published in Applied Artificial Intelligence, 2022
Jean Louis K. E Fendji, Diane C. M. Tala, Blaise O. Yenke, Marcellin Atemkeng
c.) Voice recording: Sounds are recorded using a microphone that matches the desired conditions, so as to minimize differences between training and test conditions. Speech recordings are generally performed in an anechoic room and are usually digitized at 20 kHz using 16 bits (Alumae and Vohandu 2004; Glasser 2019), or at 8 kHz (Tamgno et al. 2012) or 16 kHz (Hofe et al. 2013; Warden 2018). The Waveform Audio File Format container, with file extension .wav, is generally used (Glasser 2019; Warden 2018). WAV files encoded with Pulse-Code Modulation (PCM) yield uncompressed, high-fidelity digital sound. Because this format is easy to handle in the pre-processing phase of speech recognition and in further processing, audio files recorded in other formats (for instance OGG, WMA, or MID) should be converted into WAV format.
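Before the pre-processing phase it is worth verifying that each converted file actually matches the digitization settings described above. A minimal sketch using Python's standard `wave` module follows; the filename, the 16 kHz / 16-bit targets, and the synthetic test file are illustrative assumptions.

```python
import struct
import wave

def check_recording_conditions(path, expected_rate=16000, expected_bits=16):
    """Return True if the .wav file matches the target digitization
    settings (sample rate and bit depth) used for the corpus."""
    with wave.open(path, "rb") as f:
        return (f.getframerate() == expected_rate
                and f.getsampwidth() * 8 == expected_bits)

# Write a short silent 16 kHz / 16-bit PCM file to test against (assumed name).
with wave.open("sample.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)          # 2 bytes per sample = 16 bits
    f.setframerate(16000)
    f.writeframes(struct.pack("<160h", *([0] * 160)))

print(check_recording_conditions("sample.wav"))  # → True
```

Converting OGG or WMA sources into this PCM WAV form would typically be delegated to an external tool such as ffmpeg, since the standard library only reads and writes WAV.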
Assessing the accuracy of artificial intelligence enabled acoustic analytic technology on breath sounds in children
Published in Journal of Medical Engineering & Technology, 2022
Zai Ru Cheng, Huiyu Zhang, Biju Thomas, Yi Hua Tan, Oon Hoe Teoh, Arun Pugalenthi
A sensing device (Figure 1) was custom built to record breath sounds from paediatric patients aged 0–16 years. The sensor consisted of a commercially available professional stethoscope head modified with a microphone from a smartphone. The microphone had a sensitivity of 110 dB/mW and an impedance of 32 ohms, with recording performance comparable to devices used in published studies for computerised respiratory sound analysis [13]. The microphone was housed inside the stethoscope head, which was then placed on the subject’s chest to record breath sounds. The sensing device had a 3.5 mm audio jack compatible with the smartphone. The subjects’ breath sounds were captured with the novel sensing device, recorded onto the smartphone using the application “Easy Voice Recorder”, and stored on the smartphone in Waveform Audio File Format (WAV). The audio files were then extracted from the device and compiled into a local repository of breath sound recordings. This repository was used for the development of an AI algorithm to automatically classify breath sounds in children.
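Compiling extracted WAV files into a local repository typically starts with indexing their basic metadata. A sketch using only the standard library is below; the directory layout, filename, and 44.1 kHz demo file are hypothetical, not details from the study.

```python
import os
import struct
import tempfile
import wave

def index_recordings(directory):
    """Collect basic metadata for each .wav recording in a directory,
    as a starting point for a local repository of breath sounds."""
    index = []
    for name in sorted(os.listdir(directory)):
        if not name.lower().endswith(".wav"):
            continue
        with wave.open(os.path.join(directory, name), "rb") as f:
            index.append({
                "file": name,
                "sample_rate": f.getframerate(),
                "duration_s": f.getnframes() / f.getframerate(),
            })
    return index

# Demo with one synthetic 0.1 s silent file (name and layout are illustrative).
repo = tempfile.mkdtemp()
with wave.open(os.path.join(repo, "patient01.wav"), "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(44100)
    f.writeframes(struct.pack("<4410h", *([0] * 4410)))

print(index_recordings(repo))
```

Such an index makes it easy to filter out recordings with unexpected sample rates or truncated durations before training a classifier.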
Classification of thought evoked potentials for navigation and communication using multilayer neural network
Published in Journal of the Chinese Institute of Engineers, 2021
Sathees Kumar Nataraj, Paulraj M P, Sazali Bin Yaacob, Abdul Hamid Bin Adom
The EEG signals were then pre-processed to extract frequency-band signals and spectral features. Although various feature extraction methods (frequency-domain, time-domain, and time-frequency features) and feature classifiers (linear and non-linear) have been used across studies to enhance performance, extracting meaningful EEG features is perhaps the most important step in designing the IRCC. Therefore, a feature extraction technique based on the spectral centroid was used in this analysis, as it has been widely applied to speech signals and audio (Waveform Audio File Format) validation systems, and because of its robustness and its ability to give a parsimonious representation of the spectral envelope (Paliwal 1998; Nicolson et al. 2018).
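The spectral centroid is the magnitude-weighted mean frequency of a signal's spectrum. A minimal sketch follows, using a naive DFT from the standard library for clarity (a real pipeline would use an FFT); the 8 kHz rate, frame length, and 1 kHz test tone are illustrative assumptions, not parameters from the study.

```python
import cmath
import math

def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean frequency of the signal's spectrum:
    sum(f_k * |X_k|) / sum(|X_k|) over the positive-frequency bins."""
    n = len(signal)
    mags, freqs = [], []
    for k in range(n // 2):
        # Naive DFT bin (O(n^2) overall; fine for a short demo frame).
        X = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        mags.append(abs(X))
        freqs.append(k * sample_rate / n)
    total = sum(mags)
    return sum(f * m for f, m in zip(freqs, mags)) / total if total else 0.0

# A pure 1 kHz tone should place the centroid at ~1 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(256)]
print(round(spectral_centroid(tone, sr)))  # → 1000
```

For a broadband signal the centroid shifts toward whichever frequencies carry the most energy, which is what makes it a compact one-number summary of the spectral envelope.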