Explore chapters and articles related to this topic
A study of poultry realtime monitoring and automation techniques
Published in Computer-Aided Developments: Electronics and Communication (Arun Kumar Sinha and John Pradeep Darsy, Eds.), 2019
A. Arun Gnana Raj, S. Margaret Amala, J. Gnana Jayanthi
Kun Qian and Zixing Zhang [8] have presented a novel framework for classifying bird sounds from audio recordings. The framework has the following parts: detection and segmentation of syllables (units) from the audio recordings, acoustic feature extraction from the syllables, feature dimension reduction, and training for classification. Bird songs are analysed using the p-centre to detect the ‘syllables’, which serve as the units for the recognition task. The p-centre detection operates on a low-frequency (LF) filtered version of the signal; a Fourier transform is then used to calculate the entropy, the average frequency, and the centroid of the rhythmic envelope. The open-source toolkit openSMILE is then used to extract large-scale acoustic feature sets from the chunked units of analysis (the ‘syllables’). Finally, an Extreme Learning Machine (ELM) is proposed for decision making because of its fast and accurate performance. Table I lists all the extracted parameters used for the analysis. ELM has been shown to achieve a higher recognition rate while being less time-consuming than Support Vector Machines (SVMs) or ‘conventional’ neural networks. Results are reported for increasing numbers of bird species and demonstrate that the proposed system achieves excellent, robust performance that scales to different numbers of species (mean unweighted average recall of 93.82 %, 89.56 %, 85.30 %, and 83.12 % for 20, 30, 40, and 50 bird species, respectively).
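The ELM decision stage lends itself to a compact illustration: a single hidden layer with fixed random input weights, where only the output weights are learned in closed form by least squares, which is what makes training fast. The NumPy sketch below is illustrative only; the class name, hidden-layer size, and the placeholder feature and label arrays are assumptions, not the authors' implementation.

```python
import numpy as np

class SimpleELM:
    """Minimal Extreme Learning Machine sketch (illustrative, not the paper's code).

    A single hidden layer with fixed random weights; only the output
    weights are learned, via a least-squares (pseudoinverse) solution.
    """

    def __init__(self, n_hidden=256, random_state=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(random_state)

    def _hidden(self, X):
        # Random projection followed by a sigmoid activation.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        n_features = X.shape[1]
        n_classes = int(y.max()) + 1
        # Random, untrained input weights and biases.
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # One-hot targets for the closed-form least-squares fit.
        T = np.eye(n_classes)[y]
        H = self._hidden(X)
        # Output weights via the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)

# Placeholder data standing in for openSMILE syllable-level features.
X_train = np.random.rand(200, 88)           # 200 syllables x 88 features (assumed shape)
y_train = np.random.randint(0, 20, 200)     # 20 bird species (placeholder labels)
model = SimpleELM(n_hidden=256).fit(X_train, y_train)
print(model.predict(X_train[:5]))
```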
Automatic voice emotion recognition of child-parent conversations in natural settings
Published in Behaviour & Information Technology, 2021
Effie Lai-Chong Law, Samaneh Soleimani, Dawn Watkins, Joanna Barwick
Another key factor determining the computation cost is the size of the acoustic feature set used to train and test classification models and to apply them to new, unseen data. To extract acoustic features from a database, the open-source openSMILE extraction tool (Eyben, Wöllmer, and Schuller 2010) is widely used. Our approach was to use the predefined openSMILE set Emo-Large with 6552 features (Pfister and Robinson 2010, August), the largest feature set known to date. For comparison, we used the minimalistic set of 88 acoustic features known as eGeMAPS (Eyben et al. 2016), which was designed to address two issues with large feature sets. First, large brute-force feature sets tend to over-adapt classifiers to the training data in machine learning problems, reducing their generalizability to unseen (test) data (Schuller et al. 2010a). Second, the interpretation of the underlying mechanisms of thousands of features is very difficult, if not impossible (Eyben et al. 2016).
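To make the size contrast concrete, the sketch below extracts the 88 eGeMAPS functionals per file with audEERING's opensmile Python package; the package, the eGeMAPSv02 variant, and the file name are assumptions about tooling rather than the authors' setup, and the 6552-feature Emo-Large set would instead be extracted with the emo_large configuration file shipped with the openSMILE distribution, which is not exposed as a predefined set in this wrapper.

```python
import opensmile  # pip install opensmile (audEERING's Python wrapper around openSMILE)

# eGeMAPS functionals: the minimalistic 88-feature set discussed above.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)

# Hypothetical recording of one child-parent conversation turn.
features = smile.process_file("conversation_turn.wav")
print(features.shape)                  # -> (1, 88): one row of utterance-level functionals
print(features.columns[:5].tolist())   # a few of the eGeMAPS feature names
```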
Expressing reactive emotion based on multimodal emotion recognition for natural conversation in human–robot interaction*
Published in Advanced Robotics, 2019
Yuanchao Li, Carlos Toshinori Ishi, Koji Inoue, Shizuka Nakamura, Tatsuya Kawahara
When it comes to natural conversation in HRI, recognizing the user's emotion is a task that has been researched for decades. The study of speech emotion recognition has advanced greatly over recent years [4–7]. In particular, it has become possible to infer the user's emotional state from his/her speech thanks to acoustic and prosodic feature sets and models correlating with various emotions [8,9]. OpenSMILE [10], an audio analysis toolkit, is commonly used for extracting and analyzing emotion-related features from speech. There are two major types of model that have shown success in describing emotion. The first, discrete models, classify emotions into basic or fundamental emotions. The best-known discrete model is the ‘Big Six’, which covers anger, disgust, fear, joy, sadness, and surprise [11]. The second type, dimensional models, maps emotional states into a low-dimensional space, usually with two or three dimensions. The two most widely used dimensions are arousal and valence, and the third dimension refers to dominance [12]. To automate emotion recognition, researchers have applied various techniques ranging from conventional machine learning methods such as linear regression [13] and support vector machines (SVMs) [14] to state-of-the-art deep neural networks such as convolutional neural networks [15] and long short-term memory networks [16]. However, even with these rapid developments, speech emotion recognition remains a challenging task because of the variability in speech. Some previous studies have tried to use extra information sources to resolve this issue; for example, lexical features have sometimes been combined with the acoustic features to improve performance [17,18]. In this work, we follow this path.
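A conventional baseline of the kind cited here can be sketched as an SVM over utterance-level openSMILE functionals mapped to the Big Six labels. The snippet below is a hedged illustration with placeholder feature and label arrays, not the model used in any of the cited works.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

BIG_SIX = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]

# Placeholder data standing in for openSMILE utterance-level functionals.
X = np.random.rand(300, 88)                   # 300 utterances x 88 features (assumed shape)
y = np.random.randint(0, len(BIG_SIX), 300)   # discrete Big Six labels

# Standardization matters because openSMILE functionals span very different scales.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X, y)

pred = clf.predict(X[:3])
print([BIG_SIX[i] for i in pred])
```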
Speech emotion recognition based on hierarchical attributes using feature nets
Published in International Journal of Parallel, Emergent and Distributed Systems, 2020
Huijuan Zhao, Ning Ye, Ruchuan Wang
In addition to the corpora, feature extraction and feature selection are also important factors that affect machine learning. There are different kinds of methods to extract features from the speech signal: depending on the format of the input signal, the waveform, spectrum, or spectrogram can be used as the input for feature extraction. We use the openSMILE toolbox [31] to extract features from the audio signals. openSMILE is an open-source audio signal analysis, processing, and classification tool that has been widely used in speech recognition, speaker recognition, and speech emotion recognition [11,12,32].
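Extraction of this kind is usually run through openSMILE's SMILExtract command-line tool with one of the feature-set configuration files bundled with the toolkit. The Python wrapper below is a hedged sketch: the binary name, the configuration path (shown for an INTERSPEECH 2009 emotion set as laid out in openSMILE 3.x), and the folder names are assumptions that depend on the installed version and data layout.

```python
import subprocess
from pathlib import Path

# All names below are assumptions; adjust them to your openSMILE installation.
SMILEXTRACT = "SMILExtract"                    # openSMILE's command-line extractor
CONFIG = "config/is09-13/IS09_emotion.conf"    # an emotion feature-set config bundled with openSMILE 3.x
AUDIO_DIR = Path("speech_corpus")              # hypothetical folder of .wav utterances

for wav in sorted(AUDIO_DIR.glob("*.wav")):
    # -C selects the feature-set config, -I the input wave file, -O the output file;
    # the output format (e.g. ARFF) is determined by the chosen configuration.
    out = wav.with_suffix(".arff")
    subprocess.run(
        [SMILEXTRACT, "-C", CONFIG, "-I", str(wav), "-O", str(out)],
        check=True,
    )
```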