Explore chapters and articles related to this topic
Robust Speech Processing as an Inverse Problem
Published in Vijay K. Madisetti, The Digital Signal Processing Handbook, 2017
Richard J. Mammone, Xiaoyu Zhang
This section addresses the inverse problem in robust speech processing. A problem that speaker and speech recognition systems regularly encounter in commercial applications is a dramatic degradation of performance caused by a mismatch between the training and operating environments. The mismatch generally results from the diversity of the operating environments. For applications over the telephone network, the operating environments may vary from offices and laboratories to homes and airports. The problem becomes worse when speech is transmitted over a wireless network, where the system experiences cross-channel interference in addition to the channel and noise degradations that exist in the regular telephone network. The key issue in robust speech processing is to obtain good performance regardless of the mismatch in environmental conditions. The inverse problem in this sense refers to the process of modeling the mismatch as a transformation and resolving it via an inverse transformation. In this section, we introduce the method of modeling the mismatch as an affine transformation.
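As a rough illustration of the idea (not the chapter's own formulation), an affine mismatch can be written as y = A x + b, where x is a feature vector from the training environment and y is the same feature observed in the operating environment. The inverse transformation is then x = A^{-1}(y - b). The sketch below assumes two-dimensional feature vectors and an already known A and b; the function names and the numeric values are hypothetical, and estimating A and b from data is not shown.

```python
# Minimal sketch: model environment mismatch as an affine map y = A x + b
# and undo it with the inverse map x = A^{-1} (y - b).
# A, b, and the 2-D feature vectors are illustrative assumptions.

def affine_apply(A, b, x):
    """Apply y = A x + b to a feature vector x."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) + b[i]
            for i in range(len(b))]


def invert_2x2(A):
    """Invert a 2x2 matrix; raise if it is (numerically) singular."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular; the mismatch cannot be inverted")
    return [[d / det, -b / det], [-c / det, a / det]]


def affine_invert(A, b, y):
    """Recover x from y = A x + b."""
    A_inv = invert_2x2(A)
    shifted = [y[i] - b[i] for i in range(2)]
    return [sum(A_inv[i][j] * shifted[j] for j in range(2)) for i in range(2)]


if __name__ == "__main__":
    A = [[0.9, 0.1], [0.0, 1.2]]   # hypothetical channel distortion
    b = [0.5, -0.3]                # hypothetical additive bias
    clean = [1.0, -2.0]            # feature from the training environment
    observed = affine_apply(A, b, clean)
    print(affine_invert(A, b, observed))
```

In practice A and b would have to be estimated from the mismatched data, which is the harder part of the inverse problem.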
Further Reading
Published in John Holmes, Wendy Holmes, Speech Synthesis and Recognition, 2002
These days nearly all speech processing is digital, and a grasp of basic concepts in digital signal processing is necessary in order to obtain a good understanding of the processes involved in generating and analysing speech signals. Good general introductory textbooks on digital signal processing include Lynn and Fuerst (second edition, 1998) and Lyons (1997). One of the most important textbooks on digital processing of speech is Rabiner and Schafer (1978), but even in its early chapters this book requires the reader to be really at home with mathematical notation and manipulation. For the less mathematically minded, Harrington and Cassidy (1999) or Rosen and Howell (second edition, 2001) provide a much easier introduction, though they are less comprehensive than Rabiner and Schafer.
Hardware chip performance analysis of different FFT architecture
Published in International Journal of Electronics, 2021
Amit Kumar, Adesh Kumar, Aakanksha Devrari
The discrete Fourier transform (DFT) is a widely used tool in digital signal processing (DSP) systems. It plays a vital role in applications such as signal analysis, speech processing, image processing, audio processing, video processing, communication systems and many others. The DFT is a Fourier representation of a finite-length sequence, obtained by decomposing a sequence of values into its different frequency components. It converts a time-domain signal into a frequency-domain signal of the same length, while the inverse DFT (IDFT) converts a frequency-domain signal back into a time-domain signal. The FFT (Cooley & Tukey, 1965) is an algorithm for computing the N-point DFT of a sequence, while the inverse FFT computes the IDFT. The FFT is fast because it factorises the DFT matrix into a product of sparse factors, whose entries are mostly zero. An FFT can therefore substantially reduce the complexity of DFT hardware. The brute-force calculation of a length-N DFT requires O(N^2) multiplications, whereas the FFT reduces the complexity from O(N^2) to O(N log_2 N). The general equation of DFT for input sequence x(n) over a length ‘N’ is given by
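In the standard textbook form (an assumption here; the article's own notation is not reproduced in this excerpt), the N-point DFT is X(k) = \sum_{n=0}^{N-1} x(n) e^{-j 2\pi k n / N} for k = 0, 1, ..., N-1. The following sketch contrasts a brute-force evaluation of that sum with a recursive radix-2 decimation-in-time FFT. It is a minimal software illustration using only the Python standard library, not any of the hardware architectures the article analyses, and the function names are hypothetical.

```python
import cmath


def dft(x):
    """Brute-force N-point DFT: about N^2 complex multiplications."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


def fft(x):
    """Recursive radix-2 decimation-in-time FFT.

    Assumes len(x) is a power of two; cost is on the order of N log2 N.
    """
    n_points = len(x)
    if n_points == 0 or n_points & (n_points - 1):
        raise ValueError("length must be a non-zero power of two")
    if n_points == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # DFT of even-indexed samples
    odd = fft(x[1::2])    # DFT of odd-indexed samples
    half = n_points // 2
    out = [0j] * n_points
    for k in range(half):
        # Twiddle factor e^{-j 2 pi k / N} applied to the odd half
        t = cmath.exp(-2j * cmath.pi * k / n_points) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
    slow = dft(signal)
    fast = fft(signal)
    # Both routes should agree up to floating-point rounding.
    print(max(abs(a - b) for a, b in zip(slow, fast)))
```

The two functions compute the same transform; the FFT simply reuses the half-length results for the even and odd samples instead of re-evaluating the full sum for every k.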
Automatic Speech Recognition Using Limited Vocabulary: A Survey
Published in Applied Artificial Intelligence, 2022
Jean Louis K. E Fendji, Diane C. M. Tala, Blaise O. Yenke, Marcellin Atemkeng
Automatic speech recognition (ASR) is the process, and the related technology, of converting a speech signal into the matching sequence of words or other linguistic entities using algorithms implemented in computing devices (Indurkhya and Damerau 2010). ASR has become an exciting field for many researchers. Presently, users prefer to operate devices such as computers, smartphones, or any other connected device through speech. Current speech processing techniques (encompassing speech synthesis, speech processing, speaker identification or verification) pave the way to create human-to-machine voice interfaces. ASR can be applied in several applications including voice services (Yadava and Jayanna 2017), program control and data entry (Hauser, Sabir, and Thoma 1999), avionics (Noyes and Starr 2007), disabled assistance (Mayer 2018), amongst others. Although ASR can be advantageous in easing human-to-machine communication, in many cases it goes beyond helpful and becomes absolutely necessary. For example, low literacy levels and the extinction of under-resourced languages are ideal use cases for ASR (Besacier et al. 2014). In fact, the high penetration of communication tools such as smartphones in the developing world (Albabtain et al. 2014) and their increasing presence in rural areas (Ebongue 2015; Ebongue Louis 2015) provide an unprecedented opportunity to develop voice-based applications that can help to mitigate the low literacy levels in those areas. Smartphones offer many advantages over a PC-based interface, such as high mobility and portability, easy recharge of their batteries, and conventional embedded features such as microphones and speakers.
A Review on Applications of Artificial Intelligence Over Indian Legal System
Published in IETE Journal of Research, 2021
In this section, the authors discuss the sub-domains of AI:
Computer Vision – It is a subset of AI that provides a visual experience to computers or machines to analyze events/actions and actors through images [12].
Evolutionary Computation – It is a sub-domain of soft-computing and AI. It consists of several algorithms used for global optimization. It has features of stochastic optimization based on error problem solvers and population-based experiments. It includes programming and a Genetic Algorithm [13].
Expert System – It is a computer-based system that imitates the capability of human decision-making. It is used to solve complex problems using reasoning instead of conventional procedural code. Teaching systems and decision making are examples [14].
Machine Learning – It is a sub-domain of AI. Using Machine Learning, machines can gain advanced knowledge automatically from past experience without explicit programming. Space Learning and Decision Tree Learning are some of the significant examples [15].
Natural Language Processing – It is a subfield of AI that is primarily concerned with the connection between natural language and computers. Natural Language Processing analyses an enormous amount of natural language data through a program, as in machine translation [16].
Neural Network – It is characterized by neuron connecting paths and adaptive weights, which can be tuned using a learning algorithm. The knowledge used to improve the model is gained from observed data. Time series prediction, brain modeling, and classification are a few of the examples [17].
Planning – It is a decision-making process used for the performance of a specific task with the use of programs or machines. It is basically about deciding on a series of actions and aiming to complete it. Game playing and scheduling are some of the prominent examples [18].
Robotic artificial agent – The main objective of a robotic artificial agent is to influence objects by demolishing, shifting, recognizing, and selecting them. Autonomous exploration and intelligent control are some of the prominent examples [19].
Speech Processing – It is the process by which a machine or program is able to recognize and translate any word or phrase from any verbal communication into a machine-readable layout. Speech production and recognition are some examples [20].