Speech processing

Explore chapters and articles related to this topic

Robust Speech Processing as an Inverse Problem

Published in Vijay K. Madisetti, The Digital Signal Processing Handbook, 2017

This section addresses the inverse problem in robust speech processing. A problem that speaker and speech recognition systems regularly encounter in commercial applications is a dramatic drop in performance caused by a mismatch between the training and operating environments. The mismatch generally results from the diversity of operating environments. For applications over the telephone network, operating environments may range from offices and laboratories to homes and airports. The problem worsens when speech is transmitted over a wireless network, where the system experiences cross-channel interference in addition to the channel and noise degradations present in the regular telephone network. The key issue in robust speech processing is to obtain good performance regardless of the mismatch in environmental conditions. The inverse problem in this sense refers to modeling the mismatch as a transformation and resolving it via the inverse transformation. In this section, we introduce the method of modeling the mismatch as an affine transformation.
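The idea of modeling the mismatch as an affine transformation and undoing it with the inverse can be illustrated with a minimal sketch. It assumes a hypothetical two-dimensional feature vector and assumes the matrix `A` and offset `b` are already known; the excerpt does not say how the handbook estimates them, so the function names and values below are illustrative only.

```python
def affine_apply(A, b, x):
    """Degrade a clean 2-D feature vector: y = A x + b (assumed mismatch model)."""
    return [
        A[0][0] * x[0] + A[0][1] * x[1] + b[0],
        A[1][0] * x[0] + A[1][1] * x[1] + b[1],
    ]


def affine_invert(A, b, y):
    """Recover the clean vector via the inverse transformation: x = A^{-1} (y - b)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if abs(det) < 1e-12:
        raise ValueError("A is singular; the mismatch cannot be inverted")
    d0, d1 = y[0] - b[0], y[1] - b[1]
    # Closed-form inverse of a 2x2 matrix applied to (y - b).
    return [
        (A[1][1] * d0 - A[0][1] * d1) / det,
        (-A[1][0] * d0 + A[0][0] * d1) / det,
    ]


# Hypothetical values chosen only for illustration.
A = [[0.8, 0.1], [-0.2, 1.1]]   # assumed channel-like distortion
b = [0.3, -0.4]                 # assumed noise-like offset
clean = [1.0, -0.5]

degraded = affine_apply(A, b, clean)
recovered = affine_invert(A, b, degraded)
```

In practice `A` and `b` would have to be estimated from data in the operating environment; here they are fixed so that the round trip from clean to degraded and back can be checked directly.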

Hardware chip performance analysis of different FFT architecture

Published in International Journal of Electronics, 2021

Amit Kumar, Adesh Kumar, Aakanksha Devrari

The discrete Fourier transform (DFT) is a widely used tool in digital signal processing (DSP) systems. It plays a vital role in applications such as signal analysis, speech processing, image processing, audio processing, video processing, communication systems and many others. The DFT is a Fourier representation of a finite-length sequence, obtained by decomposing the sequence of values into its frequency components. It converts a time-domain signal into a frequency-domain signal of the same length, while the inverse DFT (IDFT) converts a frequency-domain signal back into a time-domain signal. The FFT (Cooley & Tukey, 1965) is an algorithm for computing the ‘N’-point DFT of a sequence, while the inverse FFT computes the IDFT. The FFT is fast because it factorises the DFT matrix into a product of sparse (mostly zero) factors, which also reduces the complexity of DFT hardware. Brute-force calculation of a DFT of length ‘N’ requires $O(N^2)$ multiplications, whereas the FFT reduces this to $O(N \log_2 N)$. The general equation of the DFT for an input sequence x(n) of length ‘N’ is given by $X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}$ for $k = 0, 1, \ldots, N-1$.
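The difference between direct evaluation of the DFT and the FFT can be checked with a short sketch. The code below is not taken from the article: it is a plain recursive radix-2 decimation-in-time variant of the Cooley-Tukey idea, it assumes the input length is a power of two, and the function names and test signal are illustrative.

```python
import cmath


def dft_direct(x):
    """Direct DFT, X(k) = sum_n x(n) exp(-2j*pi*n*k/N); needs on the order of N^2 multiplications."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points) for n in range(n_points))
        for k in range(n_points)
    ]


def fft_radix2(x):
    """Recursive radix-2 FFT; assumes len(x) is a power of two."""
    n_points = len(x)
    if n_points < 1 or n_points & (n_points - 1):
        raise ValueError("length must be a power of two")
    if n_points == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])   # DFT of even-indexed samples
    odd = fft_radix2(x[1::2])    # DFT of odd-indexed samples
    half = n_points // 2
    out = [0j] * n_points
    for k in range(half):
        # Twiddle factor combines the two half-length DFTs (butterfly step).
        t = cmath.exp(-2j * cmath.pi * k / n_points) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


def max_abs_diff(a, b):
    """Largest element-wise magnitude difference between two spectra."""
    return max(abs(p - q) for p, q in zip(a, b))


signal = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, -2.0, 1.5]
spectrum_direct = dft_direct(signal)
spectrum_fft = fft_radix2(signal)
```

Both functions return the same spectrum up to floating-point rounding. The recursive version halves the problem at each level, which is where the $O(N \log_2 N)$ operation count comes from.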

Automatic Speech Recognition Using Limited Vocabulary: A Survey

Published in Applied Artificial Intelligence, 2022

Jean Louis K. E Fendji, Diane C. M. Tala, Blaise O. Yenke, Marcellin Atemkeng

Automatic speech recognition (ASR) is the process, and the related technology, of converting a speech signal into the matching sequence of words or other linguistic entities using algorithms implemented in computing devices (Indurkhya and Damerau 2010). ASR has become an exciting field for many researchers. Presently, users prefer to operate devices such as computers, smartphones, or any other connected device through speech. Current speech processing techniques (encompassing speech synthesis, speech processing, speaker identification or verification) pave the way to create human-to-machine voice interfaces. ASR can be applied in several areas, including voice services (Yadava and Jayanna 2017), program control and data entry (Hauser, Sabir, and Thoma 1999), avionics (Noyes and Starr 2007), and disabled assistance (Mayer 2018), among others. Although ASR can be advantageous in easing human-to-machine communication, in many cases it goes beyond helpful and becomes absolutely necessary. For example, low literacy levels and the extinction of under-resourced languages are problems that make ideal candidates for ASR (Besacier et al. 2014). In fact, the high penetration of communication tools such as smartphones in the developing world (Albabtain et al. 2014) and their increasing presence in rural areas (Ebongue 2015; Ebongue Louis 2015) provide an unprecedented opportunity to develop voice-based applications that can help to mitigate the low literacy levels in those areas. Smartphones offer many advantages over a PC-based interface, such as high mobility and portability, easy battery recharging, and conventional embedded features such as microphones and speakers.

A Review on Applications of Artificial Intelligence Over Indian Legal System

Published in IETE Journal of Research, 2021

Riya Sil, Abhishek Roy

In this section, the authors discuss the sub-domains of AI:

Computer Vision – It is a subset of AI that provides a visual experience to computers or machines so that they can analyze events, actions, and actors through images [12].

Evolutionary Computation – It is a sub-domain of soft computing and AI. It consists of several algorithms used for global optimization. It has features of stochastic optimization based on error problem solvers and population-based experiments. It includes programming and a Genetic Algorithm [13].

Expert System – It is a computer-based system that imitates the capability of human decision-making. It is used to solve complex problems using reasoning instead of conventional procedural code. Examples include teaching systems and decision making [14].

Machine Learning – It is a sub-domain of AI. Using Machine Learning, machines can gain advanced knowledge automatically from past experience without explicit programming. Space Learning and Decision Tree Learning are some of the significant examples [15].

Natural Language Processing – It is a subfield of AI that is primarily concerned with the connection between natural language and computers. Natural Language Processing analyses an enormous amount of natural language data through programs such as machine translation [16].

Neural Network – It is characterized by neuron-connecting paths and adaptive weights that can be tuned using a learning algorithm. The knowledge used to improve the model is gained from observed data. Time series prediction, brain modeling, and classification are a few of the examples [17].

Planning – It is a decision-making process used to perform a specific task with the use of programs or machines. It is basically about deciding on a series of actions and aiming to complete it. Game playing and scheduling are some of the prominent examples [18].

Robotic artificial agent – The main objective of a robotic artificial agent is to influence objects by demolishing, shifting, recognizing, and selecting them. Autonomous exploration and intelligent control are some of the prominent examples [19].

Speech Processing – It is the process by which a machine or program is able to recognize and translate any word or phrase from any verbal communication into a machine-readable layout. Speech production and recognition are some examples [20].
