Artificial Intelligence-Based Assistive Technology
Published in P. Kaliraj, T. Devi, Artificial Intelligence Theory, Models, and Applications, 2021
Leveraging powerful new speech recognition technology supports spoken communication and increases independence for people with speech disabilities. Voiceitt is an automatic speech recognition product that understands non-standard speech patterns and supports real-time communication, providing the individual with a personalized recognizer. It integrates customizable speech recognition with mainstream voice technologies and devices, enabling environmental control through a universally accessible voice system. Other available apps include ACT Lab, a storytelling app that builds narratives from images to improve communication; a visualizer that matches music rhythms to a beat; EVE, which recognizes speech and generates captions; CAIR, which recognizes the words in spoken messages; Timlogo, which improves access to speech therapy; ReadRing, a real-time text-to-Braille converter; and Helpicto, which converts speech to pictograms.
Tools for Sensor-Based Performance Assessment and Hands-Free Control
Published in Jack M. Winters, Molly Follette Story, Medical Instrumentation, 2006
Speech recognition can be designed to be either speaker independent or speaker dependent. Speaker-independent systems are usually smaller-vocabulary systems or systems with embedded tree-based vocabularies, with each branch having a limited vocabulary. In addition, a speaker-independent system should consist of words that are phonetically different, as there are already differences in speaker pitch, loudness, and tone that must be accounted for. An example of a speaker-independent system is an airline automated reservation system, which is based upon simple answers such as numbers between 0 and 100 as well as “yes” and “no.” Many embedded speech recognition systems are also speaker independent, as they are based on a limited vocabulary. One example would be a design to control a motorized wheelchair, in which the controlling words would consist of “left,” “right,” “stop,” “go,” and “back.”
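The wheelchair example can be sketched as a simple command dispatcher. This is a minimal illustration, not the chapter's implementation: the motion tuples and the assumption that the embedded recognizer returns one of the five command words as a string are invented for demonstration.

```python
# Minimal sketch of a limited-vocabulary command dispatcher for a
# motorized wheelchair. The (turn, forward) motion deltas are
# illustrative; the recognizer itself is assumed to output one of
# the five phonetically distinct command words as a string.

COMMANDS = {
    "left":  (-1, 0),
    "right": (1, 0),
    "go":    (0, 1),
    "back":  (0, -1),
    "stop":  (0, 0),
}

def dispatch(word):
    """Map a recognized command word to a motion tuple.

    Out-of-vocabulary words are rejected rather than guessed, since a
    speaker-independent system relies on a small, phonetically
    distinct vocabulary to stay robust across speakers.
    """
    if word not in COMMANDS:
        return None  # ignore unrecognized utterances
    return COMMANDS[word]
```

Rejecting anything outside the vocabulary is the safe default for a control application: a mis-heard word should stop nothing and move nothing.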
Developing interactive speech technology
Published in Christopher Baber, Janet M. Noyes, Interactive Speech Technology: Human factors issues in the application of speech input/output to computers, 2002
Speech recognition devices are generally categorized into two types: speaker-dependent and speaker-independent. The difference concerns whether the speech utterances of the user are known to the system. Speaker-dependent devices require the intended user to provide samples of the words to be used, a process known as enrolment, while speaker-independent systems can in theory recognize the utterances of any user, whether or not known to the system. Enrolment is achieved by repeating the words in the vocabulary a number of times, although some manufacturers of commercially available devices state that their product requires only single-pass enrolment, i.e. one utterance of each word. However, it is generally thought that a single utterance will not capture sufficient variation in speech to provide adequate information (Baber, 1991a). Baber concluded that, to trade off information content against time, 3-5 utterances would be sufficient for each word. Set-up times could therefore be quite long, as a vocabulary of only 50 words would require up to 250 spoken samples to be provided. The alternative approach is to use speaker-independent technology, which does not require enrolment, as the vocabularies are pre-stored.
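The enrolment trade-off described above (3-5 utterances per word) can be sketched as template averaging. The fixed-length feature vectors are a deliberate simplification: real speaker-dependent systems align variable-length utterances, e.g. with dynamic time warping, before combining them.

```python
import statistics

def enrol(word, utterance_features):
    """Build a speaker-dependent template for `word` by averaging
    3-5 enrolment utterances, each represented here as a fixed-length
    feature vector (a simplification for illustration).
    """
    if not 3 <= len(utterance_features) <= 5:
        raise ValueError("3-5 utterances recommended per word")
    # Element-wise mean across the utterances forms the stored template,
    # capturing some of the natural variation a single pass would miss.
    return [statistics.fmean(dim) for dim in zip(*utterance_features)]

# The set-up cost cited in the text: a 50-word vocabulary at up to
# 5 utterances per word needs up to 250 spoken samples.
vocabulary_size, passes = 50, 5
assert vocabulary_size * passes == 250
```

Averaging several samples is the simplest way to smooth over utterance-to-utterance variation; the check on the number of utterances enforces the 3-5 range Baber recommends.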
Automatic Speech Recognition Using Limited Vocabulary: A Survey
Published in Applied Artificial Intelligence, 2022
Jean Louis K. E Fendji, Diane C. M. Tala, Blaise O. Yenke, Marcellin Atemkeng
A speech recognition system can be vulnerable to a noisy environment. To address this issue, deep reinforcement learning (DRL) can achieve complex goals in an iterative manner, which makes it suitable for such applications. Reinforcement learning is a popular paradigm of ML in which agents learn their behavior by trial and error. DRL combines standard reinforcement learning with DL to overcome the limitations of reinforcement learning in complex environments with large state spaces or high computation requirements. DRL enables software-defined agents to learn the best possible actions in virtual environments to attain their goals (Mnih, Kavukcuoglu, and Silver 2020). This technique has recently been applied to limited vocabularies, such as the “Speech Command” dataset in (Rajapakshe et al. 2020), as well as to larger vocabularies (Kala and Shinozaki 2018). Regardless of the artificial intelligence technique that is used, an important prerequisite remains, namely the dataset.
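The trial-and-error loop underlying reinforcement learning can be sketched with tabular Q-learning on a toy problem. This is not a speech system: the 5-state corridor environment and its reward are invented for illustration. DRL replaces the explicit Q-table below with a neural network when the state space (e.g. acoustic feature vectors) is too large to enumerate.

```python
import random

# Tabular Q-learning on a 5-state corridor: the agent starts at
# state 0 and earns a reward of 1 on reaching the goal state 4.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)              # step left / step right
ALPHA, GAMMA = 0.5, 0.9         # learning rate, discount factor

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Deterministic toy environment: move, clamped to the corridor."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

random.seed(0)
for _ in range(200):            # episodes of pure trial and error
    s, done = 0, False
    while not done:
        a = random.choice(ACTIONS)          # random exploration
        nxt, r, done = step(s, a)
        # Q-learning update: bootstrap from the best next action.
        best_next = max(Q[(nxt, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = nxt

# Extract the greedy policy learned from the accumulated experience.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
```

Because Q-learning is off-policy, the agent can explore with purely random actions and still converge on the greedy policy that heads straight for the goal, which is exactly the "learning by trial and error" the excerpt describes.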
Design of a speech-enabled 3D marine compass simulation system
Published in Ships and Offshore Structures, 2018
Bin Fu, Hongxiang Ren, Jingjing Liu, Xiaoxi Zhang
Speech recognition is a technology that allows a machine to identify, understand and translate human voice signals into the corresponding text, and can be used to implement appropriate control technology (Rao and Paliwal 1986). Compared with traditional human–computer interaction, speech interaction has the advantage of convenience and can be used to implement ‘intelligent’ operations. In recent years, speech recognition technology has been applied to smart home voice control systems, car voice recognition systems and many other areas (Kumar, Suraj, et al. 2014; Pai et al. 2016). Speech interaction is more convenient and intelligent than traditional interactive approaches; however, speech interactions have rarely been applied to navigation.
Recent advances in artificial intelligence for video production system
Published in Enterprise Information Systems, 2023
YuFeng Huang, ShiJuan Lv, Kuo-Kun Tseng, Pin-Jen Tseng, Xin Xie, Regina Fang-Ying Lin
Jin (2018) presents a speech synthesiser for text-based editing of narrations, utilising voice conversion to transform a robotic voice into a desired human voice. Pre-defined prosody and lack of target speaker’s prosodic characteristics are limitations. Tahon, Lecorve, and Lolive (2018) design an automatic pronunciation generation method from text, improving the quality and expressivity of synthesised speech. Emotion-specific pronunciations are subtle and not easily perceived. Singh and Goel (2021) propose an algorithm using deep learning for high-level feature extraction in speech corpora, achieving better accuracy in emotion recognition. Noise remains a challenge for speech recognition systems.