Interactive Computing
Published in Vivek Kale, Digital Transformation of Enterprise Architecture, 2019
Traditional graphical interfaces might not be appropriate for all users and/or applications; for this reason, spoken dialog systems are becoming a strong alternative. Speech and natural language technologies allow users to communicate in a flexible and efficient way, while also enabling access to applications when traditional input and output interfaces cannot be used (e.g., in-car applications, access for disabled persons, etc.). Also, speech-based interfaces work seamlessly with small devices (e.g., smartphones, tablets, and PCs) and allow users to easily invoke local applications or access remote information (Griol et al. 2017).
EPMS for Customer Conversations
Published in Vivek Kale, Enterprise Process Management Systems, 2018
Spoken dialog systems are computer programs that receive speech as input and generate synthesized speech as output, engaging the user in a dialog that aims to be similar to that between humans. They typically comprise the following modules:

Automatic speech recognition (ASR) module: The goal of speech recognition is to obtain the sequence of words uttered by a speaker. Once the speech recognizer has provided an output, the system must understand what the user said.

Spoken language understanding (SLU) module: The goal of spoken language understanding is to obtain the semantics from the recognized sentence. This process generally requires morphological, lexical, syntactical, semantic, discourse, and pragmatic knowledge.

Dialog manager module: The dialog manager decides the next action of the system, interpreting the incoming semantic representation of the user input in the context of the dialog. In addition, it resolves ellipsis and anaphora, evaluates the relevance and completeness of user requests, identifies and recovers from recognition and understanding errors, retrieves information from data repositories, and decides about the system's response.

Natural language generation (NLG) module: Natural language generation is the process of obtaining sentences in natural language from the nonlinguistic, internal representation of information handled by the dialog system.

Text-to-speech (TTS) module: Text-to-speech transforms the generated sentences into synthesized speech.
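The module chain above can be sketched as a pipeline of stages. The following is a minimal illustration in which every component is a stub standing in for a real system (the function names, the hard-coded utterance, and the weather intent are assumptions for the example, not any particular toolkit's API):

```python
# Illustrative sketch of the ASR -> SLU -> dialog manager -> NLG -> TTS pipeline.
# All stages are stubs; a real system would use trained models at each step.

def asr(audio: bytes) -> str:
    """Automatic speech recognition: audio -> word sequence (stubbed)."""
    return "what is the weather in paris"

def slu(utterance: str) -> dict:
    """Spoken language understanding: word sequence -> semantic frame."""
    frame = {"intent": None, "slots": {}}
    if "weather" in utterance:
        frame["intent"] = "get_weather"
        words = utterance.split()
        if "in" in words:
            frame["slots"]["city"] = words[words.index("in") + 1]
    return frame

def dialog_manager(frame: dict, context: dict) -> dict:
    """Decide the system's next action from the frame and dialog context."""
    if frame["intent"] == "get_weather" and "city" in frame["slots"]:
        return {"action": "inform_weather", "city": frame["slots"]["city"]}
    return {"action": "request_clarification"}

def nlg(action: dict) -> str:
    """Natural language generation: system action -> sentence."""
    if action["action"] == "inform_weather":
        return f"Here is the weather for {action['city'].title()}."
    return "Sorry, could you rephrase that?"

def tts(sentence: str) -> bytes:
    """Text-to-speech (stubbed): sentence -> synthesized audio."""
    return sentence.encode("utf-8")

def run_turn(audio: bytes, context: dict) -> str:
    """One dialog turn through the full pipeline."""
    words = asr(audio)
    frame = slu(words)
    action = dialog_manager(frame, context)
    sentence = nlg(action)
    tts(sentence)
    return sentence
```

The point of the sketch is the data flow: each module consumes the previous module's representation (audio, words, semantic frame, action, sentence), which is why errors at the ASR stage propagate and must be handled by the dialog manager downstream.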
Human-centric artificial intelligence architecture for industry 5.0 applications
Published in International Journal of Production Research, 2023
Jože M. Rožanec, Inna Novalija, Patrik Zajec, Klemen Kenda, Hooman Tavakoli Ghinani, Sungho Suh, Entso Veliou, Dimitrios Papamartzivanos, Thanassis Giannetsos, Sofia Anna Menesidou, Ruben Alonso, Nino Cauli, Antonello Meloni, Diego Reforgiato Recupero, Dimosthenis Kyriazis, Georgios Sofianidis, Spyros Theodoropoulos, Blaž Fortuna, Dunja Mladenić, John Soldatos
Spoken dialog systems and conversational multimodal interfaces leverage artificial intelligence and can reduce friction and enhance human-machine interactions (Klopfenstein et al. 2017; Vajpai and Bora 2016; Maurtua et al. 2017) by approximating a human conversation. In practice, however, conversational interfaces mostly act as the first level of support and cannot offer as much help as a knowledgeable human. They can be classified into three broad categories: (i) basic bots, (ii) text-based assistants, and (iii) voice-based assistants. While basic bots have a simple design and allow only basic commands, text-based assistants (also known as chatbots) can interpret users' text and enable more complex interactions. Both require speech-to-text and text-to-speech technologies if verbal interaction with the conversational interface is supported. Many tools have been developed to support these functionalities. Among them, we find the Web Speech API,1 which can be configured to recognise expressions based on a finite set of options defined through a grammar.2 The most advanced conversational interfaces are voice assistants, such as the Google Assistant,3 Apple's Siri,4 Microsoft's Cortana,5 or Amazon's Alexa.6 They can be integrated into multiple devices and environments through publicly available application programming interfaces (APIs), enabling new business opportunities (Erol et al. 2018). Because voice interfaces can impose unnecessary constraints in some use cases, they can be complemented following a multimodal approach (Kouroupetroglou et al. 2017).
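Constraining recognition to a finite set of options, as a grammar does for the Web Speech API, can be illustrated independently of any browser API. The sketch below (the command list, cutoff value, and matching logic are assumptions for the example) accepts an utterance only if it matches, exactly or approximately, one of the grammar's options:

```python
# Illustrative sketch of grammar-constrained matching: an utterance is only
# accepted if it corresponds to one of a finite set of allowed commands,
# analogous to constraining a speech recognizer with a grammar.
import difflib
from typing import Optional

COMMANDS = ["lights on", "lights off", "play music", "stop music"]

def match_command(utterance: str) -> Optional[str]:
    """Return the grammar option matching the utterance, or None."""
    normalized = " ".join(utterance.lower().split())
    if normalized in COMMANDS:
        return normalized
    # Tolerate small recognition errors via approximate string matching.
    close = difflib.get_close_matches(normalized, COMMANDS, n=1, cutoff=0.8)
    return close[0] if close else None
```

Restricting the recognizable vocabulary this way trades expressiveness for robustness, which is exactly the trade-off that separates basic bots from the more open-ended text- and voice-based assistants described above.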