Explore chapters and articles related to this topic
Enhancing Speaker Diarization for Audio-Only Systems Using Deep Learning
Published in Sam Goundar, Archana Purwar, Ajmer Singh, Applications of Artificial Intelligence, Big Data and Internet of Things in Sustainable Development, 2023
Aishwarya Gupta, Archana Purwar
Speaker diarization is the process of determining “who speaks when” in an audio or video conversation. It is the task of labelling an audio or video recording according to speaker identity. Earlier, it was merely a step in the automatic speech recognition (ASR) pipeline. Over the years, accuracy and robustness have improved across various speech recognition applications, and in speaker diarization as well. Recently, speaker diarization has emerged as a stand-alone domain with its own challenges and its own approaches for dealing with them. It has gained popularity in numerous applications across many areas that deal with day-to-day problems and their solutions. Diarization is about determining the number of speakers and the portions of the conversation in which each of them spoke.
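As an illustration of what “who speaks when” can look like in practice, the following minimal sketch represents a diarization result as a list of labelled time segments and derives the speaker count and per-speaker talk time from it. The segment values, the speaker labels, and the `summarize` helper are hypothetical and are not taken from the chapter; a real system would produce the segments from audio.

```python
from collections import defaultdict

# Hypothetical diarization output: (start_sec, end_sec, speaker_label).
# These values are invented for illustration only.
segments = [
    (0.0, 4.0, "spk_A"),
    (4.0, 7.5, "spk_B"),
    (7.5, 9.0, "spk_A"),
    (9.0, 12.0, "spk_C"),
]


def summarize(segments):
    """Return (number of speakers, total talk time per speaker in seconds)."""
    talk_time = defaultdict(float)
    for start, end, speaker in segments:
        talk_time[speaker] += end - start
    return len(talk_time), dict(talk_time)


speaker_count, talk_time = summarize(segments)
print(speaker_count)
print(talk_time)
```

The segment list answers “when”, the labels answer “who”, and the speaker count falls out as the number of distinct labels.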
Speaker Verification from Short Utterance Perspective: A Review
Published in IETE Technical Review, 2018
Rohan Kumar Das, S. R. Mahadeva Prasanna
Human beings can be recognized using speech as a biometric feature, as each speaker has a different style of speech delivery and vocabulary usage, in addition to the physiological structure of their speech production system. This physiological structure includes the shape and size of the vocal tract, the size of the larynx, etc. These physiological differences cause speakers to differ in how they produce speech. Speaker modelling is essential for many tasks, including speaker recognition, speaker diarization, speaker change detection, and speaker clustering. Speaker recognition refers to recognizing a person based on voice samples of that person. Speaker diarization, on the other hand, deals with finding who spoke when, which is useful for extracting the speech of a particular speaker from a multi-speaker conversation. Speaker change detection refers to the task of finding the regions where the speaker changes in speech containing multiple speakers. Similarly, speaker clustering groups a set of speakers on the basis of similarity, as the task requires.
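The relationship between two of these tasks can be sketched on already-labelled segments. This sketch assumes speaker labels are given, whereas real speaker change detection works on audio features rather than labels. The function names `change_points` and `speech_of` and the example data are hypothetical.

```python
# Hypothetical labelled segments: (start_sec, end_sec, speaker_label).
labelled = [
    (0.0, 2.0, "spk_A"),
    (2.0, 4.0, "spk_A"),
    (4.0, 7.5, "spk_B"),
    (7.5, 9.0, "spk_A"),
]


def change_points(segments):
    """Return the start times at which the speaker label differs
    from the label of the preceding segment."""
    points = []
    for prev, nxt in zip(segments, segments[1:]):
        if prev[2] != nxt[2]:
            points.append(nxt[0])
    return points


def speech_of(segments, speaker):
    """Return the (start, end) intervals attributed to one speaker."""
    return [(start, end) for start, end, spk in segments if spk == speaker]


changes = change_points(labelled)
spk_a_intervals = speech_of(labelled, "spk_A")
print(changes)
print(spk_a_intervals)
```

Here `change_points` corresponds to speaker change detection, and `speech_of` corresponds to retrieving one speaker's speech from a diarized conversation.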