Speaker recognition

Quantization of Discrete Time Signals

Published in Vijay K. Madisetti, The Digital Signal Processing Handbook, 2017

Speaker recognition is the task of identifying a speaker by his or her voice. Systems performing speaker recognition operate in different modes. In closed set mode, a particular speaker is identified as one of a finite set of reference speakers [17]. In an open set system, a speaker is either identified as belonging to a finite set or is deemed not to be a member of the set [17]. For speaker verification, the claim of a speaker to be one in a finite set is either accepted or rejected [18]. Speaker recognition can be done as either a text-dependent or a text-independent task. The difference is that in the former case, the speaker is constrained as to what must be said, while in the latter case no constraints are imposed. In this chapter, we focus on the closed set, text-independent mode. The overall system will have three components, namely, (1) LP analysis for parameterizing the spectral envelope, (2) feature extraction for ensuring speaker discrimination, and (3) a classifier for making a decision. The input to the system will be a speech signal. The output will be a decision regarding the identity of the speaker.
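
The three-component structure described above can be illustrated with a minimal sketch. The excerpt does not say which features or classifier the chapter uses, so the choices here are assumptions made for illustration only: LP analysis by the autocorrelation method (Levinson-Durbin recursion), LP cepstral coefficients averaged over frames as the feature, and a nearest-mean classifier. The function names, frame sizes, and the synthetic second-order autoregressive signals standing in for speech are all hypothetical.

```python
import numpy as np

def lp_coefficients(frame, order=10):
    """(1) LP analysis: autocorrelation method with the Levinson-Durbin recursion.
    Returns a[1..order] for the predictor x[n] ~ sum_k a[k] * x[n-k]."""
    windowed = frame * np.hamming(len(frame))
    n = len(windowed)
    r = np.array([windowed[:n - k] @ windowed[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)              # a[0] is unused
    err = r[0] + 1e-12                   # small offset guards against division by zero
    for i in range(1, order + 1):
        k = (r[i] - a[1:i] @ r[i - 1:0:-1]) / err   # reflection coefficient
        prev = a.copy()
        a[i] = k
        a[1:i] = prev[1:i] - k * prev[i - 1:0:-1]
        err *= 1.0 - k * k
    return a[1:]

def lp_cepstrum(a):
    """(2) Feature extraction (assumed choice): LP cepstrum of the all-pole model."""
    p = len(a)
    c = np.zeros(p)
    for n in range(1, p + 1):
        c[n - 1] = a[n - 1] + sum((k / n) * c[k - 1] * a[n - k - 1] for k in range(1, n))
    return c

def utterance_feature(signal, frame_len=240, hop=120, order=10):
    """Average the per-frame cepstra so the feature does not depend on the text spoken."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    cepstra = [lp_cepstrum(lp_coefficients(signal[s:s + frame_len], order)) for s in starts]
    return np.mean(cepstra, axis=0)

def train(enrollment):
    """Build one reference vector per speaker from enrollment signals."""
    return {spk: utterance_feature(sig) for spk, sig in enrollment.items()}

def identify(signal, models):
    """(3) Classifier (assumed choice): closed-set decision by nearest reference vector."""
    feat = utterance_feature(signal)
    return min(models, key=lambda spk: np.linalg.norm(feat - models[spk]))

def synth_voice(a1, a2, n, seed):
    """Hypothetical stand-in for speech: white noise through a second-order all-pole filter."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(2, n):
        x[t] = a1 * x[t - 1] + a2 * x[t - 2] + e[t]
    return x

# Two synthetic "speakers" with different spectral envelopes.
enrollment = {
    "A": synth_voice(1.2, -0.8, 8000, seed=0),
    "B": synth_voice(-0.5, -0.6, 8000, seed=1),
}
models = train(enrollment)
print(identify(synth_voice(1.2, -0.8, 4000, seed=2), models))
print(identify(synth_voice(-0.5, -0.6, 4000, seed=3), models))
```

Because the decision is always the closest of the enrolled speakers, this sketch is closed set: it has no way to report that the talker is unknown. An open set variant would need an additional rejection test.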

Published in Sadaoki Furui, Digital Speech Processing, Synthesis, and Recognition, 2018

Sadaoki Furui

Speaker recognition can be principally divided into speaker verification and speaker identification. Speaker verification is the process of accepting or rejecting the identity claim of a speaker by comparing a set of measurements of the speaker’s utterances with a reference set of measurements of the utterances of the person whose identity is being claimed. Speaker identification is the process of determining from which of the registered speakers a given utterance comes. The speaker identification process is similar to the spoken word recognition process in that both determine which reference template is most similar to the input speech.
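
The contrast between the two processes can be sketched in a few lines. This is only an illustration under stated assumptions: each speaker is represented by a single reference vector, similarity is measured by Euclidean distance, and the speaker names, vectors, and threshold value are hypothetical.

```python
import numpy as np

def identify(measurements, references):
    """Identification: pick the registered speaker whose reference is most similar."""
    return min(references, key=lambda name: np.linalg.norm(measurements - references[name]))

def verify(measurements, claimed, references, threshold):
    """Verification: compare only against the claimed speaker's reference,
    then accept or reject the claim."""
    distance = np.linalg.norm(measurements - references[claimed])
    return bool(distance <= threshold)

# Hypothetical reference measurements for two registered speakers.
references = {"alice": np.array([1.0, 0.0]), "bob": np.array([0.0, 1.0])}
x = np.array([0.9, 0.2])   # measurements of the input utterance

print(identify(x, references))                      # compares against every reference
print(verify(x, "alice", references, threshold=0.5))
print(verify(x, "bob", references, threshold=0.5))
```

Identification needs one comparison per registered speaker, whereas verification needs a single comparison plus a threshold, so its cost does not grow with the size of the registered population.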

Applications of Artificial Neural Networks (ANNs) to Speech Processing

Published in Yu Hen Hu, Jenq-Neng Hwang, Handbook of Neural Network Signal Processing, 2018

Shigeru Katagiri

There are two types of speaker recognition tasks: (1) speaker identification, which is the process of identifying an unknown speaker from a known population, and (2) speaker verification, which is the process of verifying the claimed identity of a speaker from a known population. Given a test utterance X, the discriminant function g_k(X; Λ) is defined as a function that measures the likelihood of observing X being generated by speaker k, where Λ is a set of trainable recognizer parameters.
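
The excerpt does not state how the discriminant function is turned into a decision. A common formulation, given here as an assumption rather than as the authors' rule, is to choose the highest-scoring speaker for identification and to compare the claimed speaker's score with a threshold for verification. The population size K and the threshold θ_k are symbols introduced for this illustration.

```latex
% Identification: choose the speaker with the largest discriminant value.
\hat{k} = \arg\max_{1 \le k \le K} \; g_k(X; \Lambda)

% Verification of a claim to be speaker k: threshold the claimed speaker's score.
\text{decision} =
\begin{cases}
\text{accept}, & g_k(X; \Lambda) \ge \theta_k \\
\text{reject}, & g_k(X; \Lambda) < \theta_k
\end{cases}
```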

Neural architectures for gender detection and speaker identification

Published in Cogent Engineering, 2020

Orken Mamyrbayev, Alymzhan Toleu, Gulmira Tolegen, Nurbapa Mekebayev

Automatically detecting gender and identifying speakers from a speaker’s voice is an important task in audio signal processing. Gender detection deals with finding out whether an utterance was spoken by a male or a female. This task is crucial for gender-dependent automatic speech recognition (ASR), which lets the ASR system be more accurate than gender-independent systems. Speaker recognition is the process of automatically recognizing speakers on the basis of individual information carried in the speech wave, and it can be categorized into speaker identification and speaker verification. Speaker verification is the process of accepting or rejecting the identity claim of a speaker. Speaker identification, on the other hand, is the process of determining which registered speaker provided the input speech.

Speaker Verification from Short Utterance Perspective: A Review

Published in IETE Technical Review, 2018

Rohan Kumar Das, S. R. Mahadeva Prasanna

Human beings can be recognized using speech as a biometric feature because each speaker has a distinct style of speech delivery and vocabulary usage, in addition to the physiological structure of his or her speech production system. This physiological structure includes the shape and size of the vocal tract, the size of the larynx, etc. These factors cause differences between speakers in speech production. Speaker modelling is essential for many tasks, including speaker recognition, speaker diarization, speaker change detection, and speaker clustering. Speaker recognition refers to recognizing a person based on voice samples of that particular person. Speaker diarization, on the other hand, deals with finding who spoke when, which is useful for finding the speech of a particular speaker in a conversation among multiple speakers. Speaker change detection refers to the task of finding the region where the change of speaker occurs in speech containing multiple speakers. Similarly, speaker clustering groups a set of speakers by similarity, as the application requires.