Deep Face Recognition
Published in Hassan Ugail, Deep Learning in Visual Computing, 2022
The FaceNet model is a pre-trained deep learning architecture, inspired by the GoogLeNet family of models, for efficient face recognition. For the recognition task, it uses stored embeddings of the known people in a dataset together with data from a new person or people to be recognised. A key element of the FaceNet architecture is the generation of an embedding of a fixed dimension from a face image of a predefined size. The input image is fed through a deep CNN that ends in a fully connected layer, producing an embedding of 128 features that may or may not be visually interpretable by a human. For recognition, the network then calculates the distance between the individual features of each pair of embeddings. Metrics such as the squared error or the absolute error can be utilised to compute the distance between the embeddings.
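As an illustration, the two distance metrics mentioned above can be sketched as follows. The embedding values here are random stand-ins, not outputs of a trained FaceNet model:

```python
import numpy as np

# Hypothetical 128-dimensional embeddings for two face images (in practice
# these would come from the final fully connected layer of the FaceNet CNN).
rng = np.random.default_rng(0)
emb_a = rng.normal(size=128)
emb_b = rng.normal(size=128)

# Squared-error (squared Euclidean) distance between the two embeddings.
squared_dist = float(np.sum((emb_a - emb_b) ** 2))

# Absolute-error (L1) distance as an alternative metric.
absolute_dist = float(np.sum(np.abs(emb_a - emb_b)))

print(squared_dist, absolute_dist)
```

A small distance under either metric indicates that the two face images likely belong to the same person.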
Intelligent Systems
Published in Puneet Kumar, Vinod Kumar Jain, Dharminder Kumar, Artificial Intelligence and Global Society, 2021
Satyajee Srivastava, Abhishek Singh, Deepak Dudeja
FaceNet is one of the finest deep convolutional networks designed by Google, trained to solve the face verification, recognition, and clustering problems efficiently at scale. FaceNet learns a direct mapping from face images to a compact Euclidean space, where distances correspond to a measure of face similarity [20]. That is, if we have an image X and want to measure its similarity to images Y and Z (assuming the remaining pre-processing steps have been done correctly), we can measure the distances between the pairs X-Y and X-Z: the smaller the Euclidean distance between a pair of images, the more similar they are [21]. FaceNet achieves its face recognition performance using an optimised embedding of only 128 bytes per face [20].
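The X-Y versus X-Z comparison described above can be sketched as follows. The embeddings here are synthetic: Y is constructed as a small perturbation of X to mimic a matching face, while Z is an unrelated vector:

```python
import numpy as np

def l2_normalize(v):
    # FaceNet constrains embeddings to lie on the unit hypersphere.
    return v / np.linalg.norm(v)

# Hypothetical 128-d embeddings for images X, Y and Z (illustrative values).
rng = np.random.default_rng(1)
x = l2_normalize(rng.normal(size=128))
y = l2_normalize(x + 0.05 * rng.normal(size=128))  # Y: similar to X
z = l2_normalize(rng.normal(size=128))             # Z: unrelated face

d_xy = np.linalg.norm(x - y)  # Euclidean distance for the pair X-Y
d_xz = np.linalg.norm(x - z)  # Euclidean distance for the pair X-Z

# The pair with the smaller distance is the more similar one.
more_similar = "Y" if d_xy < d_xz else "Z"
print(more_similar, d_xy, d_xz)
```

In a real system the same comparison would be made against a distance threshold to decide whether two images show the same person.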
Real-Time Identity Censorship of Videos to Enable Live Telecast Using NVIDIA Jetson Nano
Published in Mohamed Lahby, Utku Kose, Akash Kumar Bhoi, Explainable Artificial Intelligence for Smart Cities, 2021
Shree Charran R., Rahul Kumar Dubey
The learning objective of MTCNN is a multi-task loss: a binary cross-entropy loss for the probability that a box contains a face, a Euclidean loss for bounding box regression, and a Euclidean loss for facial landmark regression. The three losses are weighted and summed in a cumulative multi-task formula. The bounding box containing the face output by MTCNN is fed to FaceNet for face recognition within the box. FaceNet is a face recognition model by Google. The FaceNet system can be used to extract high-quality features from faces, called face embeddings, which can then be used to train a face identification system; in the last stage, an SVM classifier is used to identify the face. The face embeddings are multidimensional numerical vector representations of a face that capture its unique identity; FaceNet provides a 128-dimensional embedding for each face. The triplet loss involves comparing the face embeddings of three images: an anchor (reference) image, a positive image (matching the anchor), and a negative image (not matching the anchor). The embeddings are learnt by a deep CNN such that the positive embedding lies closer to the anchor embedding than the negative embedding does: ||F(A) − F(P)||² + margin < ||F(A) − F(N)||², where the margin enforces a minimum separation between the positive and negative pairs.
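The triplet constraint above can be turned into a hinge-style loss, as a minimal sketch (the embeddings below are random illustrative vectors, not from a trained network):

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, margin=0.2):
    # Squared Euclidean distances anchor-positive and anchor-negative.
    d_ap = np.sum((f_a - f_p) ** 2)
    d_an = np.sum((f_a - f_n) ** 2)
    # Hinge: the loss is zero once the negative is farther from the
    # anchor than the positive by at least the margin.
    return max(d_ap - d_an + margin, 0.0)

# Illustrative 128-d embeddings: the positive is a small perturbation
# of the anchor, the negative is an unrelated vector.
rng = np.random.default_rng(2)
anchor = rng.normal(size=128)
positive = anchor + 0.1 * rng.normal(size=128)
negative = rng.normal(size=128)

print(triplet_loss(anchor, positive, negative))
```

During training the network parameters are updated so that this loss, averaged over many such triplets, is driven toward zero.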
FERCE: Facial Expression Recognition for Combined Emotions Using FERCE Algorithm
Published in IETE Journal of Research, 2022
A. Swaminathan, A. Vadivel, Michael Arock
In this work, PCA is used for the data reduction process, reducing the feature points of the 18 newly derived classes. The reduced data variables are then given to SVM training for classification. The classification accuracies for the basic emotions and for the newly derived emotions are shown in Figures 7 and 8. The same data is trained and tested using deep learning algorithms, namely AlexNet [55] and FaceNet [56]. Both the AlexNet and FaceNet algorithms use a Convolutional Neural Network (CNN) model consisting of multiple layers of convolutions, max-pooling, dropout, Rectified Linear Units (ReLU), SGD optimisation, and fully connected layers. The results for basic, compound, and combined emotions using the AlexNet and FaceNet algorithms are presented in Table 5. The results are not appreciable because the architectures of FaceNet, Inception-v3, and GoogLeNet are designed for face recognition, face re-identification, object detection, and various image classification tasks; these standard architectures are thus modelled for different purposes. Our proposed work, by contrast, is concerned with detecting combined emotions, which involves identifying minor changes in facial expressions.
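The PCA data reduction step described above can be sketched as follows. The feature matrix here is a synthetic stand-in (180 samples across the 18 classes, with an assumed 136 raw features and 20 retained components), not the actual facial feature points used in the work:

```python
import numpy as np

# Synthetic stand-in for the extracted facial feature points:
# 180 samples (10 per each of 18 derived classes), 136 raw features.
rng = np.random.default_rng(3)
X = rng.normal(size=(180, 136))

# PCA via SVD: centre the data, then project onto the top-k
# principal directions (rows of Vt, sorted by singular value).
k = 20
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_reduced = X_centered @ Vt[:k].T  # reduced variables for SVM training

print(X_reduced.shape)
```

The reduced matrix `X_reduced` is what would then be passed, together with the class labels, to the SVM classifier for training.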
RFR-DLVT: a hybrid method for real-time face recognition using deep learning and visual tracking
Published in Enterprise Information Systems, 2020
Zhenfeng Lei, Xiaoying Zhang, Shuangyuan Yang, Zihan Ren, Olusegun F. Akindipe
This experiment compared the accuracy of the RFR-DLVT method proposed in this paper with DeepFace (Taigman, Yang, and Ranzato et al. 2014), DeepID2 (Henriques, Rui, and Martins et al. 2014), FaceNet (Schroff, Kalenichenko, and Philbin 2015), Lightened CNN (Wu, He, and Sun 2015), L-Softmax CNN (Weiyang et al. 2016), and (Anh et al. 2017) on the LFW, YTF, and surveillance video datasets. The results are shown in Table 5. The accuracy of RFR-DLVT is 99.48% on the LFW dataset, 94.2% on the YTF dataset, and 99.6% on the surveillance video dataset. Its accuracy on these public datasets exceeds that of most existing FR algorithms, making it highly competitive. Although FaceNet achieves better accuracy than ours, it is trained on a much larger dataset; our proposed model used only five million images for training and still achieved these good results.