AI for Advanced Driver Assistance Systems
Published in Josep Aulinas, Hanky Sjafrie, AI for Cars, 2021
Gaze estimation algorithms fall into two general approaches: geometry-based and appearance-based. The geometry-based (also known as model-based) approach estimates the point of gaze through geometric calculations on a constructed 3D model of the face or the eyes. Depending on the algorithm, the model may be built from the estimated head pose, usually in conjunction with the detected or estimated positions of relevant facial "landmarks" such as the pupil centers, eyeball centers or mouth. The appearance-based approach instead uses machine learning to determine the point of gaze directly from eye or face images. Visual descriptors, for instance local binary patterns (LBP) or multi-scale histograms of oriented gradients (mHOG), are extracted from each image and fed to a machine-learning regressor (for numerical output) or classifier (for categorical output). One big advantage of the appearance-based approach is that it is less demanding than its geometry-based counterpart: it does not require high-resolution input images to perform gaze estimation. State-of-the-art gaze estimation employs convolutional neural networks (CNNs) and other deep-learning methods either to replace the (manual) feature-extraction step above or to act as an end-to-end system. The latter takes eye or face images as input and outputs the estimated gaze direction directly, without any human-defined intermediate step such as the aforementioned feature extraction.
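The appearance-based pipeline described above can be sketched in a few lines. This is a minimal toy illustration, not any published system: the descriptor is a simple intensity histogram standing in for LBP/mHOG features, the regressor is closed-form ridge regression, and all data, sizes, and function names are invented for the example.

```python
import numpy as np

def eye_descriptor(eye_img):
    """Toy appearance descriptor: a coarse intensity histogram of the
    eye patch (a stand-in for the LBP/mHOG features named in the text)."""
    hist, _ = np.histogram(eye_img, bins=16, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)  # normalize to unit sum

def fit_gaze_regressor(X, Y, lam=1e-3):
    """Ridge regression: learn a matrix W mapping descriptors X to gaze
    angles Y via the closed-form solution (X^T X + lam I) W = X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

# Synthetic training data: 200 fake 24x36 eye patches with (yaw, pitch)
# labels in degrees. A real system would use annotated eye images.
rng = np.random.default_rng(0)
patches = rng.random((200, 24, 36))
angles = rng.uniform(-30, 30, size=(200, 2))

X = np.stack([eye_descriptor(p) for p in patches])
W = fit_gaze_regressor(X, angles)

# Predict gaze for a new, unseen eye patch.
pred = eye_descriptor(rng.random((24, 36))) @ W
print(pred.shape)  # (2,) -> estimated (yaw, pitch)
```

A CNN-based end-to-end system would replace both `eye_descriptor` and the linear regressor with a single learned network mapping raw pixels to gaze direction.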
Image Retrieval
Published in Vipin Tyagi, Understanding Digital Image Processing, 2018
Feature extraction is the process of extracting the visual information of an image, i.e., its color, texture, shape and edges. These visual descriptors define the contents of the image. Feature extraction is performed on the pre-processed image, and the features computed depend on the requirements of the application; they can be extracted either in the spatial domain or in the spectral domain. In image retrieval, feature extraction aims to find the most prominent signature of the image. This unique signature is also termed the feature vector of the image and is computed from the pixel values. Extraction of visual features is considered the most significant step, because the features used for discrimination directly influence the effectiveness of the whole image retrieval system.
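As a concrete sketch of a spatial-domain feature vector, the example below builds a per-channel color-histogram signature and uses it to rank database images by distance to a query. The function name, bin count, and synthetic images are assumptions made for illustration, not part of any particular retrieval system.

```python
import numpy as np

def color_feature_vector(img, bins=8):
    """Concatenate per-channel color histograms into one feature vector,
    computed directly from pixel values (spatial domain)."""
    feats = []
    for c in range(img.shape[2]):
        hist, _ = np.histogram(img[:, :, c], bins=bins, range=(0, 256))
        feats.append(hist / img[:, :, c].size)  # normalize by pixel count
    return np.concatenate(feats)

# Toy retrieval: rank synthetic database images by Euclidean distance
# between their signatures and the query's signature.
rng = np.random.default_rng(1)
database = rng.integers(0, 256, size=(5, 32, 32, 3))
query = rng.integers(0, 256, size=(32, 32, 3))

q = color_feature_vector(query)
dists = [np.linalg.norm(q - color_feature_vector(im)) for im in database]
best = int(np.argmin(dists))  # index of the closest database image
```

Real systems typically combine several such descriptors (color, texture, shape) into a longer feature vector before ranking.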
A low-resolution real-time face recognition using extreme learning machine and its variants
Published in The Imaging Science Journal, 2023
Ankit Rajpal, Khushwant Sehra, Anurag Mishra, Girija Chetty
The vast volume of digital media available on the Internet is often characterized by visual descriptors that carry information about shape, size, colour, and other distinguishing elements [55]. The Histogram of Oriented Gradients (HOG) is one such descriptor, proposed by Dalal et al. [56] in 2005; it analyzes the input image for shape information as a distinguishing feature. HOG finds wide application in object/pattern recognition problems [56], since it can identify and extract critical information even from images with high noise content, and its suitability for the face recognition domain follows naturally. HOG-based feature extraction focuses on the information stored in the local regions of a target image, particularly regions where edges occur [56]. HOG has been used extensively for object detection [55, 57], even in monochrome images. It describes an image in terms of groups of local histograms: the image is divided into smaller cells of a chosen size, and a gradient vector is computed for each pixel in a cell. The gradient vector points in the direction of the maximum rate of increase of the image intensity. Once all gradient vectors have been computed, their magnitudes are binned into a per-cell histogram according to the vector angle. Because gradient vectors depend on local intensity differences rather than absolute brightness, HOG is largely immune to contrast changes in the source image [58]; it therefore appears in various face recognition schemes to counter the effects of varying lighting conditions.
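The cell/gradient/binning procedure just described can be sketched directly in NumPy. This is a simplified illustration of the idea (hard assignment to unsigned orientation bins, no interpolation), with an assumed 8-pixel cell and 9 bins, not the full Dalal et al. implementation.

```python
import numpy as np

def cell_histograms(img, cell=8, nbins=9):
    """For each pixel, compute the gradient vector; then, for each
    cell, bin gradient magnitudes into an orientation histogram
    (unsigned orientations, 0-180 degrees)."""
    gy, gx = np.gradient(img.astype(float))     # per-pixel gradients
    mag = np.hypot(gx, gy)                      # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180  # unsigned orientation
    h, w = img.shape
    ch, cw = h // cell, w // cell
    hists = np.zeros((ch, cw, nbins))
    # Hard-assign each pixel's orientation to one of nbins angular bins.
    bin_idx = np.minimum((ang / (180 / nbins)).astype(int), nbins - 1)
    for i in range(ch):
        for j in range(cw):
            sl = np.s_[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            for b in range(nbins):
                hists[i, j, b] = mag[sl][bin_idx[sl] == b].sum()
    return hists

img = np.random.default_rng(2).random((32, 32))
H = cell_histograms(img)
print(H.shape)  # (4, 4, 9): one 9-bin histogram per 8x8 cell
```

Production implementations additionally interpolate each magnitude between neighbouring orientation bins and cells to reduce aliasing.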
After a histogram has been calculated for each cell, groups of neighbouring cells are combined into blocks, as depicted in Figure 3, and the cell histograms within each block are normalized together. This step is known as local contrast normalization [56].
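The block-normalization step can be sketched as follows, assuming a precomputed grid of per-cell histograms. The 2x2 block size and L2 normalization are common choices made here for illustration; the function name is invented for the example.

```python
import numpy as np

def block_normalize(cell_hists, block=2, eps=1e-6):
    """Group neighbouring cell histograms into overlapping blocks and
    L2-normalize each block (local contrast normalization), then
    concatenate all blocks into the final descriptor."""
    ch, cw, nb = cell_hists.shape
    blocks = []
    for i in range(ch - block + 1):
        for j in range(cw - block + 1):
            v = cell_hists[i:i+block, j:j+block].ravel()
            blocks.append(v / np.sqrt((v ** 2).sum() + eps ** 2))
    return np.concatenate(blocks)

# A 4x4 grid of 9-bin cell histograms yields 3x3 overlapping 2x2 blocks.
cells = np.random.default_rng(3).random((4, 4, 9))
desc = block_normalize(cells)
print(desc.shape)  # (324,) = 3*3 blocks * 2*2 cells * 9 bins
```

Because each block is normalized by its own energy, the descriptor becomes robust to local lighting and contrast variation, which is exactly why this step helps under varying illumination.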