Deep Learning-Based Object Recognition and Detection Model
Published in Krishna Kant Singh, Vibhav Kumar Sachan, Akansha Singh, Sanjeevikumar Padmanaban, Deep Learning in Visual Computing and Signal Processing, 2023
Aman Jatain, Khushboo Tripathi, Shalini Bhaskar Bajaj
HOG: HOG is one of the feature descriptors used to detect objects. The concept gained worldwide attention in 2005 when N. Dalal and B. Triggs presented it at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). The original purpose of the HOG detector was pedestrian detection in static images. The HOG detector rescales input images, which come in varying sizes, while the size of the detection window remains fixed. The distribution of intensity gradients or edge directions describes local object appearance and shape within an image. The image is divided into small connected regions called cells, and a histogram of gradient directions is compiled over the pixels within each cell. The steps implemented in the HOG detector are: gradient computation, orientation binning, descriptor blocks, block normalization, and object recognition.
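The first two stages listed above (gradient computation and orientation binning per cell) can be sketched in plain NumPy. This is a minimal illustration, not the full Dalal–Triggs pipeline: descriptor blocks, block normalization, and the sliding detection window are omitted, and the cell size and bin count are assumed defaults.

```python
import numpy as np

def hog_cell_histograms(image, cell_size=8, n_bins=9):
    """Toy sketch of the first HOG stages: gradient computation,
    then magnitude-weighted orientation binning per cell."""
    image = image.astype(np.float64)
    # Gradient computation with the simple [-1, 0, 1] kernel
    gx = np.zeros_like(image)
    gy = np.zeros_like(image)
    gx[:, 1:-1] = image[:, 2:] - image[:, :-2]
    gy[1:-1, :] = image[2:, :] - image[:-2, :]
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, as in Dalal & Triggs
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0

    h, w = image.shape
    cells_y, cells_x = h // cell_size, w // cell_size
    hist = np.zeros((cells_y, cells_x, n_bins))
    bin_width = 180.0 / n_bins
    for cy in range(cells_y):
        for cx in range(cells_x):
            sl = (slice(cy * cell_size, (cy + 1) * cell_size),
                  slice(cx * cell_size, (cx + 1) * cell_size))
            bins = (orientation[sl] // bin_width).astype(int) % n_bins
            # Each pixel votes into its orientation bin, weighted by magnitude
            np.add.at(hist[cy, cx], bins.ravel(), magnitude[sl].ravel())
    return hist

# 64x128 is the canonical pedestrian detection window size
hist = hog_cell_histograms(np.random.rand(64, 128))
print(hist.shape)  # (8, 16, 9): one 9-bin histogram per 8x8 cell
```

In the full detector these per-cell histograms would next be grouped into overlapping blocks and contrast-normalized before being concatenated into the final descriptor.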
Vision-based defects detection for bridges using transfer learning and convolutional neural networks
Published in Structure and Infrastructure Engineering, 2020
Jinsong Zhu, Chi Zhang, Haidong Qi, Ziyue Lu
HOG is a feature descriptor used in computer vision and image processing for object detection. In an image, the appearance and shape of a local object can be well described by the density distribution of gradients or edge directions, and HOG is formed by computing gradient-orientation histograms over local image regions. The combination of HOG and SVM has been widely used in image recognition with great success; the HOG + SVM method for pedestrian detection was proposed at CVPR (IEEE Conference on Computer Vision and Pattern Recognition) in 2005 (Dalal & Triggs, 2005). Before AlexNet was proposed, the majority of detection and classification models drew on these concepts. The calculation of HOG proceeds in several steps. First, the image is divided into patches. Second, convolution with a gradient operator is performed on each patch to calculate the gradient magnitude and direction at each pixel, using Equations (5) and (6) respectively:

M(x, y) = sqrt(Ix^2 + Iy^2)   (5)
θ(x, y) = arctan(Iy / Ix)   (6)

where Ix and Iy are the gradient values in the horizontal and vertical directions, M(x, y) is the magnitude of the gradient, and θ(x, y) is the direction of the gradient. Third, the circle of orientations is divided into several parts, for example 12 bins, so that each bin spans 30 degrees. Fourth, according to the gradient direction of each pixel, bilinear interpolation is used to add its magnitude to the histogram.
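The fourth step, distributing each pixel's magnitude between neighboring orientation bins by bilinear interpolation, can be illustrated for a single pixel. This is a sketch under the 12-bin, 30-degree configuration mentioned in the text; the bin-center convention is an assumption, as implementations vary.

```python
import numpy as np

def vote_bilinear(orientation_deg, magnitude, n_bins=12):
    """Distribute one pixel's gradient magnitude between the two
    nearest orientation bins (12 bins of 30 degrees over a full circle).
    The center of bin i is assumed to lie at (i + 0.5) * bin_width."""
    bin_width = 360.0 / n_bins
    hist = np.zeros(n_bins)
    # Fractional position of the orientation relative to bin centers
    pos = orientation_deg / bin_width - 0.5
    lo = int(np.floor(pos)) % n_bins       # nearest bin below
    hi = (lo + 1) % n_bins                 # nearest bin above (wraps around)
    frac = pos - np.floor(pos)
    # Split the magnitude linearly between the two neighboring bins
    hist[lo] += magnitude * (1.0 - frac)
    hist[hi] += magnitude * frac
    return hist

# A 45-degree gradient sits exactly on the center of bin 1 (30-60 degrees),
# so its entire magnitude lands in that single bin.
h = vote_bilinear(45.0, 1.0)
```

An orientation between two bin centers, e.g. 40 degrees, would instead split its magnitude proportionally between bins 0 and 1, which is what makes the histogram robust to small rotations.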
The role of machine intelligence in photogrammetric 3D modeling – an overview and perspectives
Published in International Journal of Digital Earth, 2021
The recent prevalence of machine learning (often used interchangeably with the term ‘artificial intelligence’, though more to the point) has shown great potential in addressing complex tasks with impressive performance, thus attracting attention in the field of photogrammetry and in particular in computer vision (Hinton and Salakhutdinov, 2006; LeCun et al., 2015). Although neither AI itself nor its involvement in the field is new, its recent rise has encouraged us to revisit its role in photogrammetry, as well as its already active role in computer vision (Goodfellow et al., 2016; Szegedy et al., 2016). A very recent study in Nature Neuroscience (Bonnen et al., 2020) indicated that binocular viewing geometry evidently shapes human neural representation, and therefore there is great potential to utilize 3D modeling techniques to enhance ‘AI’. A plethora of existing works apply machine learning to spatially related problems, and the recent top-tier computer vision conferences (e.g. CVPR (IEEE Conference on Computer Vision and Pattern Recognition), ICCV (IEEE International Conference on Computer Vision), and ECCV (European Conference on Computer Vision)) are filled with machine learning (deep learning in particular) based works, some of them relevant to photogrammetry and remote sensing (Lu et al., 2020; Robinson et al., 2019; Blaha et al., 2016; Ozcanli et al., 2016; Treible et al., 2018). In this article, we provide a general overview of works that use machine learning and address critical components of the photogrammetric data processing pipeline, including (1) data acquisition; (2) geo-referencing; (3) Digital Surface Model generation; and (4) semantic interpretation. Examples are shown in Figure 1.
Sliding window based deep ensemble system for breast cancer classification
Published in Journal of Medical Engineering & Technology, 2021
Amin Alqudah, Ali Mohammad Alqudah
ResNet-50 was proposed by Kaiming He et al. at the 2016 IEEE Conference on Computer Vision and Pattern Recognition [36]. The name ResNet is short for Residual Network; its key idea is the shortcut (residual) connection, an example of which is shown in Figure 5 below. Since the rise of machine learning over recent years, convolutional neural networks (CNNs) have achieved a series of breakthroughs in image classification. Although very deep CNNs are difficult to train, residual connections allow them to solve many complex tasks and enhance classification accuracy [36].
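The residual connection described above can be sketched in a few lines of NumPy. This is a minimal illustration of the idea, not the actual ResNet-50 block: dense layers stand in for the convolutions, and the weights here are arbitrary placeholders.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Minimal sketch of a residual (shortcut) connection:
    the branch learns a residual F(x), and the block output is
    relu(F(x) + x), so the identity path is always preserved."""
    fx = relu(x @ w1) @ w2   # residual branch F(x)
    return relu(fx + x)      # identity shortcut: add the input back

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
w1 = rng.standard_normal((8, 8)) * 0.01  # placeholder weights
w2 = rng.standard_normal((8, 8)) * 0.01
y = residual_block(x, w1, w2)
print(y.shape)  # (8,)
```

Because the shortcut carries the input forward unchanged, gradients can flow directly through the addition during backpropagation, which is what makes networks of 50 or more layers trainable in practice.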