Feature vectors – Knowledge and References

Machine learning solutions for development of performance deterioration models of flexible airfield pavements

Published in Inge Hoff, Helge Mork, Rabbira Garba Saba, Eleventh International Conference on the Bearing Capacity of Roads, Railways and Airfields, Volume 3, 2022

A.Z. Ashtiani, S. Murrell, R. Speir, D.R. Brill

The goal of an ML model is to increase learning accuracy while maintaining low variance and avoiding overfitting to noise. Building ML models for data-driven applications is an iterative process with three components: feature engineering, algorithm selection and parameter tuning (Kumar et al. 2016). Feature engineering is the process of converting raw data into sets of feature vectors (input variables) that give the model the best predictive performance. Reducing the number of variables to a subset of useful features is desirable for effective ML prediction. The problem with high-dimensional features is that each added dimension makes it harder to gauge the influence of each feature on the prediction. In addition, models with many features relative to the number of data samples tend to be prone to overfitting. The best candidate ML algorithm depends on the size and type of the data (e.g. time dependent) and on its sparsity (infrequent, missing or irregular data). Parameter tuning is the process of determining the values of the hyper-parameters of ML algorithms. Each ML model has its own hyper-parameter configurations (e.g. loss functions, search strategies) that affect performance. Hyper-parameters are fine-tuned iteratively to reach a trade-off between the bias (accuracy) and the variance of the ML model.
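The iterative tuning described above can be illustrated with a small grid search. This is only a sketch under stated assumptions: the excerpt does not name a search strategy or a model, so the one-parameter ridge regression, the penalty grid, the toy data and the function names (`fit_ridge_slope`, `mse`, `grid_search`) are all hypothetical choices made for illustration.

```python
def fit_ridge_slope(xs, ys, lam):
    """Closed-form slope w of a no-intercept ridge regression y ~ w * x.

    Minimises sum((y - w*x)^2) + lam * w^2; lam is the hyper-parameter.
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(xs, ys, w):
    """Mean squared error of the predictions w * x against ys."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search(train, valid, lam_grid):
    """Fit on the training split for each lam, score on the validation split."""
    results = {}
    for lam in lam_grid:
        w = fit_ridge_slope(*train, lam)
        results[lam] = mse(*valid, w)
    best = min(results, key=results.get)  # lowest validation error wins
    return best, results


# Toy data (made up for this sketch): (inputs, targets) per split.
train = ([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
valid = ([5.0, 6.0], [9.9, 11.9])

best_lam, scores = grid_search(train, valid, [0.0, 0.1, 1.0, 10.0])
print(best_lam, scores)
```

Scoring each candidate on data held out from fitting is what lets the search trade a little bias (a larger penalty) against variance, rather than simply rewarding the best fit to the training samples.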

Unraveling Data Science, Artificial Intelligence, and Autonomy

Published in Jay Liebowitz, Data Analytics and AI, 2020

John Piorkowski

Unsupervised learning does not rely on labeled data for training. Instead, data are processed by the unsupervised algorithm, which seeks to cluster like data based on various distance measures. “Unsupervised” refers to a learning algorithm that does not assume any prior knowledge of data labels. The key to unsupervised learning is to represent the data in a structure that allows distance measures to be applied. For example, the data can be structured into feature vectors. A feature vector is a vector of numerical features that represents some object. The common distance measures, which include Euclidean, Pearson linear correlation, and cosine, are summarized in Table 1.1. In each of these equations, X and Y represent the feature vectors.
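Table 1.1 is not reproduced in this excerpt, so the following is a minimal sketch of the three named measures using their standard textbook definitions, which may differ in form from the table (for example, a source may report a distance as 1 minus the similarity). The function names are illustrative, and X and Y are plain Python lists standing in for feature vectors.

```python
import math


def euclidean(x, y):
    """Euclidean distance between feature vectors x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    """Cosine of the angle between x and y (1 means same direction)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson linear correlation: cosine similarity of the mean-centred vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])


X = [1.0, 2.0, 3.0]
Y = [2.0, 4.0, 6.0]
print(euclidean(X, Y), cosine_similarity(X, Y), pearson(X, Y))
```

Note that cosine and Pearson are similarities (larger means more alike), whereas Euclidean is a distance (smaller means more alike), so a clustering routine has to treat them accordingly.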

A Statistical Machine Learning Framework

Published in Richard M. Golden, Statistical Machine Learning, 2020

Richard M. Golden

In order to reduce feature vector dimensionality, the goal is to select a subset of all possible input features that will improve the performance of the classification machine or regression model. Choosing a good subset of the set of all possible input features can dramatically improve the performance of a classification machine or regression model. This problem of selecting a best subset of features is called “best subsets regression” in the statistical literature (Beale et al. 1967; Miller 2002). One approach to solving the best subsets regression problem is to simply try all possible subsets of input features. If the number of input features is less than about 20 and if the data set is not too large, then a supercomputer can approach the best subsets regression problem by exhaustively considering all 2^d possible subsets of the set of d possible input features. This is called “exhaustive search”. For situations involving either larger data sets or more than about 20 input features, however, exhaustive search is not a viable alternative because it is too computationally intensive.
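Exhaustive search can be sketched in a few lines. The excerpt does not specify the model or the scoring criterion, so this example makes its own assumptions: it scores each subset by the leave-one-out accuracy of a 1-nearest-neighbour classifier (swapped in here to keep the sketch dependency-free, not the regression setting the name "best subsets regression" suggests), skips the empty subset, and uses a made-up six-sample data set in which only feature 0 carries the class signal.

```python
from itertools import combinations


def loo_1nn_accuracy(X, y, subset):
    """Leave-one-out accuracy of 1-nearest-neighbour using only `subset` features."""
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue  # leave sample i out
            d = sum((xi[k] - xj[k]) ** 2 for k in subset)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


def exhaustive_search(X, y):
    """Score every non-empty feature subset (2^d - 1 of them) and keep the best."""
    d = len(X[0])
    best_subset, best_score, n_evaluated = None, -1.0, 0
    for size in range(1, d + 1):
        for subset in combinations(range(d), size):
            n_evaluated += 1
            score = loo_1nn_accuracy(X, y, subset)
            if score > best_score:  # ties keep the smaller, earlier subset
                best_subset, best_score = subset, score
    return best_subset, best_score, n_evaluated


# Toy data (made up): feature 0 separates the classes, features 1 and 2 do not.
X = [[0.0, 5.0, 1.0], [0.1, 1.0, 9.0], [0.2, 8.0, 4.0],
     [1.0, 5.1, 1.2], [1.1, 0.9, 9.1], [1.2, 8.2, 3.9]]
y = [0, 0, 0, 1, 1, 1]

best_subset, best_score, n_evaluated = exhaustive_search(X, y)
print(best_subset, best_score, n_evaluated)
```

With d = 3 the loop evaluates only 7 subsets, but the count doubles with every added feature: d = 20 already means 2^20 = 1,048,576 subsets, each requiring a model fit, which is why the approach stops being practical at about that size.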

Transfer learning-based CNN diagnostic framework for diagnosis of COVID-19 from lung CT images

Published in The Imaging Science Journal, 2022

R. Keerthana, Angelin Gladston, H. Khanna Nehemiah

Mei-Ling et al. [29] proposed an approach for detecting COVID-19 from X-ray and CT images. They employed seven pre-trained CNN models, namely InceptionV3, Xception, ResNet50V2, MobileNetV2, DenseNet121, EfficientNet-B0 and EfficientNet-V2, and developed a lightweight model, LightEfficientNet-V2, for identifying COVID-19. They adopted a grid search algorithm to find the optimal batch size, number of epochs, optimizer, learning rate, and dropout. They evaluated the model on three datasets, namely NIHChestX-rays, SARS-CoV-2, and COVID-19 CT images. Sahil et al. [30] developed a transfer learning-based model for identifying COVID-19 from lung CT images. They examined data from the SARS CT dataset, which comprises 2482 scans collected from patients: 1252 COVID-19 positive and 1230 COVID-19 negative images. They adopted Histogram Equalization and CLAHE as pre-processing techniques. They evaluated the models ResNet-101, DenseNet-201, MobileNet-V2, VGG-19, and EfficientNet-B4 on the dataset. Feature vectors were fed as input to an artificial neural network for classification.

Markov chain latent space probabilistic for feature optimization along with Hausdorff distance similarity matching in content-based image retrieval

Published in The Imaging Science Journal, 2022

Ramandeep Kaur, V. Devendran

Some Image Retrieval (IR) classifications must satisfy the fundamental requirement of analysing and arranging relevant images from the archive with the least amount of gadget involvement. This study examined how to choose visual characteristics for a building based on the needs of the user [12]. Any IR model must capture the distinguishing traits in great detail. Additionally, the characteristics can combine low-level visual features to become very effective and robust, although a large processing cost is required to achieve superior results. Unfortunately, uneven feature selection limits how effective the IR procedure can be. The feature vector is the input data for the training and testing processes of the machine learning (ML) model, and it maximizes performance efficiency [13]. Deep Neural Networks (DNN), which may deliver ideal results on an expensive platform, have recently become a key component of the IR process.

An empirical evaluation of recent texture features for the classification of natural images

Published in International Journal of Computers and Applications, 2020

A. Suruliandi, J.C. Kavitha, D. Nagarajan

Various techniques are available for classifying texture images. The Artificial Neural Network (ANN) [15] is a powerful classification technique used for real-valued and discrete-valued approximations. The ANN, used to approximate functions that depend on a large number of unknown inputs, applies the back propagation algorithm for learning and is utilized in areas such as speech recognition and interpretation of visual scenes. The decision tree classifier is a hierarchical classifier that compares data with a group of properly selected features. The decision tree is based on a greedy algorithm in which a locally optimal decision is made at each node. Clustering is a method of unsupervised classification of patterns (observations, feature vectors) into groups (clusters) with similar characteristics. The similarity measure on which the clusters are defined is based on the Euclidean distance or other distance metrics. Clustering is used in medical imaging for segmentation and in computer vision tasks for object recognition.
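The clustering of feature vectors by Euclidean distance can be sketched with k-means. The excerpt does not name a particular clustering algorithm, so k-means is an assumption made here for illustration, as are the deterministic initialisation (the first k points serve as starting centroids), the fixed iteration count and the toy two-dimensional feature vectors.

```python
import math


def kmeans(points, k, n_iter=10):
    """Group feature vectors into k clusters by Euclidean distance to centroids."""
    # Assumption for reproducibility: start from the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        labels = []
        for p in points:
            idx = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            labels.append(idx)
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:  # leave an empty cluster's centroid where it was
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy feature vectors (made up): two visibly separate groups.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (0.2, 0.1), (4.9, 5.2)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

No labels are supplied at any point; the grouping emerges only from the distances between feature vectors, which is what makes the procedure unsupervised.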