Machine Learning for Solving a Plethora of Internet of Things Problems
Published in Kamal Kumar Sharma, Akhil Gupta, Bandana Sharma, Suman Lata Tripathi, Intelligent Communication and Automation Systems, 2021
Sparsh Sharma, Abrar Ahmed, Mohd Naseem, Surbhi Sharma
Dimensionality refers to the number of characteristics, or input variables, present in a dataset. If the dataset is noisy or the number of input variables is large [14], algorithms do not form effective models [15]. Dimensionality reduction is the process of converting a large dataset into a smaller one that conveys much the same information more concisely [16]. Because the data are compressed and reduced, storage space and computation time are also reduced [17], and computation becomes more tractable. Dimensionality reduction is an unsupervised technique that removes redundant features from data.
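To make this concrete, the sketch below (our own illustration, not from the chapter; it assumes scikit-learn and a synthetic dataset with arbitrary sizes) compresses a 50-variable dataset to 10 components with PCA, a standard dimensionality reduction technique, and reports how much variance the reduced representation retains.

```python
# Minimal sketch: reducing a noisy high-dimensional dataset with PCA.
# Dataset shape and component count are illustrative choices only.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))        # 500 samples, 50 input variables
X[:, 10:] *= 0.01                     # last 40 variables carry little information

pca = PCA(n_components=10)            # keep 10 components
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)                      # (500, 50) -> (500, 10)
print("variance retained:", pca.explained_variance_ratio_.sum())
```

Storing and training on the 10-component representation is cheaper than on the original 50 variables while most of the informative variance is preserved.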
Multivariate Statistics Neural Network Models
Published in Basilio de Bragança Pereira, Calyampudi Radhakrishna Rao, Fábio Borges de Oliveira, Statistical Learning Using Neural Networks, 2020
Basilio de Bragança Pereira, Calyampudi Radhakrishna Rao, Fábio Borges de Oliveira
In this chapter, neural networks are used to perform multivariate statistical analysis: for instance, cluster and scaling network analysis, competitive learning, learning vector quantization, adaptive resonance theory (ART) networks, and self-organizing map (SOM) networks. In addition, this chapter presents dimensionality reduction methods: linear and nonlinear principal component analysis (PCA), independent component analysis (ICA), factor analysis (FA), correspondence analysis (CA), and multidimensional scaling. The networks corresponding to these methods are also presented: PCA networks, nonlinear PCA networks, FA networks, CA networks, and ICA networks.
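As a flavor of how a neural network can carry out PCA, the following minimal numpy sketch (our own illustration under stated assumptions, not code from the chapter) trains a single linear neuron with Oja's learning rule, which is known to converge to the data's first principal direction.

```python
# Sketch of a one-neuron "PCA network" trained with Oja's rule.
import numpy as np

rng = np.random.default_rng(1)
# Correlated 2-D data whose leading principal direction is roughly (1, 1)/sqrt(2).
z = rng.normal(size=1000)
X = np.column_stack([z + 0.1 * rng.normal(size=1000),
                     z + 0.1 * rng.normal(size=1000)])
X -= X.mean(axis=0)                  # PCA assumes centered data

w = rng.normal(size=2)               # the neuron's weight vector
lr = 0.01
for epoch in range(20):
    for x in X:
        y = w @ x                    # linear neuron output
        w += lr * y * (x - y * w)    # Oja's rule: Hebbian term plus decay

print("learned direction:", w / np.linalg.norm(w))  # approx. +/-(0.71, 0.71)
```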
Impact of Dimensionality Reduction on Performance of IoT Intrusion Detection System
Published in Bharat Bhushan, Sudhir Kumar Sharma, Bhuvan Unhelkar, Muhammad Fazal Ijaz, Lamia Karim, Internet of Things, 2022
Susanto, M. Agus Syamsul A., Deris Stiawan, Mohd. Yazid Idris, Rahmat Budiarto
DR is the process of projecting high-dimensional data onto a lower-dimensional space, converting data from n dimensions to k dimensions, where k < n. DR has proven to be an important part of data analysis for machine learning (ML), especially in data preprocessing. Applying dimensionality reduction during preprocessing can increase the efficiency and effectiveness of the learning process: it eliminates irrelevant or redundant data, increases learning accuracy, and yields more comprehensible results. In the literature, DR techniques are categorized into two groups, namely linear dimensionality reduction and nonlinear dimensionality reduction; the sketch after these two groups illustrates the contrast.

Linear dimensionality reduction uses a linear function to convert high-dimensional data to lower-dimensional data. DR techniques in this category include Principal Component Analysis (PCA), Singular Value Decomposition (SVD), Latent Semantic Analysis (LSA), Locality Preserving Projections (LPP), Independent Component Analysis (ICA), Linear Discriminant Analysis (LDA), Projection Pursuit, Incremental PCA (I-PCA), Probabilistic PCA (P-PCA), Sparse PCA (S-PCA), Factor Analysis (FA), Fast ICA, Truncated SVD, Large-Margin Nearest Neighbor (LMNN) Metric Learning, Linear Local Tangent Space Alignment, Nonnegative Matrix Factorization (NMF), and Random Projection (RP).

Nonlinear dimensionality reduction uses a nonlinear function to convert high-dimensional data to lower-dimensional data. DR techniques in this category include Kernel Principal Component Analysis (KPCA), Multidimensional Scaling (MDS), Isomap, Locally Linear Embedding (LLE), Learning Vector Quantization (LVQ), Self-Organizing Map (SOM), t-distributed Stochastic Neighbor Embedding (t-SNE), Landmark Isomap, Hessian LLE, Modified LLE, Local Tangent Space Alignment (LTSA), Laplacian Eigenmaps, Diffusion Maps, Manifold Charting, Local Linear Coordination (LLC), Local Affine Multidimensional Projections (LAMP), Projection by Clustering (PBC), Interactive Document Maps (IDMAP), Maximally Collapsing Metric Learning (MCML), Uniform Manifold Approximation and Projection (UMAP), Nonmetric Multidimensional Scaling (N-MDS), Landmark MDS, Gaussian Process Latent Variable Model (GPLVM), Sparse Random Projection (S-RP), Maximum Variance Unfolding (MVU), Fast MVU, Landmark MVU, Generalized Discriminant Analysis (GDA), Kernel LDA, Least Square Projection (LSP), Fastmap, Piecewise Least Square Projection (PLSP), Autoencoders, and Stochastic Proximity Embedding (SPE).
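The contrast between the two categories can be seen in a short scikit-learn sketch (our own illustrative example, not from the chapter): no linear one-dimensional projection can separate two concentric circles, whereas a nonlinear method such as Kernel PCA with an RBF kernel can.

```python
# Linear vs. nonlinear DR on data with nonlinear structure.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric circles: their classes are not linearly separable.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

X_lin = PCA(n_components=1).fit_transform(X)                    # linear DR
X_ker = KernelPCA(n_components=1, kernel="rbf",
                  gamma=10).fit_transform(X)                    # nonlinear DR

def separation(Z):
    # Distance between class means, in units of overall standard deviation.
    return abs(Z[y == 0].mean() - Z[y == 1].mean()) / Z.std()

print("PCA separation:       ", round(separation(X_lin), 2))    # near 0
print("Kernel PCA separation:", round(separation(X_ker), 2))    # clearly larger
```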
COVID-19 lung infection detection using deep learning with transfer learning and ResNet101 features extraction and selection
Published in Waves in Random and Complex Media, 2022
Raja Nadir Mahmood Khan, Lal Hussain, Ala Saleh Alluhaidan, Abdul Majid, Kashif J. Lone, Rufat Verdiyev, Fahd N. Al-Wesabi, Tim Q. Duong
ML requires hand-crafted features to train the algorithms for classification purposes. However, not all of the computed features are relevant and important. To acquire the most relevant features according to feature importance and rank, feature selection algorithms are used. From the set of extracted features, these algorithms select the features that contain the most relevant information for unfolding the dynamics of a system [17]. Feature selection methods help discard redundant information in the dataset [18]. Another advantage of feature selection methods is that they require less memory because of the dimensionality reduction [19]. Features from a high-dimensional dataset can be selected by employing the chi-square feature selection method [20]. High-dimensional data contain a large number of features, and the classification learner finds it hard to learn from them. In this study, we computed 2048 features from the FC layer of ResNet101 in the multiclass setting. Dimensionality reduction is then performed on the high-dimensional data, discarding the less informative variables and retaining only the most informative ones. The following feature selection algorithms were employed in this study; a minimal sketch of chi-square selection appears below.
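A minimal sketch of chi-square feature selection in this spirit, using scikit-learn's SelectKBest with synthetic stand-in features (the sample count and the choice of k are arbitrary, not the study's settings):

```python
# Chi-square feature selection over 2048-dimensional deep features.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
# Synthetic stand-in for deep features: chi2 requires non-negative values,
# which holds for ReLU-based CNN activations such as ResNet101 features.
X = rng.random((300, 2048))              # 300 images x 2048 features (synthetic)
y = rng.integers(0, 3, size=300)         # 3 classes (multiclass setting)

selector = SelectKBest(score_func=chi2, k=256)   # keep the 256 top-ranked features
X_sel = selector.fit_transform(X, y)

print(X.shape, "->", X_sel.shape)        # (300, 2048) -> (300, 256)
top = np.argsort(selector.scores_)[::-1][:5]
print("top-ranked feature indices:", top)
```

The reduced 256-column matrix is then what a downstream classifier trains on, which is where the memory and accuracy benefits cited above come from.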
A novel approach to the analysis of spatial and functional data over complex domains
Published in Quality Engineering, 2020
Figure 5 shows the first three principal components estimated by the regularized fPCA technique proposed in Lila, Aston, and Sangalli (2016a). These functions, computed over the cortical surface, identify the first three main connectivity patterns across subjects. Moreover, they can be used to perform dimensionality reduction of this high-dimensional dataset. The principal components combine a desired smoothness with the ability to capture strongly localized features in the modes of variation. Lila, Aston, and Sangalli (2016a) show that the proposed method outperforms standard multivariate PCA, which returns estimates characterized by excessive local variation and neglects the shape of the domain; the proposed method also proves superior to the classical pre-smoothing approach, where each subject-specific map is smoothed prior to performing multivariate PCA (sketched below).
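For orientation, the classical pre-smoothing baseline mentioned above can be sketched on a simple one-dimensional stand-in domain (this illustrates the baseline only, under our own assumptions; it is not the regularized fPCA of Lila, Aston, and Sangalli, which accounts for the cortical-surface geometry):

```python
# Classical baseline: smooth each subject's map, then run multivariate PCA.
import numpy as np
from scipy.ndimage import gaussian_filter1d
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in: 50 subjects, each a noisy map over 200 grid points.
grid = np.linspace(0, 1, 200)
signal = np.sin(2 * np.pi * grid)
maps = signal + 0.5 * rng.normal(size=(50, 200))

smoothed = gaussian_filter1d(maps, sigma=5, axis=1)  # per-subject pre-smoothing
pcs = PCA(n_components=3).fit(smoothed)              # multivariate PCA afterwards

print("variance explained:", np.round(pcs.explained_variance_ratio_, 3))
```

Regularized fPCA instead builds the smoothness penalty into the PCA estimation itself, which is what lets it respect the shape of a complex domain.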
Enhancement of membrane system performance using artificial intelligence technologies for sustainable water and wastewater treatment: A critical review
Published in Critical Reviews in Environmental Science and Technology, 2021
Nguyen Duc Viet, Duksoo Jang, Yeomin Yoon, Am Jang
Generally, the AI-based models used to predict the performance of membrane processes can be classified into three groups: machine learning (ML), ANNs, and search algorithms (see Figure 2). Machine learning utilizes algorithms that analyze data, learn from them, and use the acquired knowledge to summarize trends or patterns of interest. An ANN is more complex: it comprises a set of algorithms, used within ML for data modeling, that operate through a network of hidden layers and neurons. ML is commonly utilized to solve problems involving regression, classification, clustering, or dimensionality reduction (Li et al., 2021). Principal component analysis (PCA) is a dimensionality reduction method in ML that is normally applied to high-dimensional datasets (Antonopoulos et al., 2020). The K-means algorithm is a distance-based clustering method that finds the points at the center of each cluster and labels each data point in the set accordingly (Zhou et al., 2016). Another ML method is the support vector machine (SVM), which rests on a strong mathematical foundation. SVM is able to deal with optimization problems, ensuring the overall capability of the algorithm and avoiding the overfitting problem (Chen et al., 2013). Random forest (RF) is commonly used to perform nonlinear classification and regression analysis; it can also evaluate the importance of each variable while conducting a classification or regression analysis. Fuzzy logic (FL) is a multivalued-logic ML method used for studying fuzzy judgment. The combination of FL and an ANN, referred to as ANFIS, is commonly utilized to predict system performance. Hierarchical clustering is a cluster analysis technique that builds a hierarchy of clusters and is normally used in segmentation problems (Ikeda & Nishi, 2016; Kwac et al., 2014). So far, several ML methods such as RF, FL, SVM, and ANFIS have been effectively utilized in simulating membrane processes (i.e., NF, RO, FO); a minimal sketch comparing two of them follows. Detailed information is discussed in depth in Section 4.
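As a concrete illustration, the sketch below (our own example with hypothetical operating variables and a synthetic flux response, not data or code from the review) fits two of the ML methods mentioned above, an SVM (in its regression form, SVR) and a random forest, to a membrane-style prediction task, and shows RF's variable importance output.

```python
# Comparing SVM (SVR) and random forest on a synthetic membrane-flux task.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Hypothetical operating conditions: pressure, temperature, feed concentration.
X = rng.uniform(size=(400, 3))
# Hypothetical nonlinear flux response with noise (illustrative only).
flux = 2 * X[:, 0] + np.sin(3 * X[:, 1]) - X[:, 2] ** 2 \
       + 0.1 * rng.normal(size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X, flux, random_state=0)

for name, model in [("SVR", SVR(C=10)),
                    ("Random forest", RandomForestRegressor(random_state=0))]:
    model.fit(X_tr, y_tr)
    print(f"{name}: R^2 = {model.score(X_te, y_te):.2f}")

# RF also exposes variable importance, as noted above.
rf = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
print("feature importances:", np.round(rf.feature_importances_, 2))
```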