Clustering Divide and Conquer
Published in Chong Ho Alex Yu, Data Mining and Exploration, 2022
Figures 9.4–9.6 display the biplots of each solution. The two axes represent two principal components (PCs). When there are only two variables, the biplot resembles a scatterplot; in a high-dimensional cluster analysis, multiple variables are compressed into two components. At first glance, the first model shown in Figure 9.4 seems acceptable, but many data points are unaccounted for (not covered by any sphere). The same problem can be found in the four-cluster solution shown in Figure 9.5. Finally, if the five-cluster solution is adopted, only seven observations remain unassigned (Figure 9.6), and it is therefore considered the best clustering outcome.
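A comparison of candidate cluster solutions on a biplot can be sketched with a numpy-only toy example. This is a hedged illustration, not the chapter's method: the data here are randomly generated placeholders, and the sphere-coverage criterion used to count unassigned points is not reproduced; the sketch only shows fitting k-means for three, four, and five clusters and projecting the observations onto the first two principal components for inspection.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; keeps the old centre if a cluster empties."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                            else centers[c] for c in range(k)])
    return labels, centers

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))        # hypothetical 100 observations x 4 variables
Xc = X - X.mean(axis=0)              # centre columns for PCA
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pcs = U[:, :2] * s[:2]               # biplot coordinates on the first two PCs

for k in (3, 4, 5):                  # candidate solutions, as in the three figures
    labels, _ = kmeans(Xc, k)
    # scatter `pcs` coloured by `labels` to inspect each solution visually
```

Each solution is then judged on the same two-dimensional projection, which is what makes the three biplots directly comparable.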
Develop
Published in Walter R. Paczkowski, Deep Data Analytics for New Product Development, 2020
The map is formally called a biplot because it simultaneously plots (measures of) the rows and columns of the table on one plot. The “bi” in “biplot” refers to the joint display of rows and columns, not to the dimensionality of the plot, which is two (X-Y). In essence, a biplot is one X-Y plot overlaid on top of another. The biplot allows you to visualize on one map the relationships both within a structure (e.g., rows) and between structures (e.g., rows and columns) of a table. An example is shown in Figure 3.12. See Gower and Hand [1996] for a detailed, technical discussion of biplots.
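How joint row and column coordinates for such a map can be derived is sketched below via the singular value decomposition of standardized residuals (the standard correspondence-analysis route). The brand-by-attribute table is entirely hypothetical; this is a generic sketch, not the book's worked example.

```python
import numpy as np

# Hypothetical 3x4 brand-by-attribute contingency table (counts).
N = np.array([[20,  5, 10,  8],
              [ 4, 18,  6, 12],
              [ 9,  7, 22,  5]], dtype=float)

P = N / N.sum()                      # correspondence matrix
r = P.sum(axis=1)                    # row masses
c = P.sum(axis=0)                    # column masses
# Standardized residuals: (P - r c') / sqrt(r c')
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, d, Vt = np.linalg.svd(S, full_matrices=False)

# Principal coordinates: rows and columns land in the same 2-D space,
# which is exactly what makes the joint "bi" display possible.
row_coords = (U * d) / np.sqrt(r)[:, None]
col_coords = (Vt.T * d) / np.sqrt(c)[:, None]
print(row_coords[:, :2])
print(col_coords[:, :2])
```

Plotting the first two columns of `row_coords` and `col_coords` on one set of axes gives the overlaid X-Y plots described above.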
Concepts of Visual and Interactive Clustering
Published in Charu C. Aggarwal, Chandan K. Reddy, Data Clustering, 2018
The navigational graph does not necessarily include all two-dimensional projections as nodes. Data of larger dimensionality d would require (d choose 2) nodes, which would render such a navigational graph useless as an effective map of the space of projections. In [33] the use of scagnostic measures [49] is suggested to pick two-dimensional projections with interesting data distributions. Scagnostics were briefly introduced by Tukey and Tukey [44]; the ideas are further developed and described in more detail in [49]. In a nutshell, k measures are computed for each of the distributions in the (d choose 2) axis-parallel projections of a d-dimensional data set to guide the selection of interesting projections. The measures are designed to quantify a wide range of characteristics that appear in two-dimensional distributions. To illustrate the measures, the two-dimensional projections of the mtcars data set [22] were quantified by the proposed k = 9 measures [49]. A subset of the produced measurement vectors, each with a dimensionality of nine, is shown with the respective distributions in a biplot in Figure 19.2. A biplot [15] reduces the multidimensional vectors to their first two principal components and plots the reduced vectors together with the projected unit vectors of the original data space.
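The projection-counting and scoring step can be illustrated with a short sketch. Real scagnostics compute nine graph-theoretic measures per scatterplot; as those are involved, a simple stand-in score (absolute Pearson correlation) is used here per projection, and the data matrix is a random placeholder with the shape of mtcars (32 cars, 11 variables) rather than the data set itself.

```python
import numpy as np
from itertools import combinations
from math import comb

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 11))        # placeholder with the mtcars shape
d = X.shape[1]
print(comb(d, 2))                    # number of axis-parallel 2-D projections: 55

# Stand-in "interestingness" score per projection (|Pearson r|);
# actual scagnostics would produce a k = 9 measure vector per projection.
scores = {(i, j): abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
          for i, j in combinations(range(d), 2)}
top = sorted(scores, key=scores.get, reverse=True)[:5]   # candidate nodes
```

Only the top-scoring projections would become nodes of the navigational graph, keeping it far smaller than all (d choose 2) candidates.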
Understanding patterns of moped and seated motor scooter (50 cc or less) involved fatal crashes using cluster correspondence analysis
Published in Transportmetrica A: Transport Science, 2023
Subasish Das, Md Mahmud Hossain, M. Ashifur Rahman, Xiaoqiang Kong, Xiaoduan Sun, G.M. Al Mamun
Figure 2 is the biplot of the clustering results. The locations of the centroids of the eight clusters are shown as blue ellipses. Because the biplot shows the locations of all attributes, it is difficult to explore individual attributes; in this study, the biplot serves as a common diagram showing the presence of all attributes and their locations in a two-dimensional space. Additional cluster-level bar plots are shown later and explained in detail (see Figures 3 and 4). Eight clusters were identified in this plot. The CCA algorithm performs dimension reduction on the dataset while maximising the distance between clusters and minimising the distance within them. The resulting clusters therefore show close relationships among the points within each cluster and are clearly distinguishable from the other clusters. Table 5 above shows the amount of variation in the data explained by each cluster. Clusters 1–8 account for 16.9%, 16.7%, 13.4%, 13.1%, 12.4%, 10.9%, 10.5%, and 6.1% of the variation in the total dataset, respectively. The first six clusters contain information for more than 80% of the data; the least information is associated with Cluster 8 (only 6.1% of the data).
Contamination of Arable Soil with Toxic Trace Elements (Tes) around Mine Sites and the Assessment of Associated Human Health Risks
Published in Soil and Sediment Contamination: An International Journal, 2023
Gregory Udie Sikakwe, Godswill Abam Eyong, Benneth Uduak Ilaomo
A biplot is a multivariate technique that graphs a data matrix and permits a joint view and interpretation of the relationships between its rows and columns (Gabriel 1971). A biplot provides a plot of the observations while at the same time presenting the relative positions of the variables in two dimensions (Jolliffe 2002). A biplot shows sample scores and variable loadings in one plot (Figure 5). A PCA biplot is a two-dimensional chart that represents the relationship between the samples and the variables in the same plot. The PCA biplots in Figure 5 show PC2 versus PC1 with an even distribution of PC scores. PC1 and PC2 contribute almost equal amounts of information to the PCA, and the line of regression is parallel to the PC2 axis. A plot of PC4 versus PC3 shows that the line of regression is slightly inclined towards the PC3 axis; the spread of data points shows that PC3 contributed somewhat more information than PC4 to the PCA. A plot of PC6 versus PC5 shows that the line of regression is almost parallel to PC5 and virtually touching the axis, more so than in the other PCA biplots. This biplot shows that PC5 contributed more information to the dataset than PC6.
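How the sample scores and variable loadings of a PCA biplot are obtained can be sketched in a few lines of numpy. The data matrix below is a random placeholder standing in for samples-by-element concentrations; it is not the study's data, and the scaling convention shown (scores as principal components, loadings scaled by singular values) is one common choice among several.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 6))               # hypothetical 50 samples x 6 variables
Xc = X - X.mean(axis=0)                    # centre each column
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

scores = U * s                             # sample scores (principal components)
loadings = Vt.T * s / np.sqrt(len(X) - 1)  # variable loadings
explained = s**2 / (s**2).sum()            # variance share per PC
# A PC2-vs-PC1 biplot overlays the points scores[:, :2]
# with arrows from the origin to loadings[:, :2].
```

Successive pairs of columns (PC3 vs PC4, PC5 vs PC6) give the additional biplots of the kind described for Figure 5, with `explained` quantifying how much information each axis carries.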
Improving geological logging of drill holes using geochemical data and data analytics for mineral exploration in the Gawler Ranges, South Australia
Published in Australian Journal of Earth Sciences, 2021
E. J. Hill, A. Fabris, Y. Uvarova, C. Tiddy
The variation matrix for the data set (Aitchison, 1986) and the compositional biplot (Aitchison & Greenacre, 2002) were used to guide dimensionality reduction. The variation matrix describes the dispersion between every pair of elements in the data set, i.e. a high dispersion value indicates elements that are unrelated to each other in their behaviour, while a low dispersion indicates elements that behave similarly. Dispersion can be thought of as the compositional equivalent of correlation, i.e. low dispersion, like high correlation, indicates elements that behave similarly. The variation matrix is visualised using the relative variation biplot. The biplot is the compositional version of a principal-component analysis plot; it uses the CLR transform to make the biplot suitable for compositional data (Aitchison & Greenacre, 2002). In the biplot, samples are represented by scattered data points and variables are represented by lines that radiate out from the origin. The distance between the tips of a pair of lines approximates the variance of the log ratio of the two elements. Short distances indicate similar variables (in the projected space), and hence closely spaced groups of variables on the biplot indicate redundancy of information. The terms ‘may represent’ and ‘approximate the variance’ are used here because the data are being projected into a lower-dimensional space, and the geometric relationships need not hold in the higher-dimensional space if the total variance accounted for in the 2D biplot is not close to 100%. The Python programming language was used for calculating the variation matrix and biplots.
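The CLR transform and variation matrix described above can be sketched in Python. The article states Python was used but not which libraries, so this is a plain-numpy sketch; the compositional data below are randomly generated for illustration (40 samples, 5 elements), not the study's geochemistry.

```python
import numpy as np

def clr(X):
    """Centred log-ratio transform of a compositional matrix (rows are compositions)."""
    L = np.log(X)
    return L - L.mean(axis=1, keepdims=True)

def variation_matrix(X):
    """T[i, j] = var(log(x_i / x_j)) across samples; low values = similar behaviour."""
    L = np.log(X)
    p = X.shape[1]
    T = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            T[i, j] = np.var(L[:, i] - L[:, j])
    return T

rng = np.random.default_rng(2)
comp = rng.dirichlet(np.ones(5), size=40)   # hypothetical compositions (rows sum to 1)
T = variation_matrix(comp)
Z = clr(comp)
# The relative variation biplot is the PCA biplot of Z; the squared distance between
# the tips of two variable rays approximates T[i, j] when the 2-D variance capture is high.
```

Pairs of variables with small `T[i, j]` are the candidates for merging or dropping during dimensionality reduction, since their rays will plot close together.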