Modern Methods for Characterization of Social Networks through Network Models
Published in Natalie M. Scala, James P. Howard, Handbook of Military and Defense Operations Research, 2020
Christine M. Schubert Kabban, Fairul Mohd-Zaid, Richard F. Deckro
Each of the random graph generating models described focuses on a single network property (i.e., scale-free or small-world) and attempts to model that property through a single unified distribution for a network measure such as degree. More recent work has focused on using multiple distributions for a single network measure, or a combination of network measures. Such models are called mixture models and are motivated by the lack of fit of any one individual property across similar real-world data sets. A mixture model is a single probabilistic model that combines several distributions, each from a different subpopulation, in which the distribution for each subpopulation is governed by different parameter settings. These distributions are then combined into one model through weighting terms, one per distribution, which must sum to one. Parameter estimation for the mixture model is then accomplished through statistical techniques such as Expectation-Maximization or Bayesian methods.
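For reference, the finite mixture just described can be written in the following standard form (generic notation, not drawn from the chapter: K components with densities f_k, parameters θ_k, and weights w_k):

```latex
f(x) = \sum_{k=1}^{K} w_k \, f_k(x \mid \theta_k),
\qquad w_k \ge 0, \qquad \sum_{k=1}^{K} w_k = 1
```

With degree as the network measure, x would be a node's degree and each f_k the degree distribution of one subpopulation; EM or Bayesian methods then estimate the w_k and θ_k jointly.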
Machine Learning – Supervised Learning
Published in Rakesh M. Verma, David J. Marchette, Cybersecurity Analytics, 2019
Rakesh M. Verma, David J. Marchette
There are various methods that utilize clustering algorithms. One posits that points in the same cluster should have the same label, and develops methods that try to cluster the data into clusters that are class-pure. One way to think about this is in the mixture framework. A mixture model is a soft clustering of the observations, in contrast to a hard clustering in which an observation can belong to only one cluster. When performing the EM algorithm, one constrains certain components to be "available" only for a given class label (or for unlabeled points). Essentially, one is given a partial association of the observations to the components: x, being from class y, is definitely drawn from the components reserved for class y; the EM algorithm is then modified to take this into account.
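A minimal sketch of how the E-step changes under this constraint, assuming a Gaussian mixture; the function name, the convention y = -1 for unlabeled points, and the comp_class array mapping components to classes are illustrative choices, not from the text:

```python
import numpy as np
from scipy.stats import multivariate_normal

def semisupervised_e_step(X, y, weights, means, covs, comp_class):
    """One E-step of a GMM in which components are reserved per class.

    X: (n, d) data; y: (n,) class labels, with -1 marking unlabeled points;
    weights, means, covs: current mixture parameters for K components;
    comp_class: (K,) class label each component is "available" to.
    """
    n, K = X.shape[0], len(weights)
    resp = np.zeros((n, K))
    for k in range(K):
        resp[:, k] = weights[k] * multivariate_normal.pdf(X, means[k], covs[k])
    # Partial association: a labeled point gets zero responsibility for
    # components reserved for other classes; unlabeled points keep all.
    labeled = (y != -1)[:, np.newaxis]                           # (n, 1)
    wrong_class = comp_class[np.newaxis, :] != y[:, np.newaxis]  # (n, K)
    resp[labeled & wrong_class] = 0.0
    return resp / resp.sum(axis=1, keepdims=True)
```

The M-step is unchanged; only the responsibilities are constrained, which is the modification the passage describes.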
Aggregating network-level pavement performance data based on Gaussian Mixture Models
Published in Sandra Erkens, Xueyan Liu, Kumar Anupam, Yiqiu Tan, Functional Pavement Design, 2016
A mixture model is a parametric probability density function represented as a weighted sum of component densities (Frühwirth-Schnatter, 2006). The shape of the mixture density can be extremely flexible. Mixture models have been successfully used to capture many specific properties of real data, such as multimodality, skewness, and kurtosis (Frühwirth-Schnatter, 2006). When every component in a mixture model follows a Gaussian (normal) distribution, the mixture model is called a Gaussian Mixture Model (GMM). GMMs are used extensively as parametric models for the probability distributions of continuous measurements due to the simplicity of the estimation process (Jun, 2010). In transportation engineering, GMMs have recently been proposed to model heterogeneity in vehicle speed data (Park et al., 2010) and gross vehicle weight distributions (Hyun et al., 2015). In this study, a Gaussian (normal) mixture model is adopted to aggregate pavement performance data of multiple pavement segments while accounting for performance heterogeneity.
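A brief illustration of fitting such a model, assuming scikit-learn is available; the two-group synthetic data stand in for a pooled performance measure and are not the pavement data from the study:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for a pooled pavement performance measure
# (e.g., roughness) drawn from two heterogeneous segment groups.
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(1.2, 0.2, 300),
                         rng.normal(2.5, 0.4, 200)])

gmm = GaussianMixture(n_components=2, random_state=0).fit(values.reshape(-1, 1))
print(gmm.weights_)                       # mixing weights, sum to one
print(gmm.means_.ravel())                 # component means
print(np.sqrt(gmm.covariances_).ravel())  # component standard deviations
```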
A Bayesian Nonparametric Mixture Measurement Error Model With Application to Spatial Density Estimation Using Mobile Positioning Data With Multi-Accuracy and Multi-Coverage
Published in Technometrics, 2020
Youngmin Lee, Taewon Jeong, Heeyoung Kim
A DP (Dirichlet process) mixture model is a mixture model that uses the DP as a prior over the mixing distribution of the model parameters. A main advantage of the DP mixture model is that it does not require choosing the number of mixture components in advance; the number of components is automatically inferred from the data. The DP mixture model can be seen as an infinite mixture model that has a countably infinite number of mixture components. Using the DP mixture model, the density for a set of observations x1, …, xn is modeled using the set of latent parameters θ1, …, θn, where each θi is drawn independently and identically from G, a random measure distributed according to a DP with base distribution G0, while each xi has a distribution parameterized by θi ∈ Θ, where Θ is the support of the locations of the random measure G.
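As a rough illustration, scikit-learn's BayesianGaussianMixture provides a truncated variational approximation to a DP Gaussian mixture (a generic usage sketch, not the model from the article; the data and the truncation level n_components=10 are arbitrary):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2, 0.5, 200),
                    rng.normal(3, 1.0, 300)]).reshape(-1, 1)

# n_components is only a truncation level; the DP prior shrinks the
# weights of unneeded components toward zero rather than fixing K.
dpgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(X)
print(np.round(dpgmm.weights_, 3))  # most weights collapse near zero
```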
How accurate is your travel time reliability?—Measuring accuracy using bootstrapping and lognormal mixture models
Published in Journal of Intelligent Transportation Systems, 2018
Single probability models (e.g., Gaussian, Lognormal, and Weibull distributions) were widely used in previous studies (for example, Chen et al., 2003; Eman & Al-deek, 2006; Pu, 2012). However, a recent paper by Yang and Wu (2016) found that travel time distributions may be underfitted when using single probability models. Mixture probability models may estimate travel time distributions more accurately. Widely used mixture probability models include the Gaussian, Lognormal, and Gamma mixture models. Yang and Wu (2016) also found that the choice of mixture model has less impact on travel time reliability (TTR) measures when travel time distributions are well fitted. Lognormal mixture models were used in our study. Equations 5 and 6 show the mathematical definition of the lognormal mixture models:

f(T) = \sum_{k=1}^{K} w_k \, f_k(T; \mu_k, \sigma_k) \quad (5)

f_k(T; \mu_k, \sigma_k) = \frac{1}{T \sigma_k \sqrt{2\pi}} \exp\left( -\frac{(\ln T - \mu_k)^2}{2\sigma_k^2} \right) \quad (6)

Where: T are travel times in a bootstrap replication; K is the number of lognormal distributions; wk is the weight of the kth lognormal distribution, with the weights summing to one; μk and σk are the mean and standard deviation of the kth lognormal distribution.
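A small sketch of one way to fit these equations, using the standard identity that T follows a lognormal mixture exactly when ln T follows a Gaussian mixture; the synthetic travel times and component count K = 2 are illustrative only:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Synthetic travel times (seconds): a mix of free-flow and congested trips.
travel_times = np.concatenate([rng.lognormal(5.0, 0.15, 400),
                               rng.lognormal(5.6, 0.30, 100)])

# Fitting a Gaussian mixture to ln(T) recovers the lognormal mixture
# parameters w_k, mu_k, sigma_k of Equations 5 and 6.
K = 2
gmm = GaussianMixture(n_components=K, random_state=0).fit(
    np.log(travel_times).reshape(-1, 1))
w = gmm.weights_
mu = gmm.means_.ravel()
sigma = np.sqrt(gmm.covariances_.ravel())
print(w, mu, sigma)
```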
A Hybrid Machine Learning Model for Credit Approval
Published in Applied Artificial Intelligence, 2021
Cheng-Hsiung Weng, Cheng-Kui Huang
The most well-known and commonly used partitioning methods are k-means, expectation maximization (EM), and their variations. The k-means algorithm takes an input parameter, k, and partitions a set of n objects into k clusters so that the resulting intra-cluster similarity is high but the inter-cluster similarity is low. k-means is a popular clustering algorithm that requires an initial set of cluster centers to start the clustering. It is an unsupervised clustering method that does not guarantee convergence to a globally optimal partition. EM can be viewed as a generalization of the k-means algorithm that often offers better performance. It is a statistical technique for maximum likelihood estimation using mixture models.
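A compact comparison of the two methods on the same data, assuming scikit-learn; the data and cluster count are illustrative, and GaussianMixture is used here as the standard EM-based mixture estimator:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.5, (100, 2)),
               rng.normal([3, 3], 1.0, (100, 2))])

# k-means: hard assignment of each object to exactly one of k clusters.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# EM for a Gaussian mixture: soft assignment via posterior probabilities.
em = GaussianMixture(n_components=2, random_state=0).fit(X)

print(km.labels_[:5])                     # hard cluster labels
print(em.predict_proba(X)[:5].round(2))   # per-cluster membership probabilities
```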