Applications in Industry
Published in Sylvia Frühwirth-Schnatter, Gilles Celeux, Christian P. Robert, Handbook of Mixture Analysis, 2019
Kerrie Mengersen, Earl Duncan, Julyan Arbel, Clair Alston-Knox, Nicole White
The initial motivation for the prior distribution on the presence probabilities η stems from dependent Dirichlet processes introduced by MacEachern (1999). The Dirichlet process (Ferguson, 1973) is a popular BNP distribution for species modelling which conveys an interesting natural clustering mechanism. Dependent Dirichlet processes were proposed by MacEachern in order to extend Dirichlet processes to multiple-site situations, and to allow for borrowing of strength across the sites; see also Section 6.2.3 above. Dirichlet processes (and their dependent version) are (almost surely) discrete random probability measures, hence they consist of random weights and random locations. Of interest for us is the distribution of the random weights which shall constitute the prior distribution on the presence probabilities. We first describe the distribution of the weights of the Dirichlet process, also known as the Griffiths–Engen–McCloskey (GEM) distribution, used as a prior for η(x) for any given covariate value x, and then turn to describe the distribution of the weights of the dependent Dirichlet process, which we call dependent GEM, used as a joint prior for η = (η(x1),…, η(xI)).
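For illustration only (not from the chapter), the following sketch draws a truncated GEM(α) weight vector by stick-breaking; the concentration α and truncation level K are assumed values chosen for the example.

```python
import numpy as np

def sample_gem(alpha, K, seed=0):
    """Draw a length-K truncated GEM(alpha) weight vector by stick-breaking."""
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, size=K - 1)                 # stick proportions V_k ~ Beta(1, alpha)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)))
    eta = np.append(v * remaining[:-1], remaining[-1])   # last entry absorbs the leftover stick
    return eta

eta = sample_gem(alpha=2.0, K=25)
print(eta.round(3), eta.sum())   # weights sum to 1 up to floating-point error
```

A dependent version would tie such weight vectors together across the covariate values x1,…, xI rather than drawing them independently.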
Real-Time Identification of Performance Problems in Large Distributed Systems
Published in Ashok N. Srivastava, Jiawei Han, Machine Learning and Knowledge Discovery for Engineering Systems Health Management, 2016
Moises Goldszmidt, Dawn Woodard, Peter Bodik
Our approach to clustering is based on a Dirichlet process mixture (DPM) model. The DPM provides a natural prior specification for online clustering, allowing estimation of the number of clusters while maintaining exchangeability between observations [25]. A DPM can be obtained as the limit of a finite mixture model with a Dirichlet prior distribution on the mixing proportions [26,27]. In our context, the DPM is parameterized by a scalar α controlling the expected number of types occurring in a set of crises, and by a prior distribution G0 for the set of parameters {γ(k), T(k)} associated with each crisis type k.
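As a sketch of the finite-mixture limit mentioned above (illustrative values, not the authors' code): with a symmetric Dirichlet(α/K, …, α/K) prior on the mixing proportions, the prior number of occupied components approaches that of the DPM as K grows.

```python
import numpy as np

def occupied_components(alpha, K, n, seed=0):
    """Prior number of occupied components in a finite mixture with a
    symmetric Dirichlet(alpha/K, ..., alpha/K) prior on the weights."""
    rng = np.random.default_rng(seed)
    pi = rng.dirichlet(np.full(K, alpha / K))   # mixing proportions
    z = rng.choice(K, size=n, p=pi)             # component labels for n observations
    return np.unique(z).size

alpha, n = 1.0, 200
for K in (10, 100, 1000):
    mean_occupied = np.mean([occupied_components(alpha, K, n, seed=s) for s in range(200)])
    # approaches the DPM prior mean, sum_{i=1}^{n} alpha/(alpha + i - 1) (about 5.9 here)
    print(K, mean_occupied)
```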
Clustering
Published in A. C. Faul, A Concise Introduction to Machine Learning, 2019
A Dirichlet process extends the concept to an unknown, variable number of clusters K. The notation is vn ∼ N(μk, Σk), (μk, Σk) ∼ G, G ∼ DP(α, G0). The last relation means that G is drawn from a Dirichlet process with parameters α and G0. A Dirichlet process is a distribution over distributions. There are infinitely many possibilities for pairs (μ, Σ), and the vector π is often described as infinite, giving each pair a probability mass. Conceptually, infinity is difficult. In practice, the number of clusters K is at most the number of samples N, since every cluster contains at least one sample. Therefore G gives a probability mass of Nk/(N − 1 + α) to at most N pairs (μ, Σ) drawn from the base distribution G0, where Nk is the number of samples in cluster k, and assigns probability α/(N − 1 + α) to the set of all other possible pairs. G is a discrete distribution defined on a finite partition of the space of all pairs (μ, Σ), which is denoted by S.
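The sequential assignment rule implied by these probabilities can be simulated directly; the sketch below (illustrative, not from the book) assigns each new sample to an existing cluster k with weight Nk and to a new cluster with weight α.

```python
import numpy as np

def crp_assignments(n, alpha, seed=0):
    """Assign n samples to clusters sequentially: an existing cluster k is
    chosen with probability N_k/(i + alpha), a new one with alpha/(i + alpha),
    where i is the number of samples seen so far."""
    rng = np.random.default_rng(seed)
    counts, labels = [], []                      # N_k per cluster, label per sample
    for i in range(n):
        weights = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / (i + alpha))
        if k == len(counts):                     # open a new cluster
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts

labels, counts = crp_assignments(n=50, alpha=1.0)
print(len(counts), counts)                       # number of clusters and their sizes
```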
Distribution inference from early-stage stationary data streams by transfer learning
Published in IISE Transactions, 2022
Kai Wang, Jian Li, Fugee Tsung
Another way to treat small data is to incorporate prior information via the Bayesian framework, where the estimate can be updated upon every new data point by posterior sampling (Bishop, 2006). To infer distributions of any potential form, Bayesian nonparametric density estimation has been widely studied (Müller and Quintana, 2005; Jara et al., 2011; Polansky, 2014; Li et al., 2017), where the underlying distribution is often modeled by an infinite mixture of normal distributions whose parameters follow a Dirichlet Process (DP) prior. The DP prior and its hyperparameters strongly impact the posteriors when the data are limited, and thus require careful tuning. Moreover, eliciting an informative prior for a target dataset from an auxiliary dataset neglects their inherent differences, and could impair inference of the target distribution.
Nonparametric Bayesian Modeling and Estimation for Renewal Processes
Published in Technometrics, 2021
Sai Xiao, Athanasios Kottas, Bruno Sansó, Hyotae Kim
Here, we define the mixture weights through a distribution function that is modeled nonparametrically with a Dirichlet process (DP) prior (Ferguson 1973; Antoniak 1974). This provides a novel Erlang mixture formulation for Bayesian nonparametric density estimation on the positive real line. It extends Bernstein polynomial priors for density estimation on the unit interval, which have been explored in the Bayesian nonparametrics literature following the work by Petrone (1999a, 1999b). A key feature of our method is that the choice of the DP centering distribution controls clustering or declustering patterns for the point process, which can therefore be informed by the prior specification. From a computational point of view, the proposed model for the inter-arrival distribution has the advantage of enabling efficient posterior simulation, while properly accounting for the renewal process likelihood normalizing constant.
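To make the construction concrete, here is a rough sketch (not the authors' implementation): the weights are increments of a random distribution function F over the grid {jθ}, with F approximated by a truncated stick-breaking draw from a DP whose centering distribution is taken, for illustration, to be exponential; θ, the truncation levels, and α are all assumed values.

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(0)

# Truncated stick-breaking draw of F ~ DP(alpha, F0), with F0 exponential (assumption).
alpha, trunc = 2.0, 200
v = rng.beta(1.0, alpha, size=trunc)
pi = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))   # stick-breaking weights
atoms = rng.exponential(scale=1.0, size=trunc)               # atom locations from F0

def F(t):
    """Random distribution function F(t) = sum_k pi_k * 1{atom_k <= t}."""
    return pi[atoms <= t].sum()

# Erlang mixture density on (0, inf): weights are increments of F over the grid {j*theta}.
theta, J = 0.25, 40                                          # illustrative scale and number of Erlang components
w = np.array([F(j * theta) - F((j - 1) * theta) for j in range(1, J + 1)])

def density(t):
    """f(t) = sum_j w_j * Erlang(t; shape j, scale theta)."""
    return sum(w[j - 1] * gamma.pdf(t, a=j, scale=theta) for j in range(1, J + 1))

print(density(1.0), w.sum())   # w.sum() < 1 reflects the two truncations
```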
Contextual anomaly detection for high-dimensional data using Dirichlet process variational autoencoder
Published in IISE Transactions, 2023
A Dirichlet Process (DP) is an infinite-dimensional extension of the Dirichlet distribution, and is defined by a concentration parameter α and a base measure G0. Consider a random probability measure G = Σk πk δθk, where δθk is an indicator function centered on the atom θk, which is drawn from G0, and (πk) is an infinite sequence of mixture weights generated from the stick-breaking process as follows: πk = βk ∏l<k (1 − βl), with βk ∼ Beta(1, α).
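A minimal simulation of this stick-breaking construction, truncated for practicality (α, the base measure G0, and the truncation level are illustrative choices, not the article's):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, K = 1.0, 100                      # concentration and truncation level (assumed)

beta = rng.beta(1.0, alpha, size=K)      # beta_k ~ Beta(1, alpha)
pi = beta * np.concatenate(([1.0], np.cumprod(1.0 - beta[:-1])))   # pi_k = beta_k * prod_{l<k}(1 - beta_l)
theta = rng.normal(0.0, 1.0, size=K)     # atoms theta_k ~ G0 (standard normal, assumed)

# G = sum_k pi_k * delta_{theta_k} is discrete: draws from G repeat atoms,
# which is what induces clustering when G supplies mixture-component parameters.
samples = rng.choice(theta, size=10, p=pi / pi.sum())   # renormalise the truncated weights
print(np.round(samples, 3))
```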