A Statistical Machine Learning Framework
Published in Richard M. Golden, Statistical Machine Learning, 2020
Statistical machine learning is built on the framework of probabilistic inductive logic. It is assumed that the training data is generated from the environment by sampling from a probability distribution called the environmental distribution. The process which generates the training data is called the data generating process. A statistical learning machine’s knowledge base is a set of probability distributions which is called the learning machine’s probability model. The learning machine additionally may have specific beliefs regarding the likelihood that a probability distribution in its probability model is relevant for approximating the environmental distribution.
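The vocabulary in this excerpt can be made concrete with a minimal sketch. Everything specific below is an assumption chosen for illustration and does not come from the book: the environmental distribution is taken to be a normal with mean 1.0 and standard deviation 2.0, the probability model is a small finite grid of candidate normals, and the machine's beliefs are a uniform prior over that grid.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def sample_environment(n):
    """Data generating process: draw n samples from the (assumed) environmental distribution."""
    return [random.gauss(1.0, 2.0) for _ in range(n)]

# Probability model (hypothetical): a finite set of candidate normals, stored as (mean, sd).
probability_model = [(m, s) for m in (-1.0, 0.0, 1.0, 2.0) for s in (1.0, 2.0, 3.0)]

# Beliefs about which candidate is relevant: here a uniform prior (an assumption).
log_prior = {c: -math.log(len(probability_model)) for c in probability_model}

def log_likelihood(data, mean, sd):
    """Log-likelihood of the data under a normal distribution with the given mean and sd."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * sd ** 2) - (x - mean) ** 2 / (2.0 * sd ** 2)
        for x in data
    )

training_data = sample_environment(500)

# Pick the candidate that best approximates the environmental distribution,
# scoring each one by log prior plus log-likelihood.
best = max(
    probability_model,
    key=lambda c: log_prior[c] + log_likelihood(training_data, *c),
)
print("selected (mean, sd):", best)
```

Because the prior is uniform here, the selection reduces to maximum likelihood over the grid. A non-uniform `log_prior` would express the "specific beliefs" the excerpt mentions.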
Hierarchical Bayesian approaches to statistical modelling of geotechnical data
Published in Georisk: Assessment and Management of Risk for Engineered Systems and Geohazards, 2022
Nezam Bozorgzadeh, Richard J. Bathurst
Figure 2(a) shows a PGM where all of the log-bias data y, structured in J different groups, are assumed to have a (normal) distribution with mean μ and standard deviation σ. Note how the directed edges in the PGM emphasise a statistical model that is essentially the assumed data-generating process (i.e. a population distribution with mean μ and standard deviation σ that has given rise to the observed data y). The complete pooling model assumes that the variation among estimates of the means of the individual groups is no larger than what would be expected by chance, and hence it is taken as zero (Gelman et al. 2013). This model yields a single estimate of the mean that is assumed to represent all groups, and is therefore sometimes referred to as the identical parameter model. As discussed in the introduction, this is the model commonly used in analysis of geotechnical engineering data, particularly in analysis of model bias data for RBD and LRFD calibration. It will be discussed further in Section 4 how this model is likely to under-fit geotechnical data by ignoring apparent structures in the data. The complete pooling model can be expressed as $y_{ij} \sim \mathrm{N}(\mu, \sigma^2)$, where $y_{ij}$ is the i-th observation in group j, with vague prior distributions assigned to the mean μ and standard deviation σ, for example distributions whose spread is controlled by a constant “A”, with “A” set to infinity (to specify an improper vague prior) or a suitably large number (depending on the scale of the data) such as 100. These prior distributions are spread over a wide range of parameter values with no apparent peak (i.e. they are locally flat), indicating no particular prior information about what the value of these parameters should be. For more detailed discussions about specifying prior distributions see e.g. Lunn et al. (2012).
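The complete pooling model described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the log-bias values and group names are invented, and the vague priors are assumed to be uniform, μ ~ U(−A, A) and σ ~ U(0, A), which is one choice consistent with the "locally flat" description. With such flat priors the joint posterior mode coincides with the pooled maximum-likelihood estimates.

```python
import math
import statistics

# Hypothetical log-bias observations in J = 3 groups (values invented for illustration).
groups = {
    "group_1": [0.10, 0.25, -0.05, 0.18],
    "group_2": [0.40, 0.35, 0.52],
    "group_3": [-0.20, -0.10, 0.02, -0.15, -0.08],
}

# Complete pooling ignores group membership: every y_ij shares one mu and one sigma.
pooled = [y for ys in groups.values() for y in ys]

def log_posterior(mu, sigma, data, A=100.0):
    """Unnormalised log posterior for y ~ N(mu, sigma^2) under the ASSUMED
    vague priors mu ~ U(-A, A) and sigma ~ U(0, A)."""
    if not (-A < mu < A and 0.0 < sigma < A):
        return -math.inf  # outside the support of the uniform priors
    n = len(data)
    sse = sum((y - mu) ** 2 for y in data)
    # Flat priors add only a constant, so this is the normal log-likelihood up to a constant.
    return -n * math.log(sigma) - sse / (2.0 * sigma ** 2)

# Joint posterior mode under flat priors: pooled mean and the divide-by-n standard deviation.
mu_hat = statistics.fmean(pooled)
sigma_hat = statistics.pstdev(pooled)

# Separate group means, shown only to expose the structure that complete pooling discards.
group_means = {g: statistics.fmean(ys) for g, ys in groups.items()}

print("pooled mean:", mu_hat)
print("pooled sd:", sigma_hat)
print("group means:", group_means)
```

Comparing `group_means` with `mu_hat` gives a quick sense of how much between-group variation the single pooled estimate sets to zero, which is the under-fitting concern raised in the excerpt.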