Sampling and Laboratory Analysis for Solvent Stabilizers
Published in Thomas K.G. Mohr, William H. DiGuiseppi, Janet K. Anderson, James W. Hatton, Jeremy Bishop, Barrie Selcoe, William B. Kappleman, Environmental Investigation and Remediation, 2020
Thomas K.G. Mohr, Jeremy Bishop
Vacuum distillation is another means of transferring contaminants from samples to the carrier gas. USEPA Method 5032 (USEPA, 1996d) uses vacuum distillation and a cryogenic trapping procedure followed by GC–MS. The sample is introduced into a sample flask, which is then depressurized to the vapor pressure of water with a vacuum pump. The vapor is passed over a condenser coil chilled to −10°C or less to condense water. The uncondensed distillate is cryogenically trapped on stainless steel tubing chilled with liquid nitrogen (−196°C). The condensate is then thermally desorbed and transferred to the GC using helium carrier gas (USEPA, 2006b). The 1996 publication of USEPA Method 5032 does not identify 1,4-dioxane as suitable for analysis using vacuum distillation; however, a 2004 improvement to the method includes 1,4-dioxane as a suitable analyte (Strout et al., 2004a). USEPA Method 8261A uses a vacuum distillation unit to extract volatile and semivolatile organic compounds, including 1,4-dioxane. The lowest method detection limit (MDL) obtained for 1,4-dioxane was 2.5 μg/L for low-concentration water samples, but recoveries were variable (Strout et al., 2004a). See Section 4.5.4 for more discussion of vacuum distillation in USEPA Method 8261A.
An Extensible Frame Language for the Representation of Process Modeling Knowledge
Published in Don Potter, Manton Matthews, Moonis Ali, Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, 2020
Christian Rathke, Frank Tränkle
If the progression of layers from CLOS to Mdl is viewed as a way of reducing the semantic distance between a general programming language and an application-specific language, an important step is taken in moving from a frame representation (FrameTalk) to a domain-specific representation (Mdl). The computational paradigm of manipulating slots, and thereby invoking attached procedures, is replaced by a declarative approach in which relations from the chemical engineering domain are specified between devices, phases, and terminals.
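The contrast can be illustrated with a small, purely hypothetical Python sketch (the original work is implemented in CLOS/FrameTalk/Mdl; all class and relation names below are invented for illustration): instead of imperatively setting slots on frame objects, the process model is stated as a set of domain relations among devices, phases, and terminals.

```python
from dataclasses import dataclass, field

# Imperative, frame-style view: knowledge is encoded by setting slots
# one at a time, which may trigger attached procedures.
class Frame:
    def __init__(self):
        self.slots = {}

    def set_slot(self, name, value):
        self.slots[name] = value  # attached procedures would fire here


# Declarative, domain-specific view (hypothetical names): the model is
# a collection of relations between devices, phases, and terminals.
@dataclass(frozen=True)
class Device:
    name: str

@dataclass(frozen=True)
class Phase:
    name: str

@dataclass(frozen=True)
class Terminal:
    name: str

@dataclass
class ProcessModel:
    has_phase: set = field(default_factory=set)      # (device, phase) pairs
    connected_via: set = field(default_factory=set)  # (device, terminal, device) triples


model = ProcessModel()
reactor, condenser = Device("reactor"), Device("condenser")
model.has_phase.add((reactor, Phase("liquid")))
model.connected_via.add((reactor, Terminal("vapor_outlet"), condenser))
```

The point of the sketch is only the shift in paradigm: the second representation states what relations hold in the plant model, rather than prescribing a sequence of slot updates.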
Preprocessing in Big Data: New challenges for discretization and feature selection
Published in Matthias Dehmer, Frank Emmert-Streib, Frontiers in Data Science, 2017
Verónica Bolón-Canedo, Noelia Sánchez-Maroño, Amparo Alonso-Betanzos
As a brief reminder, discretization is a process that groups continuous values into a number of discrete intervals, thus reducing data size and preparing the data for further analysis. Moreover, some algorithms for classification or FS, for example, accept only categorical attributes as input, and discretization generally allows for a more accurate and faster learning process [124]. In the process of discretization, several decisions are far from trivial: how many continuous values should be grouped into an interval, how many intervals are adequate for a given problem, and where the cut-points should be placed on the scale of values. Several discretization algorithms have been developed accordingly [2,124–126], such as those mentioned in section “Popular discretization methods.” Among them, it is worth mentioning the entropy-based method developed by Fayyad and Irani [26], a global, static, and supervised method that has become very popular and is in fact the default discretization method in the popular Weka learning platform [27]. This method is described briefly here because it was one of the first discretization algorithms adapted to distributed environments [2], a task that might appear simple but is not, owing to the recursive nature of the method and the strong dependence among threshold candidates. The main problem of discretization consists of finding the thresholds that determine into which intervals the possible values of a variable should be grouped. The process followed for each variable to be discretized is as follows (see the sketch after this list):
1. The values of the attribute, together with their corresponding classes, are ordered.
2. The candidate points are determined and sorted. A candidate point is the midpoint between two different values of the attribute whose classes are also different. The sorting operation is time consuming.
3. The candidate points are evaluated. This is the most time-consuming operation because, in the worst case, it implies a complete evaluation of entropy for all points. Evaluating a point implies calculating the entropy of the classes on both sides of the point; thus, each candidate point depends on the other candidates. Once all candidate points are evaluated, the best one is chosen and the values are partitioned accordingly.
4. The algorithm is executed recursively until the stop criteria are met. In our implementation, a limit on the maximum number of intervals per variable was added to the original MDL-principle stopping criterion.
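The following is a minimal, single-machine sketch of the Fayyad–Irani procedure and its MDL stopping criterion, not the distributed implementation discussed in the chapter; function names and the `max_intervals` parameter (standing in for the added interval limit) are illustrative.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a vector of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def best_cut(values, labels):
    """Return ((cut, gain, index), sorted_values, sorted_labels) for the
    entropy-minimizing candidate cut, or (None, ...) if no candidate exists."""
    values, labels = np.asarray(values), np.asarray(labels)
    order = np.argsort(values)
    v, y = values[order], labels[order]
    base, best = entropy(y), None
    for i in range(1, len(v)):
        # Candidates: midpoints between adjacent distinct values with different classes.
        if v[i] == v[i - 1] or y[i] == y[i - 1]:
            continue
        cut = (v[i] + v[i - 1]) / 2.0
        gain = base - (i * entropy(y[:i]) + (len(y) - i) * entropy(y[i:])) / len(y)
        if best is None or gain > best[1]:
            best = (cut, gain, i)
    return best, v, y

def mdlp_accept(y, y_left, y_right, gain):
    """Fayyad-Irani MDL criterion: accept the cut only if the information
    gain exceeds the cost of encoding the partition."""
    n = len(y)
    k, k1, k2 = (len(np.unique(a)) for a in (y, y_left, y_right))
    delta = np.log2(3 ** k - 2) - (k * entropy(y) - k1 * entropy(y_left) - k2 * entropy(y_right))
    return gain > (np.log2(n - 1) + delta) / n

def mdlp_discretize(values, labels, max_intervals=None, _cuts=None):
    """Recursively split the attribute until the MDL criterion
    (or the optional interval limit) stops the recursion."""
    if _cuts is None:
        _cuts = []
    if max_intervals is not None and len(_cuts) + 1 >= max_intervals:
        return sorted(_cuts)
    found, v, y = best_cut(values, labels)
    if found is None:
        return sorted(_cuts)
    cut, gain, i = found
    if not mdlp_accept(y, y[:i], y[i:], gain):
        return sorted(_cuts)
    _cuts.append(cut)
    mdlp_discretize(v[:i], y[:i], max_intervals, _cuts)
    mdlp_discretize(v[i:], y[i:], max_intervals, _cuts)
    return sorted(_cuts)
```

For a numeric attribute `x` with class labels `y`, `mdlp_discretize(x, y, max_intervals=10)` returns the accepted cut points; the recursive calls and the entropy evaluation of every candidate are exactly the parts that make a distributed adaptation nontrivial.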
A targeted Bayesian network learning for classification
Published in Quality Technology & Quantitative Management, 2019
Over the years, several approaches, not limited to BN learning, have been suggested for tackling overfitting and the accuracy–complexity trade-off. One well-known approach is the Minimum Description Length (MDL) principle (Rissanen, 1978). The MDL uses the Kullback–Leibler (KL) divergence, which is also used in this work. The KL divergence provides a measure of fit between two distributions, where often one is considered the true unknown distribution and the second its estimate. The KL divergence, denoted here by D_KL(p‖q), can be thought of as a penalty term (measured in bits of code length) for relying on the distribution q while the true distribution is in fact p. As such, a model that minimizes the KL divergence also minimizes the description length. For example, the C4.5 algorithm, which applies the MDL principle to decision trees, attempts to maximize the information about the class variable (Quinlan & Rivest, 1989). MDL is also used in unsupervised tasks such as clustering and dimension reduction (Xia, Zong, Hu, & Cambria, 2013; Zhao et al., 2015).
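The bits-of-code-length reading of the KL divergence can be made concrete with a short, generic sketch (not the paper's implementation): D_KL(p‖q) is the average number of extra bits paid per symbol for encoding data drawn from p with a code optimized for q.

```python
import numpy as np

def kl_divergence_bits(p, q):
    """D_KL(p || q) in bits: the average extra code length incurred by
    encoding data from the true distribution p with a code built for q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0  # terms with p_i = 0 contribute nothing
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

# Example: true class distribution p versus a model's estimate q.
p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]
print(kl_divergence_bits(p, q))  # extra bits per symbol paid for relying on q
print(kl_divergence_bits(p, p))  # 0.0 -- no penalty when the model matches p
```

Minimizing this penalty over candidate models is what links the KL divergence to minimizing the description length.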