Technology of Intelligent Systems
Published in James A. Momoh, Mohamed E. El-Hawary, Electric Systems, Dynamics, and Stability with Artificial Intelligence Applications, 2018
James A. Momoh, Mohamed E. El-Hawary
The successful performance of expert systems relies heavily on knowledge derived from domain experts and their experience. Other forms of knowledge include causal knowledge and information from case studies, databases, and similar sources. Knowledge is typically expressed in the form of high-level rules, and expert knowledge takes the form of heuristics, procedural rules, and strategies. It inherently contains vagueness and imprecision, because experts are often unable to express their knowledge explicitly. The process of acquiring knowledge is also quite imprecise, because the expert is usually not aware of all the tools used in the reasoning process. The knowledge that one reasons with may itself contain uncertainty. Uncertain data and incomplete information are further sources of uncertainty in expert systems.
Complex Mining from Uncertain Big Data in Distributed Environments
Published in Kuan-Ching Li, Hai Jiang, Albert Y. Zomaya, Big Data Management and Processing, 2017
Alfredo Cuzzocrea, Carson Kai-Sang Leung, Fan Jiang, Richard Kyle MacKinnon
The research problem of frequent itemset mining was first introduced [56] in 1993. The corresponding algorithm, namely Apriori, mined all frequent itemsets from a transaction database (TDB) consisting of precise data, in which the contents of each transaction are precisely known. Specifically, if a transaction ti contains an item x (i.e., x ∈ ti), then x is precisely known to be present in ti. On the other hand, if a transaction ti does not contain an item y (i.e., y ∉ ti), then y is precisely known to be absent from ti. However, this is not the case for probabilistic databases consisting of uncertain data. A key difference between precise and uncertain data is that each transaction of the latter contains items together with their existential probabilities. The existential probability P(x,ti) of an item x in a transaction ti indicates the likelihood of x being present in ti. As a real-life example, consider a dataset in which each transaction ti represents a patient's visit to a physician's office. Each item x within ti represents a potential disease and is associated with P(x,ti), expressing the likelihood of the patient having that disease x in ti (say, in t1, the patient has a 60% likelihood of having asthma, and a 90% likelihood of catching a cold regardless of whether or not they have asthma). With this notion, each item in a transaction ti of a precise dataset can be viewed as having a 100% likelihood of being present in ti.
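The expected-support computation behind Apriori-style mining of such an uncertain TDB can be made concrete with a minimal sketch. It assumes independent existential probabilities and standard expected-support semantics; the transaction contents other than the asthma/cold example, the `expected_support` and `frequent_itemsets` helpers, and the minimum-support threshold are invented for illustration rather than taken from the excerpt.

```python
import math

# Each uncertain transaction maps an item to its existential probability P(x, t_i).
# Transaction t1 mirrors the excerpt: a 60% likelihood of asthma and a 90%
# likelihood of a cold; the remaining transactions are invented for illustration.
uncertain_tdb = [
    {"asthma": 0.6, "cold": 0.9},
    {"cold": 0.4, "flu": 0.7},
    {"asthma": 0.8, "flu": 0.5, "cold": 0.3},
]

def expected_support(itemset, tdb):
    """Expected support: sum over transactions of the product of the existential
    probabilities of the itemset's members (items assumed independent)."""
    return sum(
        math.prod(t[x] for x in itemset) if all(x in t for x in itemset) else 0.0
        for t in tdb
    )

def frequent_itemsets(tdb, minsup):
    """Level-wise (Apriori-style) search for itemsets whose expected support
    reaches minsup; no subset pruning, so this is a sketch rather than U-Apriori."""
    items = sorted({x for t in tdb for x in t})
    frequent, candidates = {}, [frozenset([x]) for x in items]
    while candidates:
        scored = {s: expected_support(s, tdb) for s in candidates}
        level = {s: v for s, v in scored.items() if v >= minsup}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-itemset candidates.
        candidates = list({a | b for a in level for b in level
                           if len(a | b) == len(a) + 1})
    return frequent

print(frequent_itemsets(uncertain_tdb, minsup=1.0))
```

Setting every existential probability to 1 reduces this computation to ordinary support counting over precise data, mirroring the closing observation of the excerpt.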
Holistic simulation of tire-pavement-system: Mechanics and uncertainty
Published in Eyad Masad, Amit Bhasin, Tom Scarpas, Ilaria Menapace, Anupam Kumar, Advances in Materials and Pavement Performance Prediction, 2018
M. Kaliske, I. Wollny, F. Hartung, M. Götz
Subsequently, the FE tire and FE pavement models, as well as their coupling via a program interface, are presented. The potential for pavement investigations is illustrated by a numerical example. Further, the consideration and quantification of uncertain data are discussed.
A generalised uncertain decision tree for defect classification of multiple wafer maps
Published in International Journal of Production Research, 2020
Byunghoon Kim, Young-Seon Jeong, Seung Hoon Tong, Myong K. Jeong
There has been great interest in uncertain data mining, and many studies have been devoted to the development of uncertain data mining algorithms. Uncertain data arises in many research fields, for example in the classification of breast cancer, location tracking of mobile devices, and measurement of body temperature (Tsang et al. 2011). In many studies, uncertain data is represented by a probabilistic database to capture its underlying complexities (Dai et al. 2005; Pei et al. 2007; Sun et al. 2010; Tsang et al. 2011). The probabilistic database is defined as a finite probability space whose outcomes are all possible database instances. This can be represented as the pair (X, p), where X is a finite set of possible database instances and p(i) is the probability associated with any instance i in X (Green and Tannen 2006; Aggarwal and Philip 2009; Sun et al. 2010). In some studies, the probability distributions of uncertain features were estimated by statistical methods to describe the uncertainty of the features (Aggarwal 2009). Therefore, in the classification of uncertain data, each observation has a class label and uncertain features whose values can be represented by a probability distribution function. Uncertain data objects can be reduced to certain data objects if only a representative statistic, such as the mean of each object, is considered (Tavakkol, Jeong, and Albin 2017). However, uncertain data objects carry richer information than the certain data objects obtained from such a reduction. It is therefore important to employ data mining techniques that can handle uncertain data objects and capture this richer information (Tavakkol, Jeong, and Albin 2019).
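The (X, p) formalism can be illustrated with a small sketch. The possible worlds below are hypothetical, built to match the asthma/cold example from the earlier excerpt under independence, and `query_probability` is an illustrative helper rather than an API from any of the cited works.

```python
# A probabilistic database as the pair (X, p): X is a finite set of possible
# database instances (possible worlds) and p assigns a probability to each one.
# The worlds are hypothetical, built from one patient with independent
# likelihoods of 0.6 for asthma and 0.9 for a cold.
X = {
    "world_1": {("patient_1", "asthma"), ("patient_1", "cold")},
    "world_2": {("patient_1", "cold")},
    "world_3": {("patient_1", "asthma")},
    "world_4": set(),
}
p = {"world_1": 0.54, "world_2": 0.36, "world_3": 0.06, "world_4": 0.04}

assert abs(sum(p.values()) - 1.0) < 1e-9  # outcomes of a finite probability space

def query_probability(predicate):
    """Probability that a Boolean query holds, summed over the possible instances."""
    return sum(p[w] for w, instance in X.items() if predicate(instance))

# Probability that patient_1 has asthma in the realised database instance: 0.6.
print(query_probability(lambda instance: ("patient_1", "asthma") in instance))
```

Collapsing each uncertain feature to a single representative statistic such as its mean discards exactly this distributional information, which is the reduction to certain data objects noted in the excerpt.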
Classifying random variables based on support vector machine and a neural network scheme
Published in Journal of Experimental & Theoretical Artificial Intelligence, 2022
For this purpose, the output of a classifier was analysed and its distribution was described by beta distribution parameters. As a result, the classifier with the proposed scaling method, referred to as the class probability output network (CPON), can provide accurate posterior probabilities for the soft decision of classification. Kim et al. (2014) proposed SVMs with CPONs in which the classifiers were optimised using the structural risk minimisation (SRM) principle. The conditional class probability for a given pattern was estimated using the parameters of the beta distribution. Feature selection for two-class linear SVM models in the presence of uncertain data was proposed using principles of robust optimisation and embedded methods, which do not separate learning from feature selection but integrate the selection of features into model building (Le Thi et al., 2014). The Extreme Learning Machine (ELM) is a successful single-hidden-layer feed-forward neural network for classification (Huang et al., 2006). Sun et al. (2014) proposed a new ELM-based classification algorithm for uncertain data. They modelled an uncertain data object as a set of instances with an arbitrary probability distribution. First, they trained on the instances associated with each uncertain data object. Then, the class probabilities of each instance were computed according to the learning results. Finally, the classification results were obtained using a probability-bound-based approach. Yazdi et al. (2007) proposed a new SVM classifier with probabilistic constraints, in which the constraint boundaries are described by probability density functions and each constraint holds with a probability between 0 and 1. Lanckriet et al. (2002) considered the case of binary classification where only the mean and covariance matrix of the classes are assumed to be known. The minimax probabilistic decision hyperplane is then determined by optimising the worst-case probabilities over all possible class-conditional distributions. The computational complexity of their method is comparable to that of the quadratic program (QP) solved for the SVM.
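The idea of describing classifier outputs with beta distribution parameters to obtain soft posterior probabilities can be sketched as follows. This is not the actual CPON/SRM optimisation of Kim et al. (2014); it uses simple moment matching, and the synthetic scores, the equal class prior, and the `fit_beta` and `posterior` helpers are assumptions made for the example.

```python
import numpy as np
from scipy.stats import beta

def fit_beta(scores):
    """Moment-matching estimates of the beta parameters for outputs in (0, 1)."""
    m, v = np.mean(scores), np.var(scores)
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common

def posterior(score, params_pos, params_neg, prior_pos=0.5):
    """Posterior P(positive | score) from the two fitted beta densities (Bayes rule)."""
    like_pos = beta.pdf(score, *params_pos) * prior_pos
    like_neg = beta.pdf(score, *params_neg) * (1.0 - prior_pos)
    return like_pos / (like_pos + like_neg)

# Stand-ins for classifier outputs (e.g. SVM or ELM decision values squashed
# into (0, 1)) on training examples of each class; purely synthetic here.
rng = np.random.default_rng(0)
pos_scores = rng.beta(5.0, 2.0, size=200)
neg_scores = rng.beta(2.0, 5.0, size=200)

params_pos, params_neg = fit_beta(pos_scores), fit_beta(neg_scores)
print(posterior(0.8, params_pos, params_neg))  # soft decision for a new output
```

The class-conditional beta fits play the role of the output distributions described in the excerpt; a calibrated soft decision then follows from Bayes' rule rather than from the raw decision value alone.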