Security and Privacy in Big Data Cyber-Physical Systems
Published in Yassine Maleh, Mohammad Shojafar, Ashraf Darwish, Abdelkrim Haqiq, Cybersecurity and Privacy in Cyber-Physical Systems, 2019
L. Josephine Usha, J. Jesu Vedha Nayahi
Data utility metrics, also called information loss metrics, are used to quantify the loss of utility. Privacy-preserving techniques achieve privacy by degrading the quality of the data being published, so data utility metrics compare the original data with the transformed data. The parameters used for measuring data quality are accuracy, completeness, and consistency: accuracy estimates the closeness between the transformed data and the original data, completeness refers to the loss of individual data items in the transformed data, and consistency refers to the loss of correlations in the transformed data. Commonly used metrics for measuring the loss of utility are the discernibility metric, the average equivalence class size, and the Kullback–Leibler divergence metric.
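As a concrete illustration, a minimal sketch of the three metrics is given below; the equivalence-class sizes, k value, and attribute distributions are hypothetical, and the discernibility metric is shown without the penalty term for suppressed records.

```python
# Sketch of three common utility (information-loss) metrics for a
# k-anonymized table; all inputs below are illustrative.
import numpy as np

def discernibility(class_sizes):
    # Discernibility metric: each record in an equivalence class of
    # size |E| is indistinguishable from |E| records, so DM = sum(|E|^2).
    return sum(s * s for s in class_sizes)

def avg_class_size(class_sizes, k):
    # Average equivalence class size: |records| / (|classes| * k);
    # values near 1 indicate little unnecessary generalization.
    return sum(class_sizes) / (len(class_sizes) * k)

def kl_divergence(p, q, eps=1e-12):
    # Kullback-Leibler divergence between an attribute distribution in
    # the original data (p) and in the transformed data (q).
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Hypothetical table: 10 records in equivalence classes of sizes 4, 3, 3
sizes = [4, 3, 3]
print(discernibility(sizes))          # 34
print(avg_class_size(sizes, k=3))     # ~1.11
print(kl_divergence([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]))
```

Lower values of all three metrics indicate that the transformed data retains more of the original data's utility.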
Mastering Data Quality
Published in Bert Brijs, Business Analysis for Business Intelligence, 2016
The owner may be the buyer of external data, the creator of company-owned data, the sponsor, or the user; whoever the organization has decided can play this role, make sure you get in touch with this person or persons. The owner is responsible for the definitions and decides who has access or who may grant access to data. The larger the organization, the more effort you will have to spend making sure they are all on the same page with regard to policies and responsibilities, and that there are no territorial battles between them. The data stewards manage data sets with a special focus on integrity, privacy, and data quality.
Evaluating Data Quality
Published in Natassia Goode, Paul M. Salmon, Michael G. Lenné, Caroline F. Finch, Translating Systems Thinking into Practice, 2019
Natassia Goode, Paul M. Salmon, Michael G. Lenné, Caroline F. Finch
Data quality refers to the completeness and validity of recorded data (German et al., 2001). There are five characteristics that are relevant to assessing data quality in an incident reporting system: data completeness, positive predictive value, sensitivity, specificity and representativeness (see Table 10.1). These characteristics provide important information about whether the data, and resulting analyses, are accurate and valid reflections of the frequency and causes of incidents within the specific context.
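A minimal sketch of how four of these characteristics could be computed is shown below, assuming reported incidents can be checked against a reference source of genuine incidents; all counts and field tallies are hypothetical, and representativeness is omitted because it requires comparison against the wider population rather than a simple ratio.

```python
# Sketch of data quality characteristics for an incident reporting
# system, given true/false positives and negatives against a reference
# source. All counts below are hypothetical.

def data_quality_summary(tp, fp, fn, tn, fields_filled, fields_expected):
    return {
        # Data completeness: share of required fields actually completed.
        "completeness": fields_filled / fields_expected,
        # Positive predictive value: reported incidents that are genuine.
        "ppv": tp / (tp + fp),
        # Sensitivity: genuine incidents captured by the system.
        "sensitivity": tp / (tp + fn),
        # Specificity: non-incidents correctly excluded from the system.
        "specificity": tn / (tn + fp),
    }

print(data_quality_summary(tp=80, fp=10, fn=20, tn=890,
                           fields_filled=940, fields_expected=1000))
```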
Artificial Intelligence Governance For Businesses
Published in Information Systems Management, 2023
Johannes Schneider, Rene Abraham, Christian Meske, Jan Vom Brocke
Data quality denotes the ability of data to meet its usage requirements in a given context (Khatri & Brown, 2010). The relevance and handling of data quality might differ for AI systems compared to business intelligence applications or other forms of analytics. In the context of AI, better data quality is always preferable from a model performance perspective. However, improving data quality also comes with high costs. Given that state-of-the-art ML models have shown some robustness to certain forms of data quality issues (Frénay & Verleysen, 2013; Nigam et al., 2020; Song et al., 2020), the preferred data quality from an economic perspective might be a tradeoff between increased model performance and the costs of data cleansing. There also exist techniques to filter "bad" data that might even deteriorate model performance when used for training; see Brodley and Friedl (1999) for pioneering work and Ghorbani and Zou (2019) for a recent technique. Given that a large amount of mostly accurate data exists, data filtering techniques might therefore replace data cleansing.
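A minimal sketch of such label filtering, in the spirit of Brodley and Friedl (1999), is shown below: records whose labels disagree with out-of-fold predictions are flagged as candidate "bad" data. The single-classifier variant here is a simplification (the original work uses an ensemble of filters), and the dataset is synthetic.

```python
# Cross-validated label filtering: flag records whose labels disagree
# with predictions from a model that never saw them during training.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=1000, random_state=0)
noisy = y.copy()
flip = np.random.default_rng(0).choice(len(y), size=50, replace=False)
noisy[flip] = 1 - noisy[flip]               # inject 5% label noise

# Out-of-fold predictions via 5-fold cross-validation.
preds = cross_val_predict(RandomForestClassifier(random_state=0),
                          X, noisy, cv=5)
suspect = noisy != preds                    # candidate mislabeled rows
print(f"flagged {suspect.sum()} of {len(y)} records")

# Training on the filtered subset, rather than hand-cleaning every
# record, is the substitution described in the text.
X_clean, y_clean = X[~suspect], noisy[~suspect]
```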
Big Data Classification Using Enhanced Dynamic KPCA and Convolutional Multi-Layer Bi-LSTM Network
Published in IETE Journal of Research, 2023
Following the collection of raw data, the data must be delivered in the appropriate structure, amount, and format for the data analytic tasks that follow. Data pre-processing meets this need and consists primarily of four tasks: data cleansing, data transformation, data reduction, and data partitioning. Data cleansing aims to improve data quality by removing duplicate or irrelevant observations and outliers and by filling in missing values. Data transformation is performed when modeling methods demand a specific attribute type (for example, categorical or numerical) or data scale. Data reduction aims to identify the most important factors or variables for modeling, minimize dataset size, and increase computational efficiency. Data partitioning breaks a huge dataset down into numerous smaller sets or groups that can be evaluated independently to increase the model's sensitivity and robustness. The data are also pre-processed using weights allocated to features based on size, content, importance, relevance, and keywords; here, an automated weight assignment technique is used.
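A minimal sketch of the four pre-processing tasks is given below, using pandas and scikit-learn; the column names, toy values, and parameter choices are illustrative rather than taken from the paper's method.

```python
# Sketch of the four pre-processing tasks on a toy DataFrame.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({"f1": [1.0, 1.0, 2.0, None, 4.0],
                   "f2": [10, 10, 30, 40, 50],
                   "label": [0, 0, 1, 1, 1]})

# 1. Data cleansing: drop duplicate rows, fill missing values.
df = df.drop_duplicates()
df = df.fillna(df.mean(numeric_only=True))

# 2. Data transformation: bring features onto a common scale.
X = StandardScaler().fit_transform(df[["f1", "f2"]])

# 3. Data reduction: keep only the most informative directions.
X = PCA(n_components=1).fit_transform(X)

# 4. Data partitioning: split into independent sets for modeling.
X_train, X_test, y_train, y_test = train_test_split(
    X, df["label"], test_size=0.25, random_state=0)
```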
Data-driven machine criticality assessment – maintenance decision support for increased productivity
Published in Production Planning & Control, 2022
Maheshwaran Gopalakrishnan, Mukund Subramaniyan, Anders Skoogh
The central aspect of this type of decision support tool is the need for real-time production system data. A study has shown that, on average, 100 data rows are collected per hour per machine by the MES, amounting to roughly 500,000 data rows per year per machine (Subramaniyan et al. 2016). Manufacturing companies can therefore collect a large amount of data and use advanced data analytics to make fact-based decisions (O’Donovan et al. 2015). However, data quality is important to ensure that data-driven decisions are reliable and effective. Extensive research is being conducted to ensure data quality, as good data can dramatically increase the size and scope of improvements in companies (Batini et al. 2009). Additionally, maintenance personnel need the competence to execute the data analytics. Education and training for maintenance personnel are identified as activities critical for managing future competence requirements and maintaining competitiveness (Bokrantz et al. 2017).