Cross-Industry Standard Process for Data Mining – Knowledge and References

Machine Learning

Published in Pedro Larrañaga, David Atienza, Javier Diaz-Rozo, Alberto Ogbechie, Carlos Puerto-Santana, Concha Bielza, Industrial Applications of Machine Learning, 2019

Pedro Larrañaga, David Atienza, Javier Diaz-Rozo, Alberto Ogbechie, Carlos Puerto-Santana, Concha Bielza

Cross-industry standard process for data mining (CRISP-DM) (Shearer, 2000) is a process that describes commonly used approaches in industry for transforming data into machine learning models. The process is iterative (it repeats until a good enough solution is achieved) and interactive (the flow can move back and forth between steps depending on the quality of the current solution). CRISP-DM breaks the process of knowledge discovery into six major steps (Fig. 2):

Generation of synthetic manufacturing datasets for machine learning using discrete-event simulation

Published in Production & Manufacturing Research, 2022

K. C. Chan, Marsel Rabaev, Handy Pratama

Spanning DES, big data, and AI/ML in production research, synthetic data generation plays an increasingly important role that warrants more focused research attention. In data mining, the CRISP-DM (Cross Industry Standard Process for Data Mining) project specifies a comprehensive process model for conducting data mining projects. The process model is independent of both the industry sector and the technology used (Azevedo & Santos, 2008; Schröer et al., 2021; Wirth & Hipp, 2000). Although such a process model is available, most studies reported in the literature do not follow any specific process model. Similarly, our literature review shows that no standard model or framework exists for the synthetic data generation process. Therefore, this study aims to propose a structured framework for synthetic data generation using DES for manufacturing problems.

Realising the promises of artificial intelligence in manufacturing by enhancing CRISP-DM

Published in Production Planning & Control, 2023

Jon Bokrantz, Mukund Subramaniyan, Anders Skoogh

In this section, we summarise the CRISP-DM methodology in its original format as well as describe related key extensions of CRISP-DM for manufacturing contexts. For an overview of similar and competing methodologies such as KDD and SEMMA, see Azevedo and Santos (2008) or Dåderman and Rosander (2018). CRISP-DM (Cross Industry Standard Process for Data Mining) was developed as a generic and systematic approach for conducting data mining projects. The original version is extensively described in existing literature (Wirth and Hipp 2000). The goal of introducing CRISP-DM was to serve as a common reference point to discuss data mining, increase the understanding of crucial data mining issues, and fortify data mining as an established engineering practice (Wirth and Hipp 2000). Today, CRISP-DM is widely known as the ‘de-facto standard’ for applying a process model in data mining projects (Schröer, Kruse, and Gómez 2021).

The model represents a systematic process for data mining projects in a hierarchy of four levels: (1) phases, (2) generic tasks, (3) specialised tasks, and (4) process instances. The first level consists of six phases: business understanding (understanding and converting project objectives and requirements into a problem definition), data understanding (collection and familiarisation with the data), data preparation (constructing the final data set), modelling (selection, application, and tuning of the model), evaluation (assessing model performance), and deployment (organising, presenting, and implementing the model). The second level consists of generic tasks within each phase, which are intended to be complete and stable and thus apply to all possible data mining situations (e.g. generic task ‘determine business objectives’ in phase ‘business understanding’). The third level consists of specialised tasks that describe how actions in the generic tasks should be carried out in specific contexts (e.g. specialised task ‘build clustering model’ for generic task ‘build model’). The fourth and final level consists of process instances that represent the documentation of actions, decisions, and results during the project. All phases are intended to be iterative and intertwined, and the model was originally envisioned to be applied in a cyclic rather than a waterfall fashion (Wirth and Hipp 2000).
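
As a rough illustration, the four-level hierarchy and the cyclic flow between phases could be modelled as a small data structure. This is only a sketch and is not taken from the CRISP-DM documentation or from the cited articles: the class names, the `next_phase` helper, the wrap-around from deployment to business understanding, and the example note are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Phase(Enum):
    """Level 1: the six CRISP-DM phases, in their nominal order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELLING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


@dataclass
class ProcessInstance:
    """Level 4: a record of an action, decision, or result in a project."""
    note: str


@dataclass
class SpecialisedTask:
    """Level 3: how a generic task is carried out in a specific context."""
    name: str
    instances: List[ProcessInstance] = field(default_factory=list)


@dataclass
class GenericTask:
    """Level 2: a context-independent task that belongs to one phase."""
    name: str
    phase: Phase
    specialised: List[SpecialisedTask] = field(default_factory=list)


def next_phase(current: Phase, revisit: Optional[Phase] = None) -> Phase:
    """Return the phase to work on next (hypothetical helper).

    Assumption: the flow is cyclic rather than waterfall, so the caller may
    jump back to any phase via `revisit`, and deployment wraps around to
    business understanding.
    """
    if revisit is not None:
        return revisit
    return Phase(current.value % len(Phase) + 1)


# The example from the text: a specialised task under the generic task 'build model'.
build_model = GenericTask("build model", Phase.MODELLING)
clustering = SpecialisedTask("build clustering model")
# Hypothetical process instance, invented purely for illustration.
clustering.instances.append(ProcessInstance("example note: documented choice of cluster count"))
build_model.specialised.append(clustering)
```

Keeping `revisit` explicit makes the back-and-forth movement between phases a deliberate decision by the project team rather than something the structure enforces, which seems closer to the cyclic use described above than a fixed pipeline would be.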