Modern Data Analytics for the Military Operational Researcher
Published in Natalie M. Scala, James P. Howard, Handbook of Military and Defense Operations Research, 2020
Data engineering is the function that builds the data and analytical infrastructure (i.e., the databases), loads the collected data into that infrastructure, and applies metrics associated with the dimensions of data quality to ensure the data are correct in content and form. Data engineers are the 21st-century database developers and experts. Clearly, today's data engineering needs are much more complex, and the role of data engineering in analytics is steadily growing in importance. Data engineers are as important to analytics as software engineers are to application development.
Unraveling Data Science, Artificial Intelligence, and Autonomy
Published in Jay Liebowitz, Data Analytics and AI, 2020
Leveraging the advances in data storage, computing, and machine learning, data science has emerged as a popular field. Practical applications of data science are broad reaching, including marketing, fraud detection, logistics, crime prediction, social engagement, sports team management, and health care.* For any data science application, you can consider the data science maturity model illustrated in Figure 1.9. The maturity model is adapted from the analytics value chain presented in Anderson (2015).

The initial (and often the most resource-intensive) step in maturing a data-driven approach is data engineering. Data engineering can also be described as data wrangling, curation, or extract, transform, load (ETL). As previously mentioned, the reduced cost of memory has led to the creation of enormous amounts of data. However, the data are often contained in disparate systems and are not well suited for modern data science algorithms. As noted in the discussion of machine learning, the data must be engineered into suitable feature representations.

Neil Lawrence offers a framework to assess data readiness for analytics (Lawrence, 2017). He defines three levels of data readiness (Figure 1.10). The lowest level (C-Level) describes the challenges of data engineering and wrangling. As Lawrence explains, many organizations claim they have data, but the data have not been made available for analytic use; he refers to this type of data as “hearsay” data. B-Level data require an understanding of the faithfulness and representation of the data. Finally, A-Level data are about data in context: with A-Level data, it is understood whether the data can answer organizational questions.

Once data are made available in a data warehouse or data lake, reporting can be performed. Many organizations create reports using spreadsheets or text documents. This approach looks backward, reflecting what has happened in the past.
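The extract, transform, load (ETL) step described above can be sketched as a minimal pipeline. The sketch below uses only the Python standard library; the raw data, table name, and column names are invented purely for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a disparate source system; the empty
# readiness_score for Bravo stands in for a data-quality problem.
RAW_CSV = """unit,readiness_score,report_date
Alpha,0.92,2019-01-05
Bravo,,2019-01-05
Charlie,0.78,2019-01-06
"""

def extract(text):
    """Extract: parse raw rows out of a CSV export."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: coerce types and drop records with missing scores."""
    return [
        (row["unit"], float(row["readiness_score"]), row["report_date"])
        for row in rows
        if row["readiness_score"]
    ]

def load(records, conn):
    """Load: write the cleaned records into a queryable store."""
    conn.execute("CREATE TABLE readiness (unit TEXT, score REAL, report_date TEXT)")
    conn.executemany("INSERT INTO readiness VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
count = conn.execute("SELECT COUNT(*) FROM readiness").fetchone()[0]
print(count)  # the Bravo record (missing score) is dropped, leaving 2 rows
```

In practice each stage would be far more involved (schema mapping, deduplication, quality metrics), but the extract-transform-load shape stays the same.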
The promise of data science is to move beyond backward-looking reporting and become forward looking. In the field of data science, analytics are typically described as descriptive, predictive, and prescriptive, although some add diagnostic analytics after descriptive. Descriptive analytics involves understanding the characteristics of the data. For numerical data, it includes statistical measures such as means, standard deviations, modes, and medians, along with graphical summaries such as histograms. Descriptive analytics also helps uncover anomalous values and missing data. Like reporting, descriptive analytics is backward looking.
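The descriptive measures listed above are easy to compute directly. The sketch below uses Python's standard `statistics` module on an invented sample, with `None` marking missing entries and a simple two-standard-deviation screen for anomalous values (the data and threshold are assumptions for illustration).

```python
import statistics

# Hypothetical numeric field, e.g. daily event counts; None marks missing entries.
values = [12, 15, 15, 9, 22, None, 14, 15, 120, 11]

observed = [v for v in values if v is not None]

# Backward-looking summaries of the data's characteristics.
mean = statistics.mean(observed)
stdev = statistics.stdev(observed)
mode = statistics.mode(observed)      # 15 (most frequent value)
median = statistics.median(observed)  # 15

# Simple screens for data problems: missing entries and values more
# than two standard deviations from the mean.
missing = values.count(None)
anomalies = [v for v in observed if abs(v - mean) > 2 * stdev]

print(f"mean={mean:.1f} median={median} mode={mode} "
      f"missing={missing} anomalies={anomalies}")
```

Running this flags the single missing entry and the outlier 120, illustrating how descriptive analytics surfaces anomalous and missing data before any predictive modeling begins.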
Realising the promises of artificial intelligence in manufacturing by enhancing CRISP-DM
Published in Production Planning & Control, 2023
Jon Bokrantz, Mukund Subramaniyan, Anders Skoogh
All three competences must be involved throughout the CRISP-DM process. However, the degree of involvement varies across the different phases, so the competences interact differently with each other over time. In Figure 5, we schematically illustrate the trajectories of involvement for the domain, data science, and data engineering competences. The involvement of domain competence follows a U-shaped curve: starting at a high level, dipping in the middle, and returning to a high level. This reflects how domain competence is central to identifying and framing the problem as well as to interpreting and acting on AI insights during operations. The involvement of data science competence follows a logistic growth curve: starting at a low level, increasing rapidly, and remaining high for the rest of the process. This reflects how data science competence is central to the core design and development of the AI solution and to ensuring it consistently works as intended during operations. The involvement of data engineering competence follows a bimodal curve that peaks twice during the process. This reflects how data engineering comes into play when creating the data pipelines for the AI solution and when facilitating its deployment and implementation in operational settings.
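The three trajectory shapes can be made concrete with simple functional forms. The formulas below are not taken from the article; they are assumed curves chosen only to reproduce the qualitative shapes described (U-shaped, logistic, bimodal), with the process timeline normalized to t in [0, 1].

```python
import math

def domain(t):
    """U-shaped: high at the start and end, lower in the middle."""
    return 4 * (t - 0.5) ** 2

def data_science(t):
    """Logistic growth: low at first, rising rapidly, then staying high."""
    return 1 / (1 + math.exp(-12 * (t - 0.3)))

def data_engineering(t):
    """Bimodal: one peak during pipeline creation, another at deployment."""
    return math.exp(-((t - 0.25) ** 2) / 0.01) + math.exp(-((t - 0.8) ** 2) / 0.01)

# Tabulate the relative involvement levels over the process timeline.
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"t={t:.2f}  domain={domain(t):.2f}  "
          f"science={data_science(t):.2f}  engineering={data_engineering(t):.2f}")
```

The exact parameters (steepness, peak locations) are arbitrary; only the shapes matter, mirroring how Figure 5 is itself schematic.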
The mediating role of knowledge management and information systems selection management capability on Big Data Analytics quality and firm performance
Published in Journal of Decision Systems, 2022
Table 2 presents the descriptive statistics of the study sample. The sample firms were distributed across a wide range of industry sectors. Based on the number of employees, the sample firms are a good representation of the diversity of firms, from small to large organisations. Their IT departments had, on average, 12 employees. The average IT budget of the sample firms was about 1.2 percent of sales revenue. The IS executives had, on average, 26 years of professional experience. Also, over 64 percent of the responding business executives held job titles such as vice president, president, director, and senior manager. On average, they had been involved in strategic planning development for 11.4 years. Similarly, about 82 percent of the IS participants held titles such as Vice President of IT, IT Director, Data Science Manager, Senior Data Scientist, Project Manager, or Data Engineering Lead. Taken together, these attributes indicate that the respondents were high ranking within their organisations and highly competent to answer the questions of this study.
Introduction to special edition of Quality Engineering
Published in Quality Engineering, 2022
Roger W. Hoerl, Ronald D. Snee
Before discussing the individual articles, we should provide a brief introduction to SE for those who may not be familiar with this discipline. First of all, in the phrase “statistical engineering,” “engineering” is the noun, that is, the “what.” “Statistical” is the adjective, which modifies that noun, i.e., explains the type of engineering. In other words, SE is a form of engineering, one in which statistics plays a heavy role. Of course, nouns and adjectives cannot be reversed without changing the meaning of the phrase. Therefore, just as “data science” is not the same thing as “scientific data,” “engineering statistics” is not the same thing as SE. Engineering statistics refers to the application of statistics to engineering problems, while SE refers to the engineering of solutions to complex statistical problems, which might be in healthcare, finance, education, or any other application area.