Instigation and Development of Data Science
Published in Pallavi Vijay Chavan, Parikshit N Mahalle, Ramchandra Mangrulkar, Idongesit Williams, Data Science, 2022
Priyali Sakhare, Pallavi Vijay Chavan, Pournima Kulkarni, Ashwini Sarode
Figure 1.3 shows the diagram of data science. Data science is the discovery of knowledge through the analysis of data; it extends statistics with the capability to deal with huge amounts of data. In data science, historical data is analyzed in order to predict future outcomes. Data science usually works with dynamic, unstructured data, and the skills it requires include statistics, visualization, and machine learning. There are several current viewpoints on data science. They are as follows:
Data science is about studying scientific and business data.
Data science is an integration of computing technology, statistics, and artificial intelligence.
The purpose of data science is to solve scientific as well as business problems by extracting knowledge from data [5,6].
Cutting Edge Data Analytical Tools
Published in Chong Ho Alex Yu, Data Mining and Exploration, 2022
The open source movement has its roots in the free software movement started in the early 1980s by Richard Stallman, a computer programmer at MIT. In September 1983, Stallman launched the GNU Project (GNU stands for GNU's Not Unix—a recursive acronym) with the goal of giving the world a UNIX-like operating system for free. In 1997 software engineer Eric Raymond published his seminal essay, entitled "The Cathedral and the Bazaar." The cathedral model refers to a software development process in which the code written between releases is restricted to an exclusive group, whereas the bazaar model refers to an open system in which the code is distributed across the Internet so that everyone can contribute to debugging. Influenced in part by Raymond's essay, Netscape released the source code of its Web browser, an early and widely used one, to the public in 1998. In the same year the term "open source" was coined by Christine Peterson to describe this type of free software (Haff 2018). Python and the R language are two major tools for data science.
Machine Learning and Data Science in Industries
Published in Sandeep Misra, Chandana Roy, Anandarup Mukherjee, Introduction to Industrial Internet of Things and Industry 4.0, 2021
Sandeep Misra, Chandana Roy, Anandarup Mukherjee
Data science is a scientific approach to providing meaningful insight from data using algorithms, methods, and processes. In order to process, analyze, and store the huge volume of data generated from industrial processes, data science is indispensable. The raw data collected from various industrial processes, machines, and devices may be in structured, semi-structured, or unstructured form. The information extracted from the collected data helps AI to identify hidden patterns and derive meaningful insights from them. AI is the capability of machines to operate and act intelligently, approaching human intelligence [184]. Machine learning (ML) and deep learning are the two common tools of AI that assist in making industrial technologies smarter. ML is a subset of AI, which predicts or provides decisions based on collected data. Similarly, deep learning is a form of ML based on the concept of multi-layered artificial neural networks (ANNs). The relationship between AI, ML, and deep learning is depicted in Fig. 13.1. Typically, AI can be categorized as strong and weak AI. Strong AI is designed to be context-aware, cognitive, and capable of making decisions on its own. In contrast, weak AI is mostly dependent on algorithms and programmatic responses. For example, Alexa is a virtual assistant, developed by Amazon for voice interactions, setting alarms, music playback, and providing real-time information. However, Alexa is based on weak AI and processes data using request-response behavior.
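The core ML idea described above—a model learns its parameters from collected data and then predicts values for new inputs—can be sketched minimally in Python. The data (machine load versus temperature) and the linear model are purely hypothetical illustrations, not taken from the chapter:

```python
# Minimal sketch of "learning from collected data": an ordinary
# least-squares fit of y = a*x + b, then prediction for unseen inputs.

def fit_line(xs, ys):
    """Fit y = a*x + b by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Hypothetical training data: machine load (%) and observed temperature (degrees C)
load = [20, 40, 60, 80]
temp = [35.0, 45.0, 55.0, 65.0]

a, b = fit_line(load, temp)

def predict(x):
    return a * x + b

print(round(predict(50), 1))  # predicted temperature at an unseen load of 50%
```

Real industrial ML pipelines use far richer models, but the structure is the same: parameters estimated from historical data drive predictions on new data.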
Big Data Challenges in Social Sciences: An NLP Analysis
Published in Journal of Computer Information Systems, 2023
While data science is defined as a field that involves working with large volumes of data for the purpose of building prescriptive and predictive models, big data is defined as a collection of "huge" data in various repositories.9 It is estimated that almost 2.5 quintillion bytes of data are created every day32; hence, over time, employees and managers will need to acquire the tools and skills to work with such data. This is well supported by the study of Donoghue et al.10 [301, p.528], who succinctly stated the challenges in this domain as follows: " … . While Nolan and Temple Lang accurately predicted the need for those who analyze data to be statistically sound, computationally competent, and data literate, we propose that data science education must go further to directly address the many ways in which data and its analysis have completely reshaped society and our daily lives" (emphasis added). Likewise, Uppal and Bohon33 made special mention of the need to develop data science skills within departmental structures inside universities. The authors focused on students in social science disciplines and suggested providing them with the right tools and skills to cope with the real-world impact of large-scale data analysis.
How can polydispersity information be integrated in the QSPR modeling of mechanical properties?
Published in Science and Technology of Advanced Materials: Methods, 2022
F. Cravero, S. A. Schustik, M. J. Martínez, M. F. Díaz, I. Ponzoni
Data science, especially data mining, develops methods and tools to extract meaningful information and patterns from data. Predicting a property value with a certain degree of accuracy is one of the most widely used applications of data mining in chemistry. Emulating the computer-aided drug design workflow well known to chemists, materials researchers have used data mining to select materials with a desired property. Each material is described by features (molecular descriptors) that represent aspects of its chemical structure, and the target property is measured on a real material (after synthesis). Measuring this target requires the synthesis and characterization of materials; consequently, it is the most expensive and time-consuming step. Thus, in silico property prediction techniques have been proposed as a virtual screen, so that only the most promising compounds need to be synthesized (see Figure 1).
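The virtual-screening workflow described above—predict a property from molecular descriptors, then rank candidates so only the most promising are synthesized—can be sketched as follows. This is a hedged illustration, not the authors' actual QSPR model: the descriptor values, weights, and material names are all hypothetical, and a real study would fit the model to measured property data:

```python
# Hypothetical linear QSPR-style model: property ~ w . x + b, used as a
# virtual screen to rank candidate materials before synthesis.

def predict_property(descriptors, weights, bias):
    """Predict a property value from molecular descriptors."""
    return sum(w * x for w, x in zip(weights, descriptors)) + bias

# Hypothetical candidate materials, each described by molecular descriptors
candidates = {
    "polymer_A": [1.2, 0.8, 3.1],
    "polymer_B": [0.9, 1.5, 2.4],
    "polymer_C": [2.0, 0.3, 1.7],
}

# Hypothetical coefficients, e.g. obtained from a previous regression fit
weights, bias = [0.5, 1.0, 0.2], 0.1

# Rank candidates by predicted property; only the top ones would be
# synthesized and characterized, saving cost and time.
ranked = sorted(candidates,
                key=lambda m: predict_property(candidates[m], weights, bias),
                reverse=True)
print(ranked[0])  # best candidate to synthesize first
```

The expensive synthesis-and-measurement step is thus reserved for the highest-ranked candidates, which is exactly the point of the virtual screen.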
Analysis of building maintenance requests using a text mining approach: building services evaluation
Published in Building Research & Information, 2020
Rafaela Bortolini, Núria Forcada
To perform a text mining exercise, programming skills are required. R and Python are the most popular programming languages for data science. Basic knowledge of statistics is also necessary to formulate questions and obtain more accurate conclusions. Of course, this represents a major cultural shift in most organizations. Traditionally, decisions are made spontaneously and without much analytical forethought. Interposing a structured decision-making process, with associated decision support tools, will represent a major change in thinking and culture in many organizations. It will not be easy to overcome these challenges, but the technology is becoming increasingly prominent in maintenance management. Text mining approaches are a growing area of interest and represent a huge potential opportunity for organizations that manage a portfolio of buildings.
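A minimal text-mining exercise in Python (one of the two languages named above) can be sketched as below. The maintenance requests and the keyword-to-category lexicon are hypothetical; real studies use richer natural language processing pipelines, but keyword counting illustrates the basic idea of extracting structure from free-text requests:

```python
# Categorize hypothetical maintenance requests by counting keyword
# occurrences from a small keyword-to-category lexicon.
from collections import Counter
import re

requests = [
    "Leak detected in bathroom pipe, water on floor",
    "HVAC unit not cooling, temperature too high",
    "Water leak under sink in kitchen",
]

# Hypothetical lexicon mapping keywords to maintenance categories
lexicon = {
    "leak": "plumbing", "water": "plumbing", "pipe": "plumbing",
    "hvac": "hvac", "cooling": "hvac", "temperature": "hvac",
}

category_counts = Counter()
for text in requests:
    # Lowercase and split into alphabetic tokens before lookup
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in lexicon:
            category_counts[lexicon[token]] += 1

print(category_counts.most_common(1)[0][0])  # dominant maintenance category
```

Aggregating such counts across a building portfolio is one simple way a facility manager could see which service categories generate the most requests.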