Introduction
Published in S. Poonkuntran, Rajesh Kumar Dhanraj, Balamurugan Balusamy, Object Detection with Deep Learning Models, 2023
A dataset is a collection of data and its related values, with parameters such as time and subject. Creating a dataset is a challenging task in deep learning. Data collection is a static process carried out over a period of time; the collected data are then labeled, a model is trained, and results are obtained. There are different types of datasets, such as text data, image data, signal data, sound data, physical data, anomaly data, biological data, multivariate data, question-answering data, and other data repositories.
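As a hedged illustration of the collect–label–train–evaluate cycle described above (not code from the chapter), the sketch below uses synthetic data and a scikit-learn classifier; the features, labels, and split sizes are assumptions made purely for demonstration.

```python
# Minimal sketch of the workflow described above: collect data, label it,
# train a model, and inspect the results. The data here are synthetic and
# purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# "Collected" data: 200 observations with 4 numeric features each.
X = rng.normal(size=(200, 4))
# "Labeling": here the labels come from a simple synthetic rule.
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Train on one part of the dataset, evaluate on the held-out part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LogisticRegression().fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```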
Crowd Estimation in Trains by Using Machine Vision
Published in Roshani Raut, Salah-ddine Krit, Prasenjit Chatterjee, Machine Vision for Industry 4.0, 2022
Machines cannot understand free-form text, images, or videos as they are. It is not enough to simply present a static set of images and expect a machine learning system to be trained automatically from it. A dataset is basically a collection of data objects.
Outliers Detection and Its Deployment in SAS and R
Published in Tanya Kolosova, Samuel Berestizhevsky, Supervised Machine Learning, 2020
The machine learning process, regardless of the method used, depends heavily on the so-called input datasets; they are the crucial ingredient that makes algorithm training and testing possible. A dataset is a collection of data. In other words, a dataset corresponds to the contents of a single data table, or a single statistical data matrix X(n×p), where each column j = 1, …, p of the table represents a particular variable (feature) and each row i = 1, …, n corresponds to a specific observation of the dataset.
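This table/matrix view of a dataset can be made concrete with a small, purely illustrative pandas sketch (not from the chapter); the column and row names below are invented, and n and p are simply read off from the table's shape.

```python
# Illustrative sketch: a dataset as a single data table / statistical data
# matrix X with n rows (observations) and p columns (variables/features).
import pandas as pd

X = pd.DataFrame(
    {
        "feature_1": [5.1, 4.9, 6.2, 5.8],   # column j = 1
        "feature_2": [3.5, 3.0, 2.9, 2.7],   # column j = 2
        "feature_3": [1.4, 1.4, 4.3, 5.1],   # column j = 3
    },
    index=["obs_1", "obs_2", "obs_3", "obs_4"],  # rows i = 1, ..., n
)

n, p = X.shape   # here n = 4 observations, p = 3 features
print(X)
print(f"n = {n}, p = {p}")
```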
Intelligent and novel multi-type cancer prediction model using optimized ensemble learning
Published in Computer Methods in Biomechanics and Biomedical Engineering, 2022
The input data related to multi-type cancer is passed to a data cleaning process to improve the accuracy of the proposed model. Data cleaning is 'the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset'. When diverse data sources are combined, duplicate or mislabeled data may appear. When the data are incorrect, the algorithms and their outcomes may become unreliable even though they appear to be correct. Data cleaning is therefore essential for supplying high-quality data to the feature extraction phase, which takes the cleaned data as its input.
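As a hedged sketch of the cleaning operations quoted above (fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data), and not the authors' actual pipeline, the example below uses a small hypothetical table in pandas; the column names and values are assumptions.

```python
# Hypothetical cleaning example: remove duplicate, incorrectly formatted,
# and incomplete records before feature extraction.
import numpy as np
import pandas as pd

raw = pd.DataFrame(
    {
        "patient_id": [1, 2, 2, 3, 4],
        "age": ["54", "61", "61", "not recorded", "47"],
        "tumor_size_mm": [12.0, 30.5, 30.5, 18.2, np.nan],
    }
)

cleaned = (
    raw.drop_duplicates()                                                # remove duplicate records
       .assign(age=lambda d: pd.to_numeric(d["age"], errors="coerce"))   # fix incorrectly formatted values
       .dropna()                                                         # drop incomplete rows
)
print(cleaned)
```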
Hybrid MQTTNet: An Intrusion Detection System Using Heuristic-Based Optimal Feature Integration and Hybrid Fuzzy with 1DCNN
Published in Cybernetics and Systems, 2022
The data collected from the MQTT datasets is passed as input to the pre-processing phase, where data cleaning is performed to obtain clean data for further processing. Data cleaning is "the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset." When various data sources are combined, duplicate or mislabeled data may occur. If incorrect data are used in the model, they can mislead the algorithms and reduce their effectiveness, producing inappropriate results. Data normalization is then performed on the cleaned data, where noisy regions of the input are removed by scaling the values into a particular interval. Hence, pre-processing with data cleaning and data normalization is performed on the collected data, and the resulting pre-processed data is utilized in the feature extraction model.
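A minimal sketch of the two pre-processing steps described here, data cleaning followed by normalization into a fixed interval, is given below; it is not the paper's implementation, and the feature names, values, and the choice of the [0, 1] interval are assumptions.

```python
# Hypothetical pre-processing example: clean the records, then normalize
# each numeric feature into a fixed interval via min-max scaling.
import pandas as pd

records = pd.DataFrame(
    {
        "packet_len": [60, 1500, 1500, 90, None],
        "inter_arrival_ms": [1.2, 250.0, 250.0, 3.5, 0.8],
    }
)

cleaned = records.drop_duplicates().dropna()        # data cleaning

lo, hi = cleaned.min(), cleaned.max()
normalized = (cleaned - lo) / (hi - lo)             # values now lie in [0, 1]
print(normalized)
```

Min-max scaling is only one way of mapping values into a particular interval; the normalization strategy used in the paper may differ.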
The “humane in the loop”: Inclusive research design and policy approaches to foster capacity building assistive technologies in the COVID-19 era
Published in Assistive Technology, 2022
John Bricout, Julienne Greer, Noelle Fields, Ling Xu, Priscila Tamplain, Kris Doelling, Bonita Sharma
How intelligent agents are trained has strong implications for both the ethical and the technological considerations of AI and robot performance, especially around transparency. Transparency is key both to ensuring ethical safeguards and to holding machine performance accountable for any harms. Machine learning, which produces algorithms with minimal human input, and even more so deep learning, for which the algorithms truly constitute a black box, raises the question of how machine/robot learning is instantiated through the training dataset and model. Ethical issues arise around where those data are from, how they were collected, what they were meant to address, and what is missing from the dataset, in addition to any biases (systematic, random, known, or unknown) that might lurk. Being able to answer at least some of these questions, while not wholly lifting the veil from the black box, will at least provide some measure of transparency. Making the algorithms more understandable to humans requires scripted or automated planning approaches that exert more human control and specification and are thus more readily traceable and comprehensible.