Synthetic data – Knowledge and References

Explore chapters and articles related to this topic

Deep Learning

Published in Subasish Das, Artificial Intelligence in Highway Safety, 2023

Because training data is typically limited, feature extraction and data generation are significant applications of deep learning. Different techniques are employed to supplement the original dataset and provide a bigger dataset to train the network. Advanced deep learning architectures (i.e., Autoencoders and Generative Adversarial Networks (GANs)) can generate synthetic data from the original dataset in order to advance model learning. These architectures both belong to the Pretrained Unsupervised Network (PUN) family, a class of deep learning models that employ unsupervised learning to teach each hidden layer in a neural network to produce a better fit for the dataset. Each layer is trained individually with an unsupervised learning algorithm that takes the output of the previously trained layer as its input; after every layer has been pre-trained, a refinement step using supervised learning is conducted throughout the entire network. Examples of PUNs include Autoencoders, Generative Adversarial Networks (GANs), and Deep Belief Networks (DBNs) (Kelleher, 2019).

Deep Learning

Published in Seyedeh Leili Mirtaheri, Reza Shahbazian, Machine Learning Theory to Applications, 2022

Seyedeh Leili Mirtaheri, Reza Shahbazian

DL structures can be built with greedy layer-wise algorithms and can be used for supervised, unsupervised, and semi-supervised learning tasks. For supervised learning problems, DL algorithms remove the manual engineering effort otherwise needed to prepare the data by transforming the data through the hidden layers and extracting features. DL algorithms can also be used for unsupervised learning problems when labeled data is unavailable or insufficient. For instance, autoencoders [62] and deep belief networks [63] can be used for unsupervised learning problems. In addition, a newer deep learning-based algorithm, the generative adversarial network, can generate synthetic data from a limited set of data samples. The distribution of the generated data is similar to that of the real data, but the generated samples are not identical to the real ones.

Overview of Deep Learning Algorithms Applied to Medical Images

Published in Ayman El-Baz, Jasjit S. Suri, Big Data in Multimodal Medical Imaging, 2019

Behnaz Abdollahi, Ayman El-Baz, Hermann B. Frieboes

where P(Yi|xi, w) is the conditional probability of each output given the input and the parameters, and f̂(xi) is the estimated mapping function from input to output using the designed model. Generative models such as the naïve Bayes classifier assume some prior distribution for P(Y) and P(X|Y) and use Bayes’ rule to calculate P(Y|X). In contrast to discriminative models, which estimate P(Y|X) directly, generative models learn the distribution of each class and require the modeling of the class-conditional probabilities. It is possible to use Bayes’ rule to change the definition of a discriminative model into a generative model (1,6). Generative models are also used for generating synthetic data to compensate for a low number of samples, with generative adversarial networks as an example of this approach.
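As a minimal sketch of the Bayes' rule step described above, the following Python snippet computes P(Y|X) from a prior P(Y) and class-conditional likelihoods P(X|Y). The class names and all probability values are hypothetical and chosen only for illustration; they do not come from the chapter.

```python
def posterior(prior, likelihood):
    """Return P(Y | X = x) for each class via Bayes' rule.

    prior:      dict mapping class label -> P(Y)
    likelihood: dict mapping class label -> P(X = x | Y)
    """
    # Joint probability P(X = x, Y) = P(X = x | Y) * P(Y) for each class.
    joint = {label: prior[label] * likelihood[label] for label in prior}
    # Evidence P(X = x) is the sum of the joint over all classes.
    evidence = sum(joint.values())
    return {label: p / evidence for label, p in joint.items()}


# Hypothetical two-class example (values are illustrative assumptions).
prior = {"lesion": 0.2, "healthy": 0.8}        # P(Y)
likelihood = {"lesion": 0.9, "healthy": 0.1}   # P(X = x | Y) for one observed x
post = posterior(prior, likelihood)
print(post)
```

A generative classifier estimates the two inputs (`prior` and `likelihood`) from data and derives the posterior this way, whereas a discriminative model would estimate the posterior directly.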

Human-centric artificial intelligence architecture for industry 5.0 applications

Published in International Journal of Production Research, 2023

Jože M. Rožanec, Inna Novalija, Patrik Zajec, Klemen Kenda, Hooman Tavakoli Ghinani, Sungho Suh, Entso Veliou, Dimitrios Papamartzivanos, Thanassis Giannetsos, Sofia Anna Menesidou, Ruben Alonso, Nino Cauli, Antonello Meloni, Diego Reforgiato Recupero, Dimosthenis Kyriazis, Georgios Sofianidis, Spyros Theodoropoulos, Blaž Fortuna, Dunja Mladenić, John Soldatos

Regarding the source of the data, we distinguish between data obtained from real sources and synthetic data (created through some procedure). Synthetic data is frequently used to enlarge the existing data or to generate instances that satisfy specific requirements when similar data is expensive to obtain. While many techniques and heuristics have been applied in the past to generate synthetic data, the use of Generative Adversarial Networks (GANs) has shown promising results and has been intensely researched (Zhu and Bento 2017; Mahapatra et al. 2018; Sinha, Ebrahimi, and Darrell 2019; Mayer and Timofte 2020). Strategies related to data selection are conditioned by how data is generated and served. If the data is stored, data instances can be scanned and compared, and some latency can be tolerated when making a decision. In a streaming setting, on the other hand, decisions must be made at low latency, and the knowledge is constrained to previously seen instances. Data selection approaches must consider informativeness (quantifying the uncertainty associated with a given instance, or the expected model change), representativeness (the number of samples similar to the target sample), or diversity criteria (selected samples scatter across the whole input space) (Wu 2018). Popular approaches for classification problems are random sampling, query-by-committee (Seung, Opper, and Sompolinsky 1992), minimisation of the Fisher information ratio (Padmanabhan et al. 2014), and hinted sampling with Support Vector Machines (Li, Ferng, and Lin 2015).
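The informativeness criterion for stored data can be illustrated with a small sketch. It assumes that uncertainty is measured as the entropy of a classifier's predicted class probabilities, which is one common choice and not necessarily the measure used by the cited works. The function names and the probability values are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_informative(pool_probs, k):
    """Return the indices of the k pool instances with the highest entropy.

    pool_probs: list of predicted class-probability lists, one per stored instance.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,  # most uncertain first
    )
    return ranked[:k]


# Hypothetical model predictions for four stored, unlabeled instances.
pool_probs = [
    [0.98, 0.02],  # confident prediction, low informativeness
    [0.55, 0.45],
    [0.70, 0.30],
    [0.50, 0.50],  # maximally uncertain for two classes
]
chosen = select_most_informative(pool_probs, k=2)
print(chosen)
```

Because the whole pool is scanned and ranked, this sketch fits the stored-data case; a streaming setting would instead need a per-instance decision, for example comparing each entropy against a threshold.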

A novel sEMG data augmentation based on WGAN-GP

Published in Computer Methods in Biomechanics and Biomedical Engineering, 2023

Fabrício Coelho, Milena F. Pinto, Aurélio G. Melo, Gabryel S. Ramos, André L. M. Marcato

Deep learning-based methodologies are a good option for generating synthetic data. One such technique, the Generative Adversarial Network (GAN) (Goodfellow et al. 2014), is a generative model in which two neural networks compete against each other in a min-max game. Haradal et al. (2018) applied GANs to expand two biosignal datasets: electrocardiogram (ECG) and electroencephalogram (EEG). The authors implemented an LSTM-based neural network as a classifier to validate the results. For comparison, three other data augmentation methods were used: noise addition, signal interpolation, and a Hidden Markov Model (HMM). The classifiers trained on the GAN-augmented datasets achieved higher accuracy than those trained with the other methods.
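For reference, the min-max game is commonly written as the value function introduced by Goodfellow et al. (2014). The notation below follows that paper rather than this article: G is the generator, D the discriminator, p_data the real data distribution, and p_z the noise prior.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator D is trained to assign high probability to real samples and low probability to generated ones, while the generator G is trained to produce samples G(z) that the discriminator scores as real.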

Automatic data collection for object detection and grasp-position estimation with mobile robots and invisible markers

Published in Advanced Robotics, 2023

Suraj Prakash Pattar, Thomas Killus, Tsubasa Hirakawa, Takayoshi Yamashita, Tetsuya Sawanobori, Hironobu Fujiyoshi

Synthetic data generation methods require highly accurate 3D models of the objects to produce high-fidelity data. The drawback of synthetic data is that models trained only on synthetic data do not perform well when tested on real data [24]. Poorly constructed 3D models can further widen the Sim2Real gap. Synthetically generated data is known to increase detection accuracy when combined with a sparse real dataset [25].