A Brief History of Artificial Intelligence
Published in Ron Fulbright, Democratization of Expertise, 2020
In general, machine learning algorithms must be trained. For most of AI’s history, training has been supervised, meaning the training set was carefully engineered by humans to contain both positive and negative labeled examples of the thing the humans wanted the machine to learn. Engineering the training set requires a significant, and sometimes overwhelming, amount of time and effort. In semi-supervised learning, labeled training data is used together with unlabeled data. This reduces the amount of human engineering required and also opens machine learning up to unstructured data such as text messages, images, sounds, and videos. Unsupervised learning uses only unlabeled data, so it requires no human engineering. Unsupervised machine learning is free to identify any patterns in the data, sometimes leading to unexpected discoveries.
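These three regimes are easy to see side by side in code. The sketch below is a minimal illustration using scikit-learn; the toy dataset, the model choices, and the convention of marking unlabeled points with -1 are assumptions of this sketch, not details from the chapter.

```python
# Minimal contrast of the three training regimes described above.
# Dataset and models are illustrative assumptions only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Supervised: every example carries a human-provided label.
supervised = LogisticRegression(max_iter=1000).fit(X, y)

# Unsupervised: no labels at all; the algorithm looks for structure itself.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Semi-supervised: a few labels plus many unlabeled points (marked -1).
y_partial = y.copy()
y_partial[30:] = -1  # pretend only the first 30 examples were labeled
semi = SelfTrainingClassifier(LogisticRegression(max_iter=1000)).fit(X, y_partial)
```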
Machine Learning Classifiers
Published in Rashmi Agrawal, Marcin Paprzycki, Neha Gupta, Big Data, IoT, and Machine Learning, 2020
Supervised learning works on labeled data, whereas unsupervised techniques work on unlabeled data. In practice, obtaining labeled data is a cumbersome and time-consuming task; experts must perform the labeling manually, whereas unlabeled data is easily obtained. Semi-supervised learning is a type of learning in which the model is trained on a combination of labeled and unlabeled data. Typically, there is a large amount of unlabeled data compared to labeled data. Familiar semi-supervised learning methods are generative models, semi-supervised support vector machines, graph Laplacian based methods, co-training, and multi-view learning. These methods make different assumptions about the relationship between the unlabeled data distribution and the classification function. Some of the applications based on this learning approach are speech analysis, protein sequence classification, and internet content classification (Stamp 2017). Recently, Google launched a semi-supervised learning tool called Google Expander.
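Of the method families listed above, co-training is straightforward to sketch: two classifiers are trained on two views of the data, and each pseudo-labels confident unlabeled points for the other. The following is a minimal single round under assumed details (column-split views, a 0.95 confidence threshold, logistic regression as the base learner); practical co-training iterates this exchange in both directions.

```python
# One co-training round: two classifiers on two views, one labels
# confident points for the other. All concrete choices are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, n_features=20, random_state=1)
view_a, view_b = X[:, :10], X[:, 10:]   # two "views" of the same data
labeled = np.arange(40)                 # small labeled pool
unlabeled = np.arange(40, 400)          # large unlabeled pool

clf_a = LogisticRegression(max_iter=1000).fit(view_a[labeled], y[labeled])
clf_b = LogisticRegression(max_iter=1000).fit(view_b[labeled], y[labeled])

# Classifier A pseudo-labels the unlabeled points it is most sure about,
# and those points are added to classifier B's training set.
proba_a = clf_a.predict_proba(view_a[unlabeled]).max(axis=1)
confident = unlabeled[proba_a > 0.95]
pseudo = clf_a.predict(view_a[confident])
clf_b.fit(np.vstack([view_b[labeled], view_b[confident]]),
          np.concatenate([y[labeled], pseudo]))
```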
Feature Extraction and Classification for Environmental Remote Sensing
Published in Ni-Bin Chang, Kaixu Bai, Multisensor Data Fusion and Machine Learning for Environmental Remote Sensing, 2018
By taking advantage of combined information from labeled and unlabeled data, semi-supervised learning attempts to surpass the performance that could be obtained from either supervised learning or unsupervised learning on each individual data set. In order to make use of unlabeled data, the structure of the input data should satisfy at least one of the following assumptions (Chapelle et al., 2006):
Continuity assumption: Data points close to each other are more likely to share the same label. This is generally assumed in supervised learning and should also hold in semi-supervised learning, where it additionally yields a preference for decision boundaries that pass through low-density regions, so that few points in different classes are close to each other.
Cluster assumption: The data form discrete clusters, and points in the same cluster tend to share the same label. This is a special case of the continuity assumption and gives rise to clustering-based feature learning.
Manifold assumption: The data tend to lie on a manifold of much lower dimension than that of the input space. Learning can then proceed using distances and densities defined on the manifold, with both the labeled and unlabeled data, to avoid dimensionality issues. The manifold assumption is practical, especially for high-dimensional data governed by a few degrees of freedom that are hard to model directly.
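Graph-based methods make the continuity and cluster assumptions concrete: labels diffuse along a similarity graph of the data, so a single labeled point can end up labeling an entire cluster. A minimal sketch, assuming scikit-learn's LabelSpreading and the synthetic two-moons data (both choices are illustrative, not from the chapter):

```python
# Cluster/continuity assumptions in action: labels spread along the
# similarity graph, so one labeled point per "moon" can label both moons.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelSpreading

X, y = make_moons(n_samples=200, noise=0.05, random_state=0)
y_partial = np.full_like(y, -1)       # -1 marks unlabeled points
for cls in (0, 1):
    idx = np.where(y == cls)[0][0]    # keep a single label per class
    y_partial[idx] = cls

model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y_partial)
print((model.transduction_ == y).mean())  # often close to 1.0 on this easy data
```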
Current applications and future impact of machine learning in emerging contaminants: A review
Published in Critical Reviews in Environmental Science and Technology, 2023
Lang Lei, Ruirui Pang, Zhibang Han, Dong Wu, Bing Xie, Yinglong Su
Compared to deep learning, classical ML cannot perform representation learning and relies on well-defined features. ML algorithms are typically categorized as supervised (with labeled training data), unsupervised (with unlabeled training data), and semi-supervised (with mostly unlabeled training data), depending on their intended tasks. Among the most commonly used methods, supervised learning algorithms include classification and regression, which predict non-numerical and numerical results, respectively. Supervised learning leverages labeled data sets to accurately predict classes or expected outcomes, and has been successfully employed to predict pollution caused by particulate matter (PM2.5) and to screen EDCs (Hu et al., 2017; Zorn et al., 2020). Unsupervised learning mainly encompasses clustering, association, dimensionality reduction, and feature extraction, and has been applied to detect anomalies in sewage treatment plant operations (Dairi et al., 2019). Semi-supervised learning can leverage both labeled and unlabeled data to accomplish supervised learning tasks when labeled data are scarce or costly (Zhu & Goldberg, 2009).
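As a small illustration of the unsupervised branch mentioned above, the sketch below flags out-of-range observations in synthetic sensor data with an isolation forest; the data and the model choice are assumptions of this sketch, not the method of Dairi et al. (2019).

```python
# Unsupervised anomaly detection, in the spirit of the operations-monitoring
# application cited above. Synthetic data and model choice are assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(500, 4))   # routine sensor readings
faults = rng.normal(6, 1, size=(10, 4))    # a few out-of-range events
X = np.vstack([normal, faults])

detector = IsolationForest(contamination=0.02, random_state=0).fit(X)
flags = detector.predict(X)                # -1 flags an anomaly
print((flags == -1).sum(), "points flagged")
```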
A Multi-View SVM Approach for Seizure Detection from Single Channel EEG Signals
Published in IETE Journal of Research, 2023
Gopal Chandra Jana, Mogullapally Sai Praneeth, Anupam Agrawal
Our Contributions: In this study, we propose a multi-view SVM model to utilize information from two views of the dataset for seizure detection. In multi-view learning, an ML model is able to learn features from multiple views of the same dataset. Multi-view learning algorithms can be categorized into: (1) co-training, (2) co-regularization, and (3) margin consistency techniques [13]. Co-training is a type of semi-supervised learning algorithm in which two classifiers are trained separately on two views of the dataset. It uses features of both labeled and unlabeled data and incrementally builds the two classifiers over the two views. The co-regularization technique adds an additional regularization term to the main cost function to ensure both that data from different views are consistent and that the predictions from different views remain close to each other (written out below). In margin consistency techniques, the margin variables from different views of the model are constrained to be consistent by requiring the product of the output variables to be greater than every margin variable. In this paper, we have used a modified co-regularization technique to build SVM-2K [14]. Two views of the dataset were created in the time and frequency domains using independent component analysis (ICA) and power spectral densities (PSD), respectively. Finally, the extracted time- and frequency-domain features have been fed into the proposed multi-view SVM. The performance of the proposed model has been compared with single-view SVMs (time- and frequency-domain features individually) as well as with other relevant existing SVM-based state-of-the-art seizure detection models.
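For reference, the co-regularization idea behind SVM-2K can be written out. The following is a hedged reconstruction of the objective as it is commonly stated for SVM-2K [14]; the notation is chosen here and is not taken from this paper.

```latex
\begin{aligned}
\min_{w_A,\,b_A,\,w_B,\,b_B}\quad
  & \tfrac{1}{2}\|w_A\|^2 + \tfrac{1}{2}\|w_B\|^2
    + C_A \sum_i \xi_i^A + C_B \sum_i \xi_i^B + D \sum_i \eta_i \\
\text{s.t.}\quad
  & \bigl|\langle w_A,\phi_A(x_i)\rangle + b_A
        - \langle w_B,\phi_B(x_i)\rangle - b_B\bigr| \le \eta_i + \varepsilon, \\
  & y_i\bigl(\langle w_A,\phi_A(x_i)\rangle + b_A\bigr) \ge 1 - \xi_i^A, \qquad
    y_i\bigl(\langle w_B,\phi_B(x_i)\rangle + b_B\bigr) \ge 1 - \xi_i^B, \\
  & \xi_i^A,\ \xi_i^B,\ \eta_i \ge 0 .
\end{aligned}
```

The D-weighted, ε-insensitive coupling term is the co-regularizer: it penalizes disagreement between the two views' decision functions, which is exactly the consistency requirement described above.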
ALSEE: a framework for attribute-level sentiment element extraction towards product reviews
Published in Connection Science, 2022
Hanqing Xu, Shunxiang Zhang, Guangli Zhu, Haiyang Zhu
Due to the information fragmentation and semantic sparseness of product reviews, traditional methods struggle to capture the local information of texts comprehensively. Hence, we propose a framework for Attribute-Level Sentiment Element Extraction (ALSEE) towards product reviews. To sum up, the major contributions can be summarized in the following three aspects.
Multiple features are considered to realise feature fusion. The fusion of multiple features helps the model capture local information and long-distance dependencies in texts more comprehensively, and improves the recall of the extraction of OT and OW.
The self-training algorithm is applied to realise semi-supervised learning of the model (see the sketch below). The introduction of the self-training algorithm reduces the amount of data annotation and improves the classification accuracy and generalisation performance of the model.
The dependency parsing technique is used to find word pairs with a modifying relationship. It considers syntactic structure information and can better analyse the dependency relationships between words.
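Since the second contribution relies on self-training, a generic self-training loop may help make the idea concrete: train on the labeled data, pseudo-label the unlabeled pool where the model is confident, and retrain. Everything below (the data, the base model, the 0.9 confidence threshold, and the five rounds) is an illustrative assumption, not the ALSEE implementation.

```python
# A generic self-training loop of the kind the second contribution refers to.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=2)
labeled = np.arange(50)          # small labeled seed set
pool = np.arange(50, 500)        # unlabeled pool
X_lab, y_lab = X[labeled], y[labeled]

for _ in range(5):               # a few self-training rounds
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    if pool.size == 0:
        break
    proba = clf.predict_proba(X[pool])
    confident = proba.max(axis=1) > 0.9   # keep only confident pseudo-labels
    X_lab = np.vstack([X_lab, X[pool[confident]]])
    y_lab = np.concatenate([y_lab, proba[confident].argmax(axis=1)])
    pool = pool[~confident]               # shrink the unlabeled pool
```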