High-Performance Computing and Its Requirements in Deep Learning
Published in Sanjay Saxena, Sudip Paul, High-Performance Medical Image Processing, 2022
Biswajit Jena, Gopal Krishna Nayak, Sanjay Saxena
The Inception architecture is regarded as one of the most creative architectures in computer vision: its applications range from image recognition to object detection, and its design ideas have also left a mark on natural language processing, notably in Transformer-based classification tasks. The BERT model is a well-known name in many natural language processing fields, having proved its importance chiefly in tasks such as text processing, text cleaning, word embedding, machine translation, and other Transformer-related applications. It is famous for its state-of-the-art results and ease of use across applications, and XLNet is an equally well-known natural language processing tool. Other important models are AlexNet, VGG16, MobileNet, etc., which are pre-trained computer vision models but are also effective in interdisciplinary research. From these examples we get an idea of what a pre-trained model is and how effective it can be when given the opportunity to transfer its knowledge across domains. U-Net, SegNet, and DeconvNet are three important pre-trained models for image segmentation. U-Net was originally developed for biomedical image segmentation and is therefore well suited to medical images. SegNet is an encoder-decoder segmentation technique with its own advantages, suited mainly to natural image segmentation. DeconvNet follows much the same architecture as SegNet; however, it retains fully connected layers, which make the model considerably larger.
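To make the idea concrete, the snippet below is a minimal sketch of how such pre-trained models are typically loaded for reuse. It assumes the Hugging Face transformers and torchvision packages; the checkpoint names are common public releases, not anything prescribed by the chapter.

# A minimal sketch of loading pre-trained models for transfer learning.
# Assumes the `transformers` and `torchvision` packages are installed;
# checkpoint names below are common public releases (an assumption).
from transformers import BertTokenizer, BertModel
import torchvision.models as models

# Pre-trained BERT: tokenizer plus encoder weights learned on large text corpora.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

# Pre-trained VGG16: convolutional features learned on ImageNet.
vgg16 = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Both models can now be fine-tuned on a downstream task
# instead of being trained from scratch.

Either model's weights can then be frozen or fine-tuned, which is what makes cross-domain reuse of this kind so economical.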
An Unexpected Renaissance Age
Published in Alessio Plebe, Pietro Perconti, The Future of the Artificial Mind, 2021
Alessio Plebe, Pietro Perconti
The Transformer proved to be the turning point, boosting DL for language with rapid progress over just a few years. The BERT (Bidirectional Encoder Representations from Transformers) model by Devlin et al. (2019), as the name reveals, applies attention in its encoder to both the left and the right side of every word in the sentence. In the original Transformer, the attention score is first computed for each word in the sentence with respect to all the other words, but in the decoder the output words are generated one at a time, and the attention computed by the encoder is used only with respect to previous words. Above all, BERT establishes a new general approach to most language application tasks. A base model absorbs as much knowledge about a language as possible by training on huge text corpora, up to several billion words. One or a few additional layers are then added on top of this model, and a second training pass fine-tunes it for a custom language task. Almost every natural language processing task can be performed this way, with the advantage of a deeper and more intimate understanding of how the language works. Tasks include sentence classification, question answering, named entity recognition, automatic summarization, sentiment analysis, conversation, and machine translation. A similar approach is followed by OpenAI's GPT (Generative Pre-trained Transformer) (Brown et al., 2020), the latest version of which, GPT-3, generated the article in The Guardian shown at the beginning of this section.
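As an illustration of this two-stage approach, the following sketch uses the Hugging Face transformers library (an assumed toolkit; the book does not name one): the pre-trained masked language model already encodes general knowledge of the language, and a small task head is then added for fine-tuning.

# Stage 1 is already done: the pre-trained masked language model predicts
# a hidden word from BOTH its left and right context, illustrating
# BERT's bidirectional attention.
from transformers import pipeline, BertForSequenceClassification

fill = pipeline("fill-mask", model="bert-base-uncased")
print(fill("The cat sat on the [MASK].")[0]["token_str"])  # e.g. "floor"

# Stage 2: add a small classification head on top of the same pre-trained
# encoder and fine-tune it on labeled data for a custom task
# (here, an assumed two-class sentence classifier).
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)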
Classification of Text Data in Healthcare Systems – A Comparative Study
Published in Om Prakash Jena, Bharat Bhushan, Nitin Rakesh, Parma Nand Astya, Yousef Farhaoui, Machine Learning and Deep Learning in Efficacy Improvement of Healthcare Systems, 2022
BERT is a multilayer Transformer encoder model. BERT is trained over a large corpus with a masked language modeling task and a next sentence prediction task simultaneously [29]. It uses an attention mechanism to model long-range dependencies in the input text. The language modeling in BERT is bidirectional, as its name implies; this bidirectional model aims to capture contextual information on both sides of each prediction. The next sentence prediction task randomly replaces the next sentence in the input with another sentence and expects the model to predict whether the chosen sentence naturally follows the first one. This task enables BERT to develop a broader understanding of natural language by working at the sentence level. Through fine-tuning, BERT provides a compelling basis for a variety of downstream tasks: fine-tuning allows additional domain-specific data to be used to adapt the pre-trained BERT. For a classification task, the BERT embedding vector for the special “[CLS]” token placed at the beginning of the input sequence is used to encode the entire sequence, and an additional fully connected layer maps this sequence embedding to the classification probabilities. BERT has several parameters that affect performance, such as the number of epochs, the maximum input sequence length, and the choice of pre-trained model. The literature shows that BERT outperformed other pre-trained language models in diverse NLP tasks [29]. Because BERT can incorporate smaller, task-specific data in a comparatively lightweight fine-tuning process, it has been widely adopted for the high accuracy gains it brings to a range of downstream tasks [30].
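A minimal PyTorch sketch of the “[CLS]”-based classification setup just described is given below; the two-class head and the example sentence are illustrative assumptions, not details from [29] or [30].

# A hedged sketch of "[CLS]" classification: the embedding of the [CLS]
# token encodes the whole sequence, and one fully connected layer maps it
# to class probabilities. Layer sizes and the two-class setup are assumptions.
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
classifier = nn.Linear(bert.config.hidden_size, 2)  # hidden size -> 2 classes

inputs = tokenizer("Patient reports chest pain.", return_tensors="pt")
with torch.no_grad():
    outputs = bert(**inputs)

# last_hidden_state[:, 0] is the embedding of the [CLS] token,
# i.e. the first position in the sequence.
cls_embedding = outputs.last_hidden_state[:, 0]
probs = torch.softmax(classifier(cls_embedding), dim=-1)
print(probs)  # classification probabilities for the two classes

In fine-tuning, the linear head and (optionally) the encoder weights are trained jointly on the task-specific data.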
Identifying heart disease risk factors from electronic health records using an ensemble of deep learning method
Published in IISE Transactions on Healthcare Systems Engineering, 2023
Linkai Luo, Yue Wang, Daniel Y. Mo
EHRs contain clinical notes in an unstructured, free-form format. It is necessary to understand complicated language utterances and extract the informative and discriminative features relevant to heart disease. Deep neural networks contain multiple layers and have been effective in modeling high-level semantic information from text and other resources (Wang & Li, 2021; Wu et al., 2022; Zhao et al., 2022). Therefore, we will draw on deep learning in this phase to extract informative and discriminative features from an EHR corpus. Many deep-learning units have been developed to extract and quantify contextual information in text, such as the RNN, the LSTM, and the Transformer model (Vaswani et al., 2017). Recently, the Transformer-based BERT model has shown superior performance in almost all natural language processing tasks (Luo & Wang, 2019). BERT is a pre-trained language model built on bidirectional training of the Transformer, an attention-based neural network model (Devlin et al., 2019). Bidirectional training gives BERT a deeper sense of context in language. The base BERT model consists of 12 Transformer layers and has proven much more effective at identifying abstractions or representations in data than other deep-learning units. Thus, this paper will deploy BERT to learn high-level representations of the domain information from the EHR corpus.
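The following sketch shows how BERT might serve as such a feature extractor over a clinical note; the note text and the generic checkpoint are illustrative assumptions (in practice a clinically pre-trained variant could be substituted).

# A minimal sketch of using BERT as a feature extractor for clinical notes.
# The note text and checkpoint are illustrative assumptions.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

note = "Patient has a history of hypertension and elevated cholesterol."
inputs = tokenizer(note, return_tensors="pt", truncation=True)

with torch.no_grad():
    outputs = model(**inputs)

# hidden_states holds the embedding layer plus all 12 Transformer layers;
# the top layer (or a combination of layers) serves as the high-level
# representation of the note.
hidden_states = outputs.hidden_states      # tuple of 13 tensors
features = hidden_states[-1].mean(dim=1)   # mean-pooled top-layer features
print(features.shape)                      # (1, 768)

The pooled vector (or the per-token representations) can then feed a downstream classifier for risk-factor identification.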
Multimodal Emotion Recognition Framework Using a Decision-Level Fusion and Feature-Level Fusion Approach
Published in IETE Journal of Research, 2023
C. Akalya devi, D. Karthika Renuka
In this section, text feature extraction using the Transformer-based BERT model is discussed. Bidirectional Encoder Representations from Transformers (BERT) is a pre-trained neural network architecture from Google intended for Natural Language Processing (NLP). BERT is pre-trained on a speaker-independent corpus, such as Wikipedia, and can recognize context, which is why we choose to incorporate BERT for text emotion recognition. BERT is deeply bidirectional, since context is derived from the entire set of words rather than from a single left-to-right or right-to-left pass. The pre-trained model is fine-tuned with an additional output layer to create state-of-the-art methods for a broad variety of NLP tasks. The BERT architecture consists of multiple attention heads that process the text in both directions, which helps it retrieve context much better than a simple attention layer. The encoder network assigns scores to each group of words in a sequence, making it easier for subsequent layers to compute the meaning of the words. An input token sequence of maximum length 128, delimited by [CLS] and [SEP] tokens and accompanied by an attention mask, is fed into the BERT text model. A classification layer is added on top of the core BERT model to predict the emotion classes. The BERT-TERM model consists of 12 layers, 12 self-attention heads, and a hidden size of 768, with a dropout value of 0.5 followed by a final layer of size 256.
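The sketch below reconstructs this pipeline from the stated settings (maximum length 128, [CLS]/[SEP] tokens, an attention mask, dropout of 0.5, and a 256-unit final layer); it is not the authors' code, and the number of emotion classes is an assumption.

# A sketch of the described text pipeline, reconstructed from the stated
# settings; not the authors' code. The six emotion classes are an assumption.
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# The tokenizer adds [CLS] and [SEP] and builds the attention mask.
inputs = tokenizer(
    "I am thrilled with the results!",
    max_length=128,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)

bert = BertModel.from_pretrained("bert-base-uncased")  # 12 layers, 12 heads, hidden size 768
head = nn.Sequential(
    nn.Dropout(0.5),       # dropout value of 0.5
    nn.Linear(768, 256),   # final layer of size 256
    nn.ReLU(),
    nn.Linear(256, 6),     # assumed number of emotion classes
)

with torch.no_grad():
    cls = bert(**inputs).last_hidden_state[:, 0]  # [CLS] embedding
logits = head(cls)
print(logits.shape)  # (1, 6)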
BERT-Log: Anomaly Detection for System Logs Based on Pre-trained Language Model
Published in Applied Artificial Intelligence, 2022
Raw log messages are unstructured and contain text in many different formats, which makes it hard to detect anomalies from them directly. The purpose of log parsing (Du and Li 2016; He et al. 2017) is to structure raw logs into groups of event templates. HitAnomaly (Huang et al. 2020) is a semantic-based approach that utilizes a hierarchical transformer structure to model log templates and uses an attention mechanism as the final classification model. BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model proposed by Devlin et al. (2018) of Google which obtained new state-of-the-art results on eleven well-known natural language processing (NLP) tasks. Compared with the earlier hierarchical transformer, BERT comprises pre-training and fine-tuning steps and performs better on a large suite of sentence-level and token-level tasks. BERT has already been used in many fields (Do and Phan 2021; Peng, Xiao, and Yuan 2022), so it is well suited to handling semantic-based log sequences.
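As a rough illustration of this idea, the sketch below feeds a sequence of parsed event templates through a pre-trained BERT and classifies the sequence-level embedding as normal or anomalous; the templates, the binary head, and the checkpoint are illustrative assumptions, not the BERT-Log implementation.

# A hedged sketch of applying a pre-trained BERT to parsed log event
# templates for anomaly detection; templates, labels, and checkpoint
# are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
classifier = nn.Linear(bert.config.hidden_size, 2)  # normal vs. anomalous

# A log sequence rendered as a series of parsed event templates.
log_sequence = "Receiving block <*> ; Deleting block <*> ; Exception in receiveBlock"
inputs = tokenizer(log_sequence, return_tensors="pt", truncation=True)

with torch.no_grad():
    cls = bert(**inputs).last_hidden_state[:, 0]  # sequence-level embedding
logits = classifier(cls)
print(torch.softmax(logits, dim=-1))  # anomaly probabilities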