Explore chapters and articles related to this topic
Artificial Intelligence for Biomedical Informatics
Published in Ranjeet Kumar Rout, Saiyed Umer, Sabha Sheikh, Amrit Lal Sangal, Artificial Intelligence Technologies for Computational Biology, 2023
Shahid Azim, Samridhi Dev, Sushil Kumar, Aditi Sharan
“NER consists of three different problems – the recognition of a named entity in text, the assignment of a class to this entity (gene, protein, drug, etc.), and the selection of a preferred term for naming the object in case that synonyms exist” [17]. Traditional NER models were based on sequence labeling, notably Hidden Markov Models (HMM) and Conditional Random Fields (CRF). Handcrafted feature extraction is a difficult and time-consuming task in natural language processing. Many studies show that deep learning models extract features automatically, and that this is both more effective and less time-consuming than handcrafted feature extraction; applying features extracted by deep learning models to traditional sequence labeling models may therefore yield considerably better performance. Cho & Lee [4] proposed the “contextual long short-term memory networks with CRF (CLSTM)” model for named entity recognition, which incorporates n-grams with the BI-LSTM and CRF. BI-LSTM with CRF performed better in named entity recognition than the RNN model [2]. To handle the out-of-vocabulary problem, character embeddings can be incorporated into deep learning models, and to capture the relationship between entities, a CRF layer can be added at the output. Character embeddings have improved text mining in natural language processing and can also handle inconsistencies in word forms [1].
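The intuition behind character-level representations for out-of-vocabulary words can be sketched with plain character n-grams. This is only an illustration, not the embedding method of the cited works: the helper names `char_ngrams` and `overlap`, the trigram size, and the example words are all assumptions. It shows that an unseen word form still shares most of its character units with a known one, which is what a character-based model can exploit.

```python
def char_ngrams(word: str, n: int = 3) -> set:
    """Return the set of character n-grams of a word, with boundary markers."""
    padded = f"<{word.lower()}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def overlap(a: str, b: str) -> float:
    """Jaccard overlap between the character n-gram sets of two words."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Hypothetical example: "kinases" is treated as unseen, "kinase" as known.
related = overlap("kinase", "kinases")
unrelated = overlap("kinase", "protein")
print(related > unrelated)  # the unseen plural stays close to the known form
```

A word-level vocabulary would treat the unseen form as a single unknown token, whereas the character view keeps most of its sub-word evidence.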
Extraction of Medical Entities Using a Matrix-Based Pattern-Matching Method
Published in Himansu Das, Jitendra Kumar Rout, Suresh Chandra Moharana, Nilanjan Dey, Applied Intelligent Decision Making in Machine Learning, 2020
Previous work on clinical entity identification has used several methods, including dictionary look-up, rule-based methods, and machine learning. The dictionary look-up method was used in [3], where the authors identified clinical entities using dictionaries compiled from the corpus, ran experiments on the i2b2 2010 dataset, and obtained an average F score of 48% for the Beth dataset and 50% for the Partners dataset. The rule-based method was used in [13, 14]: rules are created from corpus words and word occurrences, and words found in the corpus are then mapped to a corresponding category; this approach achieved a 42% F score. Machine-learning-based approaches such as SVM (support vector machine) and CRF (conditional random field) have been used for entity boundary identification and entity classification [1, 15], based on the beginning, inner, and outside (BIO) model for sequence labeling. An unsupervised approach has also been used to extract named entities from biomedical texts [16], in which the authors developed a noun phrase chunker followed by a filter based on inverse document frequency. Multiword entities are classified using the concept of distributional semantics.
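A dictionary look-up step of the kind described above can be sketched as a greedy longest-match scan over the tokens. This is a minimal sketch under stated assumptions, not the method of [3]: the function name `dictionary_lookup`, the `max_len` limit, the tiny lexicon, and the example sentence are all hypothetical.

```python
from typing import Dict, List, Tuple


def dictionary_lookup(
    tokens: List[str],
    lexicon: Dict[Tuple[str, ...], str],
    max_len: int = 4,
) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, category) for greedy longest matches."""
    matches = []
    i = 0
    while i < len(tokens):
        found = False
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in lexicon:
                matches.append((i, i + n, lexicon[key]))
                i += n  # skip past the matched phrase
                found = True
                break
        if not found:
            i += 1
    return matches


# Hypothetical lexicon and sentence, for illustration only.
lexicon = {
    ("chest", "pain"): "problem",
    ("chest", "x-ray"): "test",
    ("aspirin",): "treatment",
}
tokens = "Patient reports chest pain and was given aspirin".split()
print(dictionary_lookup(tokens, lexicon))
# [(2, 4, 'problem'), (7, 8, 'treatment')]
```

Exact matching of this kind misses spelling variants and unseen terms, which is consistent with the modest F scores reported for dictionary-based systems.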
Entity and relation collaborative extraction approach based on multi-head attention and gated mechanism
Published in Connection Science, 2022
Wei Zhao, Shan Zhao, Shuhui Chen, Tien-Hsiung Weng, WenJie Kang
The information extraction problem can be regarded as a sequence labelling problem, which generates label space information (label information for short). Sequence labelling aims to assign a label to each element in a sequence. In NLP, a sequence generally refers to a sentence, and an element refers to a word in the sentence. Named Entity Recognition (NER) is a subtask of information extraction that needs to locate and classify elements. For NER, the label information includes the locations and types of elements. In this paper, the BIO joint tagging method is used to tag each element with “B-X”, “I-X”, or “O”, where “B-X” indicates the beginning of an element of type X, “I-X” indicates a middle position of an element of type X, and “O” indicates that the element does not have a type. For the entity “Richard Celeste” shown in Figure 1, “Richard” is labelled as “B-Peop” since it is the first element of the entity with the type name. Then “Celeste” is labelled as “I-Peop”. Since “Celeste” is followed by a word labelled “O”, it can be inferred that “Celeste” is the end boundary of this entity.
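The boundary inference described above can be sketched as a small decoder that turns BIO tags back into entity spans. This is an illustrative sketch only: the function name `bio_to_spans` and the two tokens after “Celeste” are assumptions, since the full sentence from Figure 1 is not reproduced here.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity text, entity type) pairs from BIO tags."""
    spans = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity that is still open.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the open entity of the same type.
            current_tokens.append(token)
        else:
            # "O" (or an inconsistent "I-X") marks the end boundary.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# The first two tokens follow the example in the text; the rest are hypothetical.
tokens = ["Richard", "Celeste", "was", "governor"]
tags = ["B-Peop", "I-Peop", "O", "O"]
print(bio_to_spans(tokens, tags))
# [('Richard Celeste', 'Peop')]
```

In this sketch an “I-X” tag that does not continue an open entity of the same type is treated like “O”; other conventions (for example, starting a new entity) are equally possible.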
Deep Learning-Based Named Entity Recognition System Using Hybrid Embedding
Published in Cybernetics and Systems, 2022
Archana Goyal, Vishal Gupta, Manish Kumar
Several NER systems have been developed by different researchers. These include rule-based systems (Riaz 2010; Gupta and Lehal 2011; Singh, Goyal, and Lehal 2012; Eftimov, Koroušić Seljak, and Korošec 2017), machine learning-based systems (Saha, Mitra, and Sarkar 2012; Freire, Borbinha, and Calado 2012; Kanya and Ravi 2013; Ravikumar and Kumar 2021), and hybrid systems (Saha and Ekbal 2013; Li, Fan, and Huang 2013; Ji et al. 2019; Thomas and Sangeetha 2019). Earlier systems used rule-based techniques that work well with handcrafted rules, dictionaries, and manually created gazetteers. These systems are highly accurate but domain- and language-specific: they cannot be reused for other languages, and they require great expertise in the language, which makes them expensive. Traditional supervised learning algorithms such as support vector machine (Joachims 1999), maximum entropy (McCallum, Freitag, and Pereira 2000), and conditional random field (Lafferty, McCallum, and Pereira 2001) have overcome the difficulties posed by rule-based techniques. Among traditional machine learning algorithms, conditional random field (CRF) has proved to be the best for sequence labeling tasks such as named entity recognition (Goyal, Gupta, and Kumar 2019). Several hybrid systems (Saha and Ekbal 2013; Thomas and Sangeetha 2019) have also been presented, which address the issues of rule-based and machine learning methods by combining the properties and benefits of both. As reviewed in the literature, two or more machine learning-based approaches have also been hybridized to improve the predictive performance of the NER model.
ALSEE: a framework for attribute-level sentiment element extraction towards product reviews
Published in Connection Science, 2022
Hanqing Xu, Shunxiang Zhang, Guangli Zhu, Haiyang Zhu
In this paper, the task of extracting OT and OW is treated as a sequence labelling problem, which means that OT and OC in product reviews are annotated with labels. Common sequence labelling models include the Conditional Random Field (CRF) (Shao et al., 2021) and the Hidden Markov Model (HMM). In this paper, the linear-chain CRF model proposed by Lafferty et al. (2001), an undirected probabilistic graphical model (as illustrated in Figure 2), is chosen for sentiment element extraction. It has the advantage of capturing long-distance dependencies and overlapping features.
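How a linear-chain CRF picks a label sequence can be sketched with Viterbi decoding, the standard dynamic-programming procedure for such models. This is a minimal sketch, not the ALSEE implementation: the function name `viterbi`, the B/I/O label set, and all emission and transition scores are hypothetical values chosen for illustration.

```python
from typing import Dict, List, Tuple


def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: List[str],
) -> List[str]:
    """Return the highest-scoring label sequence under per-token emission
    scores plus pairwise transition scores (missing transitions score 0)."""
    # best[t][y]: score of the best path that ends in label y at position t
    best = [{y: emissions[0][y] for y in labels}]
    back = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions.get((p, y), 0.0))
            scores[y] = best[-1][prev] + transitions.get((prev, y), 0.0) + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


labels = ["B", "I", "O"]
# Hypothetical scores: taken token by token, the emissions prefer O, I, O.
emissions = [
    {"B": 0.9, "I": 0.0, "O": 1.0},
    {"B": 0.2, "I": 1.0, "O": 0.5},
    {"B": 0.0, "I": 0.0, "O": 1.0},
]
# Transition scores penalise "I" directly after "O" and favour "I" after "B".
transitions = {("O", "I"): -5.0, ("B", "I"): 0.5}
print(viterbi(emissions, transitions, labels))
# ['B', 'I', 'O']
```

Labelling each token independently would give the invalid sequence O, I, O here; the transition scores are what let the chain model recover B, I, O.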