Information Extraction
Published in John Atkinson-Abutridy, Text Analytics, 2022
Although such an approach could be used in restricted domains, more robust methods are generally required to recognize entities. One of these methods is known as Named-Entity Recognition (NER), which seeks to locate and classify named entities in texts into predefined categories such as names of people, organizations, locations, amounts, times, monetary values, etc. NER is used in many NLP problems and can help answer questions in several applications, such as: What companies were mentioned in news articles? What named entities can characterize a document for further search? Were certain products mentioned in the complaints or comments? Does the tweet contain a person's name?
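The restricted-domain approach the passage contrasts with NER can be pictured as a simple gazetteer lookup: entities are recognized only if they appear in a hand-built dictionary. A minimal sketch, with hypothetical names and categories:

```python
# Toy gazetteer (dictionary-based) entity lookup -- illustrative only.
# Real NER systems use statistical or neural models and generalize
# beyond a fixed list; all entries below are hypothetical.
GAZETTEER = {
    "Acme Corp": "ORG",
    "Alice Smith": "PERSON",
    "Paris": "LOC",
}

def tag_entities(text):
    """Return (entity, category) pairs whose surface form occurs in the text."""
    found = []
    for entity, category in GAZETTEER.items():
        if entity in text:
            found.append((entity, category))
    return found

print(tag_entities("Alice Smith joined Acme Corp in Paris."))
# [('Acme Corp', 'ORG'), ('Alice Smith', 'PERSON'), ('Paris', 'LOC')]
```

The obvious limitation, which motivates statistical NER, is that any name absent from the dictionary is silently missed.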
A Study and Comparative Analysis of Various Use Cases of NLP Using Sequential Transfer Learning Techniques
Published in R. Sujatha, S. L. Aarthy, R. Vettriselvan, Integrating Deep Learning Algorithms to Overcome Challenges in Big Data Analytics, 2021
R. Mangayarkarasi, C. Vanmathi, Rachit Jain, Priyansh Agarwal
Named Entity Recognition (NER) is a necessary part of NLP tasks such as information retrieval (IR) and information extraction (IE). NER is used to find the entity type of words in a given dataset. This section demonstrates the NER framework using two benchmarking sequential transfer learning (STL) models, ELMo and BERT, alongside basic models that use feature vectors and embeddings (e.g., LSTM, LSTM-CRF, Random Forest classifier, etc.). ELMo is a transfer learning approach that reads text in both the forward and backward directions using LSTM architectures. To identify the relationships between entities, ELMo uses contextual learning, which is better than static word embeddings: ELMo learns both the meaning of a word and the context in which it is used. That is, instead of assigning a fixed embedding, ELMo first looks at the sentence and then assigns an embedding to each word. BERT was the first unsupervised, purely bidirectional system used for pre-training NLP models. BERT computes its input representations from three types of embeddings: token embeddings, segment embeddings, and position embeddings. BERT is pre-trained on unlabeled data and can then be fine-tuned on labeled data to get the desired results. The advantage of BERT is that it is built on trained contextual representations and is purely bidirectional, whereas ELMo and ULMFiT are unidirectional or only partially bidirectional. Pseudocodes 2 and 3 give the steps for detecting named entities using ELMo and BERT.
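Whatever tagger is used (LSTM-CRF, ELMo- or BERT-based), its per-token output is typically a sequence of BIO labels that must be decoded into entity spans. The sketch below shows that final decoding step; the tokens and labels are illustrative, not taken from the chapter's pseudocode:

```python
# Decode per-token BIO labels (the usual output format of LSTM-CRF or
# BERT-style NER taggers) into (entity_text, entity_type) spans.
def decode_bio(tokens, labels):
    """Group B-/I- tagged tokens into entity spans."""
    entities, current, current_type = [], [], None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):          # a new entity begins
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [token], label[2:]
        elif label.startswith("I-") and current and label[2:] == current_type:
            current.append(token)           # continue the open entity
        else:                               # "O" or inconsistent I- closes it
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities

tokens = ["Barack", "Obama", "visited", "New", "York"]
labels = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]
print(decode_bio(tokens, labels))
# [('Barack Obama', 'PER'), ('New York', 'LOC')]
```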
Natural Language Processing
Published in Vishal Jain, Akash Tayal, Jaspreet Singh, Arun Solanki, Cognitive Computing Systems, 2021
V. Vishnuprabha, Lino Murali, Daleesha M. Viswanathan
Named entity recognition is the process of extracting named entities present in a text into pretagged categories such as "individuals," "companies," "places," "organizations," "cities," "dates," "product terminologies," etc. It enriches the semantic knowledge of the content and helps promptly understand the subject of any given text. It is useful in applications such as news content analysis, business sentiment analysis, etc. Named entity recognition can provide article scanning based on relevant tags to reveal the significant people, organizations, and places discussed in them. It helps in the automatic classification of articles with fast content discovery. In business sentiment analysis, extracting the identity of people, places, dates, companies, products, jobs, and titles gives insight into people's opinions on products and companies.
Full-span named entity recognition with boundary regression
Published in Connection Science, 2023
Junhui Yu, Yanping Chen, Qinghua Zheng, Yuefei Wu, Ping Chen
A named entity is defined as a word or a phrase in a sentence that refers to an object in the world. From the perspective of natural language understanding, named entities are the most basic linguistic units of a sentence, and recognising them is key to understanding it. This task was first defined at the sixth Message Understanding Conference (MUC-6) as a subtask of information extraction (Grishman & Sundheim, 1996). As a fundamental task, it can support a wide range of applications, e.g. knowledge graph construction (Al-Moslmi et al., 2020), machine translation (Hu et al., 2022), sentence parsing (Yu et al., 2020), question answering (Longpre et al., 2021), and so forth. Furthermore, named entities comprise the main part of out-of-vocabulary words (or new words), which are usually noted as a considerable obstacle to automatically processing natural language. Therefore, techniques of named entity recognition also have important theoretical impacts and applications in natural language processing.
Towards Malay named entity recognition: an open-source dataset and a multi-task framework
Published in Connection Science, 2023
Yingwen Fu, Nankai Lin, Zhihe Yang, Shengyi Jiang
Named Entity Recognition (NER) plays an essential role in multiple downstream NLP applications such as information retrieval (Guo et al., 2009), question answering (Aliod et al., 2006), and knowledge graph construction (Etzioni et al., 2005). It is intended to identify entity boundaries and classify the entities into predefined categories (such as person, location, organisation, etc.). With the widespread application of neural networks, various neural approaches for NER have been proposed, such as bidirectional long short-term memory (Bi-LSTM) (Ma & Hovy, 2016), convolutional neural networks (CNN) (Chiu & Nichols, 2016), and, more recently, pre-trained language models (PLMs) (Akbik et al., 2019, 2018). Looking at the current development of NER, there are two main challenges. (1) Dependence on massive labelled data: neural NER models are highly successful for languages/domains with a large amount of labelled data. However, most low-resource languages (such as Malay, Tamil, etc.) do not have enough labelled data to train fully supervised models (Lin et al., 2020). (2) Boundary recognition error (BRE): one of the significant elements influencing NER performance is BRE (Li et al., 2021). In our preliminary experiments, we found that current NER models are insufficient at recognising entity boundaries (especially for long entities). Therefore, a solution to the BRE problem is urgently needed.
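The boundary recognition error the passage describes can be made concrete by scoring predicted spans against gold spans and separating boundary errors (span overlaps a gold entity but its endpoints are wrong) from type errors (span is exact but mislabelled). A minimal sketch with illustrative spans, where each span is a (start, end, type) tuple over token offsets:

```python
# Classify each predicted NER span as correct, a type error, a boundary
# error, or spurious, relative to gold spans. Spans are (start, end, type)
# tuples with end-exclusive token offsets; the examples are illustrative.
def classify_errors(gold, pred):
    results = {}
    gold_by_span = {(s, e): t for s, e, t in gold}
    for s, e, t in pred:
        if (s, e) in gold_by_span:
            # Boundaries exact: check whether the type also matches.
            results[(s, e, t)] = "correct" if gold_by_span[(s, e)] == t else "type_error"
        elif any(s < ge and gs < e for gs, ge, _ in gold):
            # Overlaps a gold entity but with wrong endpoints.
            results[(s, e, t)] = "boundary_error"
        else:
            results[(s, e, t)] = "spurious"
    return results

gold = [(0, 2, "PER"), (5, 8, "LOC")]
pred = [(0, 2, "PER"), (5, 7, "LOC")]  # second prediction truncates the gold span
print(classify_errors(gold, pred))
# {(0, 2, 'PER'): 'correct', (5, 7, 'LOC'): 'boundary_error'}
```

Counting these categories separately is what lets an evaluation attribute performance loss to BRE specifically, rather than to misclassification.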
RadScore: An Automated Technique to Measure Radicalness Score of Online Social Media Users
Published in Cybernetics and Systems, 2023
The data obtained from these sources is then cleaned and preprocessed to remove all stop words, URLs, and alpha-numeric characters. The cleaned data is then sent to the MeaningCloud text classification API (Text classification and categorization API — MeaningCloud n.d.) for extracting the domain of the text. The MeaningCloud API uses the IPTC ontology to determine the category/domain of the text. The domains considered in our proposed approach are: Politics, Unrest, and Crime. Once the domain or category of the text is extracted, Named Entity Recognition (NER) is performed over the text to extract the persons of interest and the locations of interest. The persons and places of interest are extracted for each of the subdomains (politics, unrest, and crime). Thus, we create a total of six different glossaries, with two subglossaries (person of interest and place of interest) each for the three subdomains (politics, unrest, crime). The rest of the content is categorized as miscellaneous. The glossaries created are represented in Figure 2 by means of a word cloud.
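The glossary-building step described above amounts to grouping domain-tagged NER output into per-domain person/place sets. A minimal sketch, where the record triples, entity names, and type labels are hypothetical:

```python
# Build per-domain glossaries (person of interest / place of interest)
# from domain-tagged NER output. All names and records are hypothetical.
from collections import defaultdict

def build_glossaries(records):
    """records: iterable of (domain, entity_text, entity_type) triples."""
    glossaries = defaultdict(lambda: {"person_of_interest": set(),
                                      "place_of_interest": set()})
    for domain, entity, etype in records:
        if domain not in ("politics", "unrest", "crime"):
            domain = "miscellaneous"   # everything else, per the pipeline
        if etype == "PERSON":
            glossaries[domain]["person_of_interest"].add(entity)
        elif etype == "LOC":
            glossaries[domain]["place_of_interest"].add(entity)
    return glossaries

records = [("politics", "J. Doe", "PERSON"),
           ("politics", "Capital City", "LOC"),
           ("crime", "R. Roe", "PERSON")]
g = build_glossaries(records)
print(sorted(g["politics"]["person_of_interest"]))  # ['J. Doe']
```

Running this over the three target subdomains yields the six subglossaries (three domains, two entity classes each) that the word cloud in Figure 2 visualizes.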