Natural Language Processing for Information Retrieval
Published in Anuradha D. Thakare, Shilpa Laddha, Ambika Pawar, Hybrid Intelligent Systems for Information Retrieval, 2023
Anuradha D. Thakare, Shilpa Laddha, Ambika Pawar
Information retrieval (IR) processes take unstructured text (natural language text) as input and generate structured text according to criteria specific to the target application. Information extraction builds the foundation for many high-level NLP tasks such as translation, event extraction, question answering systems (QASs), and so on. Information extraction involves several sub-tasks, including named entity recognition, linking between named entities, and extracting relations between entities. NLP offers numerous low-level tasks, including part-of-speech tagging and parsing, which make up the building blocks of the more complex information extraction tasks. Thus, these low-level yet crucial NLP processes contribute to the success of information retrieval systems (IRSs) that allow us to manipulate natural text automatically.
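As a concrete illustration, the minimal Python sketch below (not from the chapter) uses the open-source spaCy library, assuming its small English model "en_core_web_sm" is installed, to show how a low-level task (part-of-speech tagging) and a higher-level IE subtask (named entity recognition) operate on the same sentence.

```python
# A minimal sketch of low-level NLP tasks feeding information extraction,
# using spaCy; assumes the model has been downloaded beforehand with
#   python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Marie Curie won the Nobel Prize in Physics in 1903.")

# Low-level building block: part-of-speech tagging.
for token in doc:
    print(token.text, token.pos_)

# Higher-level IE subtask built on top: named entity recognition.
for ent in doc.ents:
    print(ent.text, ent.label_)
```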
Understanding Distributed Semantic Analysis with Spark Data Frames
Published in Nedunchezhian Raju, M. Rajalakshmi, Dinesh Goyal, S. Balamurugan, Ahmed A. Elngar, Bright Keswani, Empowering Artificial Intelligence Through Machine Learning, 2022
Richa Mathur, Devesh K. Bandil, Dhanesh Kumar Solanki
Big data is a broad term for any voluminous and complex datasets that can yield useful information after processing. The information collected is not in a form suitable for analysis, so an information extraction process is required that can find the needed information in unstructured or raw data and put it into a structured format. Data cleaning applies predefined constraints to check data validity, or error models for some domains. Data analysis helps us understand relationships among objects and develop data mining methods that predict future observations accurately while enabling interactive response times; the lack of coordination between database systems, however, is a current problem with big data. Analyzing big data enables better decision-making and strategic planning in business. Big data analytics is the process of dissecting datasets and extracting results from them to gain better insights and identify opportunities. The exponential growth of data in all fields calls for comprehensive measures for accessing and managing such information. Big data is most often associated with distributed computing, since analyzing huge datasets requires a platform such as Hadoop to store them across a distributed cluster and MapReduce to arrange, consolidate, and process data from numerous sources.
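As a rough illustration of the Hadoop/MapReduce pattern the excerpt describes, the following minimal PySpark sketch (the input file name events.txt is hypothetical, and a local Spark installation is assumed) maps each line of an unstructured text file to words and reduces the per-word counts across the cluster.

```python
# A minimal MapReduce-style sketch with PySpark; "events.txt" is a
# hypothetical unstructured input file.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WordCount").getOrCreate()

lines = spark.sparkContext.textFile("events.txt")

# Map: split each line into words; Reduce: sum the counts per word.
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

for word, n in counts.take(10):
    print(word, n)

spark.stop()
```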
Artificial Intelligence for Biomedical Informatics
Published in Ranjeet Kumar Rout, Saiyed Umer, Sabha Sheikh, Amrit Lal Sangal, Artificial Intelligence Technologies for Computational Biology, 2023
Shahid Azim, Samridhi Dev, Sushil Kumar, Aditi Sharan
Data Curation / Information Extraction
Curation involves the annotation of data. The curation process can be done in either a manual or a semi-automated way, where sentences are initially extracted from the text in an automated manner. Information extraction is the process of finding essential and relevant entities and their semantic properties in the sentences of a given corpus. Its three subtasks are entity recognition, relation extraction, and coreference resolution. Entity recognition entails determining the most appropriate entity type label for a given entity; relation extraction entails determining the best relation type label between two entities; and coreference resolution entails grouping spans that refer to the same entity.
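A toy Python sketch of how the outputs of these three subtasks might be represented is given below; the sentence, entity types, and relation label are all invented for illustration and do not come from the chapter.

```python
# Illustrative data structures for the three IE subtasks; all labels
# and the example sentence are invented, not from the chapter.
from dataclasses import dataclass

@dataclass
class Entity:
    span: str
    etype: str          # entity type label

@dataclass
class Relation:
    head: Entity
    tail: Entity
    rtype: str          # relation type label

sentence = "Aspirin reduces fever; the drug also thins blood."

# Entity recognition: assign the most appropriate type label to each entity.
aspirin = Entity("Aspirin", "DRUG")
fever = Entity("fever", "SYMPTOM")

# Relation extraction: assign the best relation type label between two entities.
treats = Relation(aspirin, fever, "TREATS")

# Coreference resolution: group spans that refer to the same entity.
coref_cluster = [("Aspirin", 0), ("the drug", 23)]  # (span, character offset)
assert sentence[23:31] == "the drug"

print(treats.rtype, coref_cluster)
```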
Uncertainty modeling and applications for operating data-driven inverse design
Published in Journal of Engineering Design, 2023
Shijiang Li, Liang Hou, Zebo Chen, Shaojie Wang, Xiangjian Bu
The inverse relationship model is one of the key issues in the design process. Growing sources of product data also increase data diversity and complexity. Different types of data require different analysis and processing methods to construct the inverse relationship model and mine the system parameters that are close to the real situation. With the maturation of technologies such as big data and artificial intelligence, technical data mining methods have emerged. These include information extraction processes that cover techniques such as statistics, machine learning, pattern recognition, and support vector machines (Chen et al. 2015). Three main factors are considered in these processes: the purpose of the data mining, the characteristics of the data, and the algorithms used (Tsai et al. 2014).
Entity and relation collaborative extraction approach based on multi-head attention and gated mechanism
Published in Connection Science, 2022
Wei Zhao, Shan Zhao, Shuhui Chen, Tien-Hsiung Weng, WenJie Kang
The information extraction problem can be regarded as a sequence labelling problem, which generates label space information (label information for short). Sequence labelling aims to assign a label to each element in a sequence; in NLP, a sequence generally refers to a sentence, and an element refers to a word in the sentence. Named Entity Recognition (NER) is a subtask of information extraction that needs to locate and classify elements, so its label information includes the locations and types of elements. In this paper, the BIO joint tagging method is used to tag each element with “B-X”, “I-X”, or “O”, where “B-X” indicates the beginning of an element of type X, “I-X” indicates an inside position of an element of type X, and “O” indicates that the element does not have a type. For the entity “Richard Celeste” shown in Figure 1, “Richard” is labelled “B-Peop” since it is the first element of an entity of type Peop (person), and “Celeste” is then labelled “I-Peop”. Since “Celeste” is followed by a word labelled “O”, it can be inferred that “Celeste” is the end boundary of this entity.
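The decoding step this scheme implies, recovering typed entity spans from BIO labels, can be sketched in a few lines of Python; “Richard Celeste” and its labels follow the text above, while the rest of the token sequence is invented for illustration.

```python
# A minimal sketch of decoding BIO labels back into typed entity spans.
# "Richard Celeste" / B-Peop / I-Peop follows the paper's example;
# the remaining tokens and the Loc label are illustrative.
tokens = ["Richard", "Celeste", "visited", "Ohio", "."]
labels = ["B-Peop", "I-Peop", "O", "B-Loc", "O"]

def decode_bio(tokens, labels):
    """Collect (entity_text, type) pairs from a BIO-tagged sequence."""
    entities, current, etype = [], [], None
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):            # beginning of a new entity
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [tok], lab[2:]
        elif lab.startswith("I-") and current:
            current.append(tok)             # inside the current entity
        else:                               # "O": close any open entity
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [], None
    if current:
        entities.append((" ".join(current), etype))
    return entities

print(decode_bio(tokens, labels))
# [('Richard Celeste', 'Peop'), ('Ohio', 'Loc')]
```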
Evaluating risk propagation in renewable energy incidents using ontology-based Bayesian networks extracted from news reports
Published in International Journal of Green Energy, 2022
In the era of big data, the rapid development of social media and the Internet provides countless possibilities for applying computer technology to incident management. Large-scale textual sources such as tweets, blogs, and news articles have been utilized to evaluate the severity and situation of incident risks. For example, incident reports of patients are used to train classification models (Evans et al. 2020), aviation safety reports are used to predict abnormal aviation events (Zhang and Mahadevan 2019), and traffic events can be detected instantly from millions of tweets by text processing (Alomari, Katib, and Mehmood 2020). Among all the text mining techniques, information extraction (IE) in the domain of NLP has been widely adopted for automatically extracting critical information such as causes, locations, events, and other named entities from unstructured texts. Underlying trends of in-flight events and categorical metadata parameters are extracted from text-based flight safety data by NLP tools (Rose, Puranik, and Mavris 2020). The safety-related information in railway call reports has been semi-automatically classified with NLP techniques, and bow-tie diagrams can provide structured reporting of hazards based on the extracted information (Hughes et al. 2018). Using a combination of several NLP systems, drugs, their attributes, and adverse drug events are extracted from the clinical notes of patients (Dandala et al. 2020).
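As a small illustration of this kind of IE over news text, the sketch below (an assumption-laden example, not the authors' pipeline) applies spaCy's pretrained "en_core_web_sm" model to an invented incident headline and keeps only location-, organization-, facility-, and date-like entities.

```python
# A minimal sketch of extracting named entities from an unstructured
# incident report; the headline is invented and the spaCy model
# "en_core_web_sm" is assumed to be installed.
import spacy

nlp = spacy.load("en_core_web_sm")
report = ("A turbine fire at the Horns Rev offshore wind farm in Denmark "
          "halted generation on Tuesday, the operator said.")

for ent in nlp(report).ents:
    if ent.label_ in {"GPE", "LOC", "ORG", "FAC", "DATE"}:
        print(ent.text, ent.label_)
```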