Natural Language Processing for Information Retrieval
Published in Anuradha D. Thakare, Shilpa Laddha, Ambika Pawar, Hybrid Intelligent Systems for Information Retrieval, 2023
Anuradha D. Thakare, Shilpa Laddha, Ambika Pawar
Stemming is an essential initial step in text-data preprocessing, which is itself a necessary stage in information retrieval (IR), text data mining, and NLP. The purpose of stemming is to reduce each word to a base/root form for standardization. Removing suffixes naively can cause errors such as under-stemming and over-stemming. The input to a stemmer is a token/word, and the output is its root/base form; stemming can thus be described as removing affixes from words to recover the original/root/base word. For example, removing the suffix “ing” from “singing” yields the base word “sing”; conversely, suffixes can be attached to a base word to create new word forms. Stemming is widely used in data mining, IR, and NLP to reduce the various forms of a word (noun, adjective, verb, adverb, and so on) to its basic/root form, which also reduces index-file size. The stemming process has a substantial impact on retrieval results for both rule-based and statistical approaches. Various stemming algorithms are available for many languages, and most are based on a rule-based approach; such stemmers outperform other techniques such as brute-force lookup.
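As a minimal sketch of the idea, a toy rule-based stemmer can strip a few common English suffixes. This is an illustrative assumption, not a production stemmer: real algorithms such as the Porter stemmer apply ordered rule sets with extra conditions to limit over-stemming and under-stemming.

```python
# Toy rule-based stemmer: strips the first matching suffix.
# Illustrative only; the suffix list and length guard are assumptions.
SUFFIXES = ("ing", "ed", "es", "s")

def simple_stem(token: str) -> str:
    """Return a crude base form by removing one common suffix."""
    for suffix in SUFFIXES:
        # Keep at least 3 characters to avoid over-stemming short words.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

print(simple_stem("singing"))  # sing
print(simple_stem("jumped"))   # jump
print(simple_stem("dresses"))  # dress
```

A single-pass rule table like this already shows the under-/over-stemming trade-off: a shorter minimum-stem length strips more aggressively and over-stems, a longer one under-stems.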
Analysis of a Machine Learning Algorithm to Predict Wine Quality
Published in Roshani Raut, Salah-ddine Krit, Prasenjit Chatterjee, Machine Vision for Industry 4.0, 2022
The performance of a classification model on a given set of test data is summarized using a confusion matrix; it can only be computed when the true values for the test data are known. In information retrieval and in machine-learning classification, precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances, while recall (also known as sensitivity) is the fraction of all relevant instances that were retrieved; both precision and recall are therefore based on relevance. In statistical hypothesis testing, a type-I error is the rejection of a true null hypothesis, also known as a “false-positive” (FP) finding or conclusion (for example, an innocent person is convicted), while a type-II error is the non-rejection of a false null hypothesis, also known as a “false-negative” (FN) finding or conclusion (for example, a guilty person is not convicted). The different terms used are described next:
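These relationships can be illustrated with a small computation on confusion-matrix counts (the numbers below are illustrative, not from the chapter):

```python
# Deriving the standard terms from a 2x2 confusion matrix.
# Illustrative counts: FP counts type-I errors, FN counts type-II errors.
TP, FP, FN, TN = 40, 10, 5, 45

precision = TP / (TP + FP)                   # positive predictive value
recall    = TP / (TP + FN)                   # sensitivity
accuracy  = (TP + TN) / (TP + FP + FN + TN)  # overall correctness

print(f"precision = {precision:.2f}")  # 0.80
print(f"recall    = {recall:.2f}")     # 0.89
print(f"accuracy  = {accuracy:.2f}")   # 0.85
```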
Netflow Feature Evaluation for the Detection of Slow Read HTTP Attacks
Published in Stuart H Rubin, Lydia Bouzar-Benlabiod, Reuse in Intelligent Systems, 2020
Cliff Kemp, Chad Calvert, Taghi M Khoshgoftaar
Precision-Recall is a useful measure of success of prediction when the classes are very imbalanced. In information retrieval, precision is a measure of result relevancy, while recall is a measure of how many truly relevant results are returned. The F-measure (F-score), which is a measure of a test’s accuracy, is defined as the weighted harmonic mean of the precision and recall of the test and conveys the balance between the precision and the recall. An F-score reaches its best value at 1 (perfect precision and recall) and worst at 0. High scores show that the classifier is returning accurate results (high precision), as well as returning a majority of all positive results (high recall). A system with high recall but low precision returns many results, but most of its predicted labels are incorrect when compared to the training labels. A system with high precision but low recall is just the opposite, returning very few results, but most of its predicted labels are correct when compared to the training labels. An ideal system with high precision and high recall will return many results, with all results labeled correctly.
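The weighted harmonic mean described above can be sketched as the general F-beta score, where beta = 1 recovers the usual F-measure (the function below is an illustrative implementation, not code from the chapter):

```python
# F-measure as the weighted harmonic mean of precision and recall.
# beta > 1 weights recall more heavily; beta = 1 gives the F1 score.
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    if precision == 0 and recall == 0:
        return 0.0  # worst case: no correct positives at all
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_beta(1.0, 1.0))  # 1.0 -- perfect precision and recall
print(f_beta(0.9, 0.1))  # 0.18 -- low recall drags the score down
```

Because the harmonic mean is dominated by the smaller operand, a classifier cannot reach a high F-score by trading one measure entirely for the other, which is exactly the balance the excerpt describes.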
An Enhanced GWO Algorithm with Improved Explorative Search Capability for Global Optimization and Data Clustering
Published in Applied Artificial Intelligence, 2023
Gyanaranjan Shial, Sabita Sahoo, Sibarama Panigrahi
In the context of information retrieval, recall is the ratio of relevant retrieved items to the total number of relevant items in the corpus, whereas precision is the ratio of relevant retrieved documents to the total number of retrieved documents. The harmonic mean of precision and recall is termed the F-score; nowadays the F1-score is used in machine learning in both binary and multiclass scenarios. The F-score has a major drawback relative to MCC: it gives an inconsistent score under class swapping (if the positive class is renamed negative, or vice versa), although when micro/macro-averaged F1 is used it is, like MCC, invariant to class swapping. A second problem is that the F-score is independent of the correctly classified negative class (true negatives). Despite these flaws, the F-score remains the most widespread performance metric among researchers. According to Cao et al., the F-score and MCC estimate more realistic performance for classification models (Cao, Chicco, and Hoffman 2020). Mathematically, the F-score performance measure is calculated as given in eq. (32).
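The class-swapping behaviour mentioned above can be checked numerically: with illustrative confusion-matrix counts (assumed values, not from the paper), relabelling the positive class as negative swaps TP with TN and FP with FN, which changes the F1 score but leaves the MCC unchanged.

```python
import math

def f1(tp: int, fp: int, fn: int) -> float:
    # F1 ignores TN entirely -- the "second problem" noted above.
    return 2 * tp / (2 * tp + fp + fn)

def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den

TP, FP, FN, TN = 90, 5, 4, 1  # a highly imbalanced, illustrative outcome

print(round(f1(TP, FP, FN), 3))       # F1 with the original labelling
print(round(f1(TN, FN, FP), 3))       # F1 after swapping classes: very different
print(round(mcc(TP, FP, FN, TN), 3))  # MCC, original labelling
print(round(mcc(TN, FN, FP, TP), 3))  # MCC after the swap: identical
```

The swap leaves MCC unchanged because both its numerator (TP·TN − FP·FN) and the product under the square root are symmetric in (TP, TN) and (FP, FN), while F1 is built only from TP, FP, and FN.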
The Deep Learning ResNet101 and Ensemble XGBoost Algorithm with Hyperparameters Optimization Accurately Predict the Lung Cancer
Published in Applied Artificial Intelligence, 2023
Saghir Ahmed, Basit Raza, Lal Hussain, Amjad Aldweesh, Abdulfattah Omar, Mohammad Shahbaz Khan, Elsayed Tag Eldin, Muhammad Amin Nadim
Performance was evaluated using standard evaluation measures, with the training and testing data formed using both the hold-out split method and 10-fold cross-validation (CV) (Divya Rathore and Agarwal 2014; Hussain et al. 2019; Rathore et al. 2013, 2014; Rathore, Hussain, and Khan 2015). ML and deep-learning techniques are evaluated with standard performance metrics such as accuracy, precision, recall, and F1-score to measure their effectiveness and efficiency on a given task. The F1-measure is the harmonic mean of the precision and recall metrics. Precision has been widely used to evaluate the performance of information-retrieval techniques, where it refers to the fraction of retrieved documents that are relevant. The following standard performance evaluation metrics are utilized (Jalil et al. 2022):
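As a rough sketch of the fold mechanics behind 10-fold CV (in practice a library routine such as scikit-learn's KFold is typically used; this plain-Python version and its sample size are illustrative assumptions):

```python
# Generate (train, test) index pairs for k-fold cross-validation.
# Each sample appears in the test partition of exactly one fold.
def k_fold_indices(n_samples: int, k: int = 10):
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for i in range(k):
        start = i * fold_size
        # The last fold absorbs any remainder when n_samples % k != 0.
        stop = n_samples if i == k - 1 else start + fold_size
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test

folds = list(k_fold_indices(100, k=10))
print(len(folds))        # 10 folds
print(len(folds[0][1]))  # 10 test samples per fold
print(len(folds[0][0]))  # 90 training samples per fold
```

The model is then fit k times, once per fold, and the metrics above are averaged across the k test partitions.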
HQEBSKG: Hybrid Query Expansion Based on Semantic Knowledgebase and Grouping
Published in IETE Journal of Research, 2022
Mohammad Reza Keyvanpour, Zahra Karimi Zandian, Zahra Abdolhosseini
An ontology is a type of dictionary or knowledge-representation source, such as WordNet, consisting of concepts and their semantic relations; indeed, an ontology is an explicit specification of a conceptualization. Ontologies are usable in both computer science and information science [29]. What matters here is the application of ontologies in semantic information retrieval, especially query expansion. When an ontology is used as a knowledge resource to improve the retrieval of relevant information in a particular domain, forming conceptual information based on the initial query, the process is described as query expansion using a domain-specific ontology [30]. Producing and using ontologies in a given domain requires initial knowledge and time [31]. In this paper, two ontologies, WordNet and FarsNet, are used.
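As a minimal sketch of ontology-based query expansion, a hand-made synonym table can stand in for an ontology such as WordNet or FarsNet (the table, query, and function name below are illustrative assumptions, not the paper's method):

```python
# Toy synonym table playing the role of an ontology's semantic relations.
SYNONYMS = {
    "car":  ["automobile", "vehicle"],
    "fast": ["quick", "rapid"],
}

def expand_query(query: str) -> list[str]:
    """Expand each query term with its related ontology terms."""
    expanded = []
    for term in query.lower().split():
        expanded.append(term)
        expanded.extend(SYNONYMS.get(term, []))  # no relations -> term kept as-is
    return expanded

print(expand_query("fast car"))
# ['fast', 'quick', 'rapid', 'car', 'automobile', 'vehicle']
```

The expanded term list is then submitted to the retrieval engine in place of the original query, so documents mentioning "automobile" can match a query about "car".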