Document classification – Knowledge and References

Deployment of Supervised Machine Learning and Deep Learning Algorithms in Biomedical Text Classification

Published in Saravanan Krishnan, Ramesh Kesavan, B. Surendiran, G. S. Mahalakshmi, Handbook of Artificial Intelligence in Biomedical Engineering, 2021

G. Kumaravelan, Bichitrananda Behera

Depending on how the ML algorithm is used, automatic document classification is often divided into three broad classes: supervised, unsupervised, and semisupervised document classification. In supervised document classification, an external mechanism, typically manual labeling, supplies the classifier model with information about the correct classification of documents. In unsupervised document classification, no external mechanism provides the classification model with information about the correct classification. In semisupervised document classification, only a portion of the documents is labeled by an external mechanism. This chapter focuses on the deployment of state-of-the-art supervised ML algorithms for biomedical text classification.

Machine Learning

Published in Sudhir Kumar Sharma, Bharat Bhushan, Narayan C. Debnath, IoT Security Paradigms and Applications, 2020

Vidushi, Manisha Agarwal

Naive Bayes is a powerful probabilistic classification technique in machine learning. It is based on Bayes' theorem and assumes that the predictors do not depend on each other [58]. In other words, the features in a dataset are treated as independent of each other, which is why the method is called naive; the word "Bayes" comes from Bayes' theorem. The main applications of this algorithm are document classification, sentiment analysis, and spam filtering. Bayes' theorem is as follows [58]:

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$
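As a minimal sketch of how this rule can be applied to document classification, the following Python example trains a small multinomial naive Bayes classifier on word counts. Here c is a class label and x is the bag of words in a document. The toy documents, the function names `train_naive_bayes` and `predict`, and the add-one (Laplace) smoothing parameter `alpha` are assumptions made for illustration; they do not come from the cited chapter.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, alpha=1.0):
    """Estimate log P(c) and log P(word | c) with add-alpha smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)

    # Prior P(c): fraction of training documents in each class.
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}

    # Likelihood P(word | c): smoothed relative frequency of the word in class c.
    log_likelihood = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_likelihood


def predict(text, log_prior, log_likelihood):
    """Return the class maximizing log P(c) + sum of log P(word | c).

    P(x) is the same for every class, so it is dropped from the comparison.
    The sum over words is where the naive independence assumption enters.
    """
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for w in text.lower().split():
            if w in log_likelihood[c]:  # words outside the vocabulary are ignored
                score += log_likelihood[c][w]
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy corpus for illustration only.
docs = [
    "win cash prize now",
    "cheap prize offer",
    "meeting agenda attached",
    "lunch meeting tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]
log_prior, log_likelihood = train_naive_bayes(docs, labels)
print(predict("cash prize offer", log_prior, log_likelihood))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which is a common design choice for text data.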

Introduction

Published in Sugato Basu, Ian Davidson, Kiri L. Wagstaff, Constrained Clustering, 2008

Sugato Basu, Ian Davidson, Kiri L. Wagstaff

Abstract: In this paper, we discuss the merits of using supervised clustering for coherent categorization modeling. Traditional approaches for document classification on a predefined set of classes are often unable to provide sufficient accuracy because of the difficulty of fitting a manually categorized collection of records in a given classification model. This is especially the case for domains such as text in which heterogeneous collections of Web documents have varying styles, vocabulary, and authorship. Hence, this paper investigates the use of clustering in order to create the set of categories and its use for classification. We will examine this problem from the perspective of text data. Completely unsupervised clustering has the disadvantage that it has difficulty in isolating sufficiently fine-grained classes of documents relating to a coherent subject matter. In this chapter, we use the information from a pre-existing taxonomy in order to supervise the creation of a set of related clusters, though with some freedom in defining and creating the classes. We show that the advantage of using supervised clustering is that it is possible to have some control over the range of subjects that one would like the categorization system to address, but with a precise mathematical definition of how each category is defined. An extremely effective way then to categorize documents is to use this a priori knowledge of the definition of each category. We also discuss a new technique to help the classifier distinguish better among closely related clusters.

Exploring deep learning approaches for Urdu text classification in product manufacturing

Published in Enterprise Information Systems, 2022

Muhammad Pervez Akhter, Zheng Jiangbin, Irfan Raza Naqvi, Mohammed Abdelmajeed, Muhammad Fayyaz

The tremendous growth of Urdu text documents on the internet is creating challenges for researchers seeking an automatic, reliable and fast way to organise these documents. Text document classification is the task of automatically assigning a label from a set of pre-defined labels to a document based on its contents. Text document classification has several applications in text mining and information retrieval, such as spam detection (Akhtar, Tahir, and Shakeel 2017; Jain, Sharma, and Agarwal 2018), tweet analysis (Ali et al. 2018), sentiment analysis (Mehmood, Essam, and Shafi 2019), and document organisation (Tripathy, Anand, and Rath 2017; Rao et al. 2018). Urdu is the national language of Pakistan and has more than 300 million speakers all over the world (Riaz 2012), but it is a resource-poor language. Its rich and complex morphology, lack of capitalisation, use of diacritics, free word order, and context-sensitive script are some of the main characteristics that make Urdu more challenging for automatic text processing.

Learning word hierarchical representations with neural networks for document modeling

Published in Journal of Experimental & Theoretical Artificial Intelligence, 2020

Longhui Wang, Yong Wang, Yudong Xie

The accuracy, macro-average precision, recall, and F-measure were used as performance evaluation indicators for the document classification task. Let TP (true positive) denote the number of positive samples that were predicted to be positive samples, FN (false negative) denote the number of positive samples that were predicted to be negative samples, FP (false positive) denote the number of negative samples that were predicted to be positive samples, and TN (true negative) denote the number of negative samples that were predicted to be negative samples. Then,
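The formulas that follow "Then," are not reproduced in this excerpt. As a minimal sketch, assuming the standard textbook definitions of these indicators (which may differ in detail from the article's own equations), the four counts can be turned into metrics as shown below. The function names `binary_metrics` and `macro_average` and the example counts are hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute accuracy, precision, recall and F-measure from confusion counts.

    Standard definitions are assumed; a ratio with a zero denominator is
    reported as 0.0.
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


def macro_average(per_class_counts):
    """Unweighted mean of per-class precision, recall and F-measure.

    Each entry is a (tp, fn, fp, tn) tuple obtained by treating one class
    as positive and all other classes as negative (one-vs-rest).
    """
    results = [binary_metrics(*counts) for counts in per_class_counts]
    n = len(results)
    return {
        key: sum(r[key] for r in results) / n
        for key in ("precision", "recall", "f_measure")
    }


# Hypothetical counts for two classes, for illustration only.
per_class = [(8, 2, 2, 8), (5, 5, 0, 10)]
print(binary_metrics(*per_class[0]))
print(macro_average(per_class))
```

Macro-averaging here takes the mean of the per-class F-measures; some authors instead compute the F-measure from the macro-averaged precision and recall, and the two conventions can give different values.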

A Novel Semantic-Enhanced Text Graph Representation Learning Approach through Transformer Paradigm

Published in Cybernetics and Systems, 2023

Tham Vo

To evaluate the accuracy of the text classification task across different techniques, we mainly use the standard F1 evaluation metric. Specifically, the F1 metric takes into account the precision and recall of each technique's classification results to calculate the F1 score. In our case, the F1 metric is used to analyze the accuracy of the document classification task and is formally defined as follows (as shown in Equations (7) and (8)):
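Equations (7) and (8) are not reproduced in this excerpt. For orientation, the standard definition of the F1 score is given below; the symbols P (precision), R (recall), TP (true positives), FP (false positives) and FN (false negatives) are assumed notation and may not match the article's own.

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$

$$F_1 = \frac{2 \cdot P \cdot R}{P + R}$$

F1 is the harmonic mean of precision and recall, so it is high only when both are high.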