Explore chapters and articles related to this topic
Analysis of a Machine Learning Algorithm to Predict Wine Quality
Published in Roshani Raut, Salah-ddine Krit, Prasenjit Chatterjee, Machine Vision for Industry 4.0, 2022
The performance of a classification model on a given set of test data is summarized using a confusion matrix, which can be constructed only if the true values for the test data are known. In information retrieval and machine-learning classification, precision, also called positive predictive value, is the fraction of relevant instances among the retrieved instances, while recall, also known as sensitivity, is the fraction of relevant instances that were retrieved. Both precision and recall are therefore based on relevance. In statistical hypothesis testing, a type-I error is the rejection of a true null hypothesis, also known as a “false-positive” (FP) finding or conclusion (for example, an innocent person is convicted), while a type-II error is the non-rejection of a false null hypothesis, also known as a “false-negative” (FN) finding or conclusion (for example, a guilty person is not convicted). The different terms used are described next:
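As a minimal sketch of these terms, the confusion-matrix counts and the two ratios can be computed from paired true and predicted labels. The function names and the toy labels below are illustrative assumptions, not taken from the chapter.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1          # predicted positive, actually positive
        elif pred == positive:
            fp += 1          # type-I error: predicted positive, actually negative
        elif truth == positive:
            fn += 1          # type-II error: predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return tp, fp, fn, tn


def precision(tp, fp):
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of true positives that were predicted positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy labels, invented for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)                      # 3 1 1 3
print(precision(tp, fp), recall(tp, fn))   # 0.75 0.75
```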
Internet of Things
Published in Neeraj Kumar, Aaisha Makkar, Machine Learning in Cognitive IoT, 2020
In recent years, there has been an exponential increase in the usage of the Internet, which is widely used for information retrieval. This information is gathered, stored, and processed at a central repository known as a web server. The server represents this information in the form of web documents, which are accessible through the Internet. According to a report by Statista, a web information provider, the number of Internet users in 2018 was 369.01 million. The Internet is mostly accessed through dedicated software known as a web search engine. According to NETMARKETSHARE, a market share statistics provider, Google holds the largest search engine market share at 72.03%, followed by Baidu (14.11%), Bing (7.76%), and Yahoo (4.27%) (figures accessed in September 2018). The success of web search engines rests on their search engine result pages (SERPs). These pages are ranked by a ranking methodology that considers the important features of a web page. PageRank is the ranking algorithm used by Google to rank web pages for SERPs.
Netflow Feature Evaluation for the Detection of Slow Read HTTP Attacks
Published in Stuart H Rubin, Lydia Bouzar-Benlabiod, Reuse in Intelligent Systems, 2020
Cliff Kemp, Chad Calvert, Taghi M Khoshgoftaar
Precision-Recall is a useful measure of success of prediction when the classes are very imbalanced. In information retrieval, precision is a measure of result relevancy, while recall is a measure of how many truly relevant results are returned. The F-measure (F-score), which is a measure of a test’s accuracy, is defined as the weighted harmonic mean of the precision and recall of the test and conveys the balance between the precision and the recall. An F-score reaches its best value at 1 (perfect precision and recall) and worst at 0. High scores show that the classifier is returning accurate results (high precision), as well as returning a majority of all positive results (high recall). A system with high recall but low precision returns many results, but most of its predicted labels are incorrect when compared to the training labels. A system with high precision but low recall is just the opposite, returning very few results, but most of its predicted labels are correct when compared to the training labels. An ideal system with high precision and high recall will return many results, with all results labeled correctly.
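The weighted harmonic mean described above can be sketched as follows. The `beta` parameter and the function name are assumptions introduced here: `beta = 1` gives the balanced F1 score, and larger values weight recall more heavily.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F-beta score)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero; the worst possible score
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Perfect precision and recall give the best value, 1.
print(f_beta(1.0, 1.0))

# High recall but low precision is penalized by the harmonic mean:
# the score stays close to the weaker of the two values.
print(f_beta(0.1, 0.9))
```

Because the harmonic mean is dominated by the smaller value, a system cannot reach a high F-score by maximizing only one of precision or recall.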
Story Analysis Using Natural Language Processing and Interactive Dashboards
Published in Journal of Computer Information Systems, 2022
A key element of natural language understanding, as applied to the story analysis task described in this paper, is information extraction (IE), which can be defined as “automatic extraction of structured information such as entities, relationships between entities, and attributes describing entities from unstructured sources such as text corpus or text documents” [3]. IE involves several tasks relevant to story analysis, including named entity recognition (NER), relation extraction, event extraction, temporal expression, and template filling [4]. IE is related to information retrieval (IR), but differs in an important respect [5]. Information retrieval is used to identify relevant documents, a task often associated with search engines like Google. By contrast, information extraction uses NLP techniques to take the retrieved unstructured text data from documents and impose structure and “meaning” onto it. Although IR is an important means of filtering out irrelevant text from a myriad of documents, it is outside the scope of this paper. IE is more pertinent for the task of helping users to quickly understand the underlying meaning of the text once retrieved.
Company Ranking Prediction Based on Network Big Data
Published in IETE Journal of Research, 2021
When a user queries for information, the task of the information retrieval [3] system is to return a list of documents sorted by predicted relevance to the query (from highest to lowest). In recent years, researchers have used supervised machine learning techniques to solve this problem. In this approach, the training examples are query-document pairs and the corresponding labels are relevance grades; this is called Learning-to-Rank (LtR) [4]. As a cross-field of machine learning and information retrieval, LtR combines training data with a machine learning algorithm to automatically build a ranking model, which can sort new input objects according to their relevance, preference, or importance. Prior work [5] has studied a variety of supervised learning methods based on LtR and verified their superiority over traditional methods through experiments. The main purpose of LtR research is to obtain the information users need from massive data on the Internet accurately, promptly, and efficiently.
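The idea of learning a ranking model from graded query-document pairs can be illustrated with a very small pairwise sketch. It uses a perceptron-style update chosen here only for illustration; it is not the method of the cited works, and the two features and all numbers are invented.

```python
def dot(w, x):
    """Linear relevance score of a feature vector x under weights w."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(queries, epochs=10, lr=1.0):
    """Learn weights so that higher-graded documents score above lower-graded ones.

    `queries` is a list of queries; each query is a list of
    (feature_vector, relevance_grade) tuples for its candidate documents.
    """
    dim = len(queries[0][0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for docs in queries:
            for xi, gi in docs:
                for xj, gj in docs:
                    if gi > gj:  # document i should rank above document j
                        diff = [a - b for a, b in zip(xi, xj)]
                        if dot(w, diff) <= 0:  # pair is misordered (or tied)
                            w = [wk + lr * dk for wk, dk in zip(w, diff)]
    return w


def rank(w, docs):
    """Return document names sorted by predicted relevance, highest first."""
    return sorted(docs, key=lambda name: dot(w, docs[name]), reverse=True)


# One training query with three documents; the features are hypothetical
# (for example, a term-match score and a link-based score).
training = [[([1.0, 0.2], 2), ([0.5, 0.9], 1), ([0.1, 0.4], 0)]]
w = train_pairwise(training)

new_docs = {"a": [0.2, 0.1], "b": [0.9, 0.3], "c": [0.6, 0.8]}
print(rank(w, new_docs))  # ['b', 'c', 'a']
```

Practical LtR systems typically use richer features and stronger learners, but the structure is the same: fit a scoring function on labeled pairs, then sort new documents by score.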
A factorisation-based recommendation model for customised products configuration design
Published in International Journal of Production Research, 2023
Huifang Zhou, Shuyou Zhang, Lemiao Qiu, Zili Wang, Kerui Hu
In this paper, we develop a recommendation model for customised product configuration design. Our recommendation model takes personalised customer requirements and product component information as input and outputs a ranked list of component instances. It consists of two sub-models: a retrieval sub-model and a ranking sub-model. The retrieval sub-model selects an initial set of component instance candidates from all possible candidates, and the ranking sub-model then ranks these selected candidates and picks out the best possible candidate. To boost the performance of the ranking sub-model, we propose a novel interacting network, DualAdap, which extracts meaningful low-order and high-order cross features through a multi-head self-attention mechanism and learns adaptive-order cross features through a logarithmic transformation layer. Based on the learned cross features, the ranking sub-model computes the adoption scores of all candidates given by the retrieval sub-model and then ranks them according to their adoption scores to select the best possible candidate. The configuration design of an elevator traction machine is taken as a case study. Results show that the top-10 categorical accuracy of the retrieval sub-model is 100%, and that our proposed ranking sub-model achieves the best performance over all baseline methods and is sufficiently efficient, as evaluated by AUC, Logloss, and training time. Compared with actual configuration design results, our recommendation model gives accurate elevator traction machine instances for 98.56% of 208 test samples. The effectiveness and efficiency of our recommendation model are thus verified.
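The retrieve-then-rank structure described above can be sketched in a few lines. This is only an illustration of the two-stage pipeline: the attribute-overlap retrieval and the precomputed `adoption_score` values are hypothetical stand-ins, not the paper's retrieval sub-model or its DualAdap network, and the component data are invented.

```python
def retrieve(candidates, requirements, k):
    """Cheap first stage: keep the k candidates matching the most requirements."""
    def overlap(candidate):
        return sum(
            1 for key, value in requirements.items()
            if candidate["attrs"].get(key) == value
        )
    return sorted(candidates, key=overlap, reverse=True)[:k]


def rank(shortlist, score_fn):
    """Second stage: order the shortlist by a (more expensive) scoring function."""
    return sorted(shortlist, key=score_fn, reverse=True)


# Hypothetical component instances with invented attributes and scores.
candidates = [
    {"name": "A", "attrs": {"load": "1000kg", "speed": "1.75m/s"}, "adoption_score": 0.4},
    {"name": "B", "attrs": {"load": "1000kg", "speed": "1.0m/s"}, "adoption_score": 0.6},
    {"name": "C", "attrs": {"load": "630kg", "speed": "1.0m/s"}, "adoption_score": 0.95},
    {"name": "D", "attrs": {"load": "1000kg", "speed": "1.75m/s"}, "adoption_score": 0.9},
]
requirements = {"load": "1000kg", "speed": "1.75m/s"}

shortlist = retrieve(candidates, requirements, k=3)
# Stand-in for a learned ranking model: read a precomputed score.
ranked = rank(shortlist, score_fn=lambda c: c["adoption_score"])

print([c["name"] for c in ranked])  # ['D', 'B', 'A']
print(ranked[0]["name"])            # D
```

Note that candidate C has the highest score but is never ranked, because the first stage filtered it out. This is why the recall of the retrieval stage (the top-10 accuracy reported above) matters for the pipeline as a whole.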