Word Embeddings
Published in Jan Žižka, František Dařena, Arnošt Svoboda, Text Mining with Machine Learning, 2019
Jan Žižka, František Dařena, Arnošt Svoboda
A disadvantage of using raw frequencies in a term-term matrix is that some words with high frequencies (often stop words) tend to co-occur with many other words but are not very discriminative. Instead, it is better to somehow normalize the frequencies so that real relations between words become obvious. One of the more commonly used measures in this context is pointwise mutual information (PMI); see Section 14.3.2. It measures whether two words occur together more often than if they were independent. The values of PMI range from −∞ to +∞. Negative values are replaced by zeros because it can be problematic to interpret what it means that words co-occur less often than by chance. The measure is biased towards low-frequency words, so raising the context probabilities to a power (e.g., 0.75) or adding one to all frequencies (Laplace smoothing) is often used [138].
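The smoothed, zero-clipped PMI weighting described above can be sketched as follows; the function name and the use of NumPy are illustrative choices, and the variant shown raises only the context probabilities to the power α, one common convention:

```python
import numpy as np

def ppmi(counts, alpha=0.75):
    """Positive PMI from a term-term co-occurrence count matrix.

    Raising the context probabilities to `alpha` (e.g., 0.75) dampens
    PMI's bias towards low-frequency words; negative and undefined PMI
    values are replaced by zero.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    p_wc = counts / total                    # joint probabilities P(w, c)
    p_w = counts.sum(axis=1) / total         # marginal P(w)
    p_c_alpha = counts.sum(axis=0) ** alpha
    p_c_alpha = p_c_alpha / p_c_alpha.sum()  # smoothed context marginal
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2(p_wc / np.outer(p_w, p_c_alpha))
    pmi[~np.isfinite(pmi)] = 0.0             # log 0 = -inf  ->  0
    return np.maximum(pmi, 0.0)
```

Pairs that never co-occur get PMI of −∞ and are clipped to zero along with the negative values, matching the convention in the text.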
Feature Engineering for Text Data
Published in Guozhu Dong, Huan Liu, Feature Engineering for Machine Learning and Data Analytics, 2018
Chase Geigle, Qiaozhu Mei, ChengXiang Zhai
the log-ratio of the joint probability of seeing word w in the context of word c to the probability of seeing w and c individually, assuming they were independent. This value will be large when the two words are highly correlated. One issue with this weighting is that there will be many entries where PMI(w, c) = log 0 = −∞. Two common workarounds exist: either set PMI(w, c) = 0 for all unobserved (w, c) pairs, or instead drop all entries in the matrix where PMI(w, c) < 0, keeping only the positive pointwise mutual information (PPMI) values [3]. The result is a large sparse matrix where each word is a vector whose entries correspond to the PPMI measure between that word and each of the other words serving as its context. This can be useful directly, but just as in LSA it is common to perform SVD on this matrix to instead produce a much lower-dimensional, dense representation for each word. Under this decomposition, one can obtain two different vectors for each word: one that describes the word directly when it is the “target” word in a window (obtained from the left singular vectors), and another that describes the word when it is observed as a “context” word in a window (obtained from the right singular vectors).
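The SVD step described here can be sketched as follows; the random nonnegative matrix merely stands in for a real PPMI matrix, and scaling both factor matrices by the square root of the singular values is one common symmetric convention, not the only one:

```python
import numpy as np

# Stand-in for a PPMI matrix: rows index target words, columns index
# context words (here a small random nonnegative matrix for illustration).
rng = np.random.default_rng(0)
M = np.maximum(rng.normal(size=(200, 200)), 0.0)

k = 50  # target embedding dimensionality
U, S, Vt = np.linalg.svd(M, full_matrices=False)

# Left singular vectors yield "target" word vectors; right singular
# vectors yield "context" word vectors.
W_target = U[:, :k] * np.sqrt(S[:k])
W_context = Vt[:k, :].T * np.sqrt(S[:k])
```

The product of the two factors reconstructs the best rank-k approximation of the original matrix, which is exactly the sense in which the dense vectors preserve the PPMI information.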
Anti-Depression Psychotherapist Chatbot for Exam and Study-Related Stress
Published in Nilanjan Dey, Sanjeev Wagh, Parikshit N. Mahalle, Mohd. Shafi Pathan, Applied Machine Learning for Smart Data Analysis, 2019
Mohd. Shafi Pathan, Rushikesh Jain, Rohan Aswani, Kshitij Kulkarni, Sanchit Gupta
To overcome the limitations faced by rule-based methods, researchers devised statistical machine learning techniques, which can be subdivided into supervised and unsupervised approaches. Supervised machine learning with affect lexicons: one of the earliest supervised machine learning methods was used by Alm, who employed a hierarchical sequential model together with the SentiWordNet list for fine-grained emotion classification. Sentences from blogs have been classified using Support Vector Machines (SVM). Although supervised learning performs well, it has the distinct drawback that large annotated data sets are required for training the classifiers, and classifiers trained on one domain generally do not perform as well on another. Supervised machine learning without affect lexicons: a comparison of three machine learning algorithms on a movie review data set concluded that SVM performs the best. The same problem was also attempted using the delta tf-idf weighting. Unsupervised machine learning with affect lexicons: an evaluation of two unsupervised methods using WordNet-Affect employed a vector space model and several dimensionality reduction techniques. News headlines have been classified using simple heuristics as well as more refined algorithms (e.g., similarity in a latent semantic space). Unsupervised machine learning without affect lexicons: some inspiring work here includes the “LSA single word” approach, which measures similarity between the text and each emotion, and the “LSA emotion synset” approach, which uses WordNet synsets. Our approach shares a similar intuition with the “LSA emotion synset” method, but with some notable differences, as we use Pointwise Mutual Information (PMI) to compute the semantic relatedness, which is further improved by context dependency rules.
Although earlier work also used PMI to gather statistics from three web search engines, it compared an entire phrase with only one emotion word because of long web-based processing times, whereas in our approach each significant word is matched against a set of representative words for every emotion, taking its context into account.
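The idea of scoring a word against a set of representative words per emotion via PMI can be sketched as below; the data structures (`cooc`, `counts`, `total`) and the use of a simple average are our assumptions, not details given in the text:

```python
import math

def pmi(c_xy, c_x, c_y, total):
    """PMI(x, y) = log2(P(x, y) / (P(x) P(y))); 0 for pairs never seen together."""
    if c_xy == 0:
        return 0.0
    return math.log2((c_xy * total) / (c_x * c_y))

def emotion_relatedness(word, rep_words, cooc, counts, total):
    """Average PMI between `word` and representative words for one emotion.

    `cooc` maps unordered word pairs to co-occurrence counts; `counts`
    holds unigram counts and `total` the corpus size (assumed inputs).
    """
    scores = [pmi(cooc.get(frozenset((word, r)), 0),
                  counts[word], counts[r], total)
              for r in rep_words]
    return sum(scores) / len(scores)
```

A word would then be assigned to the emotion whose representative set yields the highest relatedness score.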
Factors influencing crowdsourcing riders’ satisfaction based on online comments on real-time logistics platform
Published in Transportation Letters, 2023
Yi Zhang, Xiaomin Shi, Zalia Abdul-Hamid, Dan Li, Xinle Zhang, Zhiyuan Shen
Natural language processing (NLP) tasks for online comments include information extraction, topic modeling, and sentiment analysis (Chang, Ku, and Chen 2020). The lexicon-based approach and the machine learning approach are the two common ways of performing sentiment analysis (Hakak et al. 2017). The lexicon-based approach uses keywords and a series of rules over documents or sentences to perform sentiment analysis (Rout et al. 2018), for example, using the pointwise mutual information (PMI) between words (Turney 2002). However, because this method relies heavily on the construction of a sentiment dictionary and the design of corresponding rules, it cannot deal with all sentiment-related content. For sentiment analysis based on machine learning, the major approaches include Naive Bayes, Support Vector Machine, K-Nearest Neighbor, etc. (Li and Wu 2010). The machine learning approach mainly uses the language features of the text to learn from the data, and the extracted feature vectors are input into the machine learning model for prediction (Hakak et al. 2017). But because it is affected by the range and type of comment text, it cannot be flexibly extended in real applications. In order to improve the results of sentiment analysis, some scholars have proposed combining an unsupervised sentiment dictionary with supervised machine learning. This is a semi-supervised method, which has been shown to be effective (Lee, Kim, and Song 2021). For example, some scholars use a combination of nonlinear feature dictionaries and machine learning classification algorithms to achieve a recognition rate of up to 90% (Zangeneh Soroush et al. 2018). In addition, using lexicon-based and support vector machine-based classifiers to perform hybrid sentiment analysis can also achieve an accuracy of more than 80% (Aldayel and Azmi 2016).
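Turney's (2002) PMI-based sentiment orientation cited above can be sketched as follows; since PMI with a shared phrase term cancels, the score reduces to a single log-ratio of hit counts, which in the original work came from search engine queries (the counts and smoothing constant here are assumptions for illustration):

```python
import math

def so_pmi(hits_near_excellent, hits_near_poor, hits_excellent, hits_poor):
    """Semantic orientation in the style of Turney (2002):
    SO = PMI(phrase, "excellent") - PMI(phrase, "poor"),
    which simplifies to the log-ratio below once the shared
    phrase counts cancel. Positive -> positive sentiment.
    """
    eps = 0.01  # smoothing to avoid log of zero / division by zero
    return math.log2(((hits_near_excellent + eps) * hits_poor)
                     / ((hits_near_poor + eps) * hits_excellent))
```

A phrase that co-occurs mostly with “excellent” gets a positive score, one that co-occurs mostly with “poor” a negative score; averaging the scores of all extracted phrases classifies the whole review.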