Functional Architecture for Knowledge Semantics
Published in Denise Bedford, Knowledge Architectures, 2020
Semantic distance is one way of measuring the closeness in meaning of two concepts or things in a specific context. Specifically, semantic distance is a statistical measure of how close or distant two units of language are in terms of their meaning. The measure was developed in natural language processing to help resolve common semantic challenges such as machine translation, word sense disambiguation, speech recognition, spelling correction, and so on. A fundamental assumption is that, in a given context, words found in proximity have some meaningful relationship. This does not mean that they have ‘similar meanings,’ but that the relationship between them is meaningful in that particular context. Semantic distance is vital to knowledge seeking, discovery, and search because we want to be able to detect and understand the meaningful relationships among and across concepts. In the future, such semantic networks or semantic explanations may become a kind of ‘relational knowledge’ in their own right: sources of knowledge that help us understand the meaning of, or make sense of, a domain.
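In practice, semantic distance between two units of language is often operationalized as one minus the cosine similarity of their vector representations. A minimal sketch, using invented three-dimensional "embedding" values purely for illustration (real systems derive such vectors from corpus statistics):

```python
import math

def cosine_distance(u, v):
    """Semantic distance as 1 - cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Toy embeddings (illustrative values only, not learned from a corpus).
embeddings = {
    "doctor": [0.9, 0.1, 0.0],
    "nurse":  [0.8, 0.2, 0.1],
    "guitar": [0.1, 0.9, 0.3],
}

print(cosine_distance(embeddings["doctor"], embeddings["nurse"]))   # small distance
print(cosine_distance(embeddings["doctor"], embeddings["guitar"]))  # larger distance
```

Words that tend to appear in similar contexts end up with nearby vectors, so a small distance signals a meaningful relationship in that context, not necessarily synonymy.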
Artificial Intelligence for Document Image Analysis
Published in Sk Md Obaidullah, KC Santosh, Teresa Gonçalves, Nibaran Das, Kaushik Roy, Document Processing Using Machine Learning, 2019
Himadri Mukherjee, Payel Rakshit, Ankita Dhar, Sk Md Obaidullah, KC Santosh, Santanu Phadikar, Kaushik Roy
Word sense disambiguation [51,52] is the process of determining the intended meaning of a word in a sentence when that word has multiple meanings. For instance, the word “bank” may mean “river bank” or “a place where we deposit money”. Understanding the meanings of individual words is essential before the meaning of a text can be understood for applications such as summarization and evaluation. For instance, in the sentence “I will go to the bank for a loan”, “bank” refers to the financial institution and not to a river bank. The meaning of a word is highly context-dependent, and analysis of the neighboring text can help derive it.
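The idea of analyzing neighboring text can be sketched with a simplified Lesk-style overlap heuristic: pick the sense whose gloss shares the most words with the sentence. The two-sense inventory for "bank" and its glosses below are hand-written for illustration; real systems draw glosses from a lexical resource such as WordNet:

```python
def simplified_lesk(word, context, sense_inventory):
    """Pick the sense whose gloss shares the most words with the context."""
    context_words = set(context.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_inventory[word].items():
        overlap = len(context_words & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Hypothetical two-sense inventory for "bank" (glosses written by hand).
inventory = {
    "bank": {
        "financial": "institution where people deposit money and take a loan",
        "river": "sloping land beside a river or stream",
    }
}

print(simplified_lesk("bank", "I will go to the bank for a loan", inventory))
# -> financial
```

Here the context word "loan" overlaps with the financial gloss, so that sense wins; in a sentence about rivers, the overlap would favor the other sense instead.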
Healthcare NLP Infrastructure for the Greek Language
Published in Satya Ranjan Dash, Shantipriya Parida, Esaú Villatoro Tello, Biswaranjan Acharya, Ondřej Bojar, Natural Language Processing in Healthcare, 2022
Aristides Vagelatos, Elena Mantzari, Mavina Pantazara, Christos Tsalidis, Chryssoula Kalamara
Ambiguity is an inherent property of natural languages: it expresses the level of uncertainty about the meaning of a word, phrase, or sentence, and it is present in all phases of content analysis. Word ambiguity can be lexical, syntactic, or semantic. Lexical ambiguity is due to polysemy, i.e., the fact that a word may have more than one meaning in a natural language. The process of resolving the correct meaning is called “word sense disambiguation” and appears in advanced NLP pipelines. Part-of-speech (POS) disambiguation is considered a subcase; there the process is referred to as tagging, and the program that accomplishes it is called a tagger. Typically, a Greek tagger uses a mixed scheme with three phases (Orphanos and Christodoulakis 1999). In the first phase, the word is looked up in the morphological dictionary; if it is found and has a single entry, all of its morphological attributes are returned. If no single entry is found, a set of rules operates on the context of the word, trying to distinguish the correct one. If the word is unknown, a set of rules examines its suffix and other characteristics, trying to guess its POS. To support this processing, a morphological dictionary was developed (Tsalidis et al. 2004) with ~100,000 lemmas and 1,200,000 word forms, containing rich morphological and semantic information for lemmas and inflectional or derivational types. The ambiguity of the words in this dictionary has been categorized as follows: (a) words present in different lemmas, ~45,000; (b) words with different POS, ~32,000; (c) words that differ in at least one morphological attribute (e.g., number, case, gender), ~180,000; (d) words that differ in hyphenation, ~650; and (e) words with different morphemic structure, ~5,000.
POS disambiguation and guessing are carried out with the help of decision trees that examine the local context, achieving an accuracy of ninety-seven percent (97%) in POS disambiguation and eighty-nine percent (89%) in POS guessing (Orphanos and Christodoulakis 1999).
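The three-phase scheme described above (dictionary lookup, then contextual rules, then suffix-based guessing) can be sketched as follows. The lexicon entries, the single contextual rule, and the suffix patterns are all invented toy examples, and English words stand in for Greek ones; they only illustrate the control flow, not the actual rule sets of the cited tagger:

```python
# Toy lexicon: each word maps to its possible POS tags.
LEXICON = {
    "the": ["DET"],              # unambiguous: phase 1 suffices
    "house": ["NOUN", "VERB"],   # ambiguous: needs phase 2
}

def tag(word, prev_tag=None):
    # Phase 1: look the word up in the morphological dictionary.
    entries = LEXICON.get(word)
    if entries and len(entries) == 1:
        return entries[0]
    # Phase 2: contextual rules choose among the dictionary entries.
    if entries:
        if prev_tag == "DET" and "NOUN" in entries:
            return "NOUN"        # a determiner usually precedes a noun
        return entries[0]        # fall back to the first entry
    # Phase 3: unknown word -- guess the POS from its suffix.
    if word.endswith("ly"):
        return "ADV"
    if word.endswith("ing"):
        return "VERB"
    return "NOUN"                # default guess

print(tag("the"))                    # DET  (phase 1)
print(tag("house", prev_tag="DET"))  # NOUN (phase 2)
print(tag("quickly"))                # ADV  (phase 3)
```

The real tagger replaces the hand-written rules in phases 2 and 3 with decision trees induced from annotated data, which is what yields the accuracies reported above.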
Enhanced unsupervised neural machine translation by cross lingual sense embedding and filtered back-translation for morphological and endangered Indic languages
Published in Journal of Experimental & Theoretical Artificial Intelligence, 2022
Shweta Chauhan, Shefali Saxena, Philemon Daniel
In unsupervised learning, an MT system can be trained using solely monolingual corpora. The task of identifying the correct sense of a word in context, known as Word Sense Disambiguation (WSD), is a central problem for all natural language processing applications, particularly machine translation: different senses of a word are translated differently in other languages, and resolving sense ambiguity is required to identify the correct translation of a word (Pelevina et al., 2017). Building specialised WSD models has taken a lot of time and effort. WSD is responsible for determining the optimum sense to assign to each word, enabling applications such as information retrieval, information extraction, machine translation, question answering systems, cross-lingual applications, and document categorisation (Aliwy & Taher, 2019). There are four major strategies used in this domain: (a) supervised, (b) knowledge-based, (c) semi-supervised, and (d) unsupervised. (Azarbonyad et al., 2019; Raheja et al., 2022) classified the various approaches for solving word sense disambiguation: supervised approaches require sense-tagged corpora; semi-supervised approaches use a limited amount of sense-tagged corpora together with a large amount of unlabelled corpus; knowledge-based approaches use massive lexical resources such as machine-readable dictionaries, ontologies, and thesauri to determine the sense of the target word; and unsupervised approaches do not require any tagged corpora. The approaches of (Chauhan et al., 2022; Wang et al., 2020) do not rely on hand-annotated sense-labelled corpora or custom lexical resources; instead, they use raw corpora to automatically generate a sense inventory. (Han & Shirai, 2021) presents a quick survey of WSD for Asian languages.
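The unsupervised strategy of generating a sense inventory from raw corpora can be sketched as clustering the contexts in which an ambiguous word occurs, so that each cluster stands for one induced sense. The corpus sentences, the bag-of-words representation, and the similarity threshold below are all invented for illustration; published systems such as those cited above use learned sense embeddings rather than word overlap:

```python
def bow(sentence, target):
    """Bag-of-words context: all words in the sentence except the target."""
    return {w for w in sentence.lower().split() if w != target}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def induce_senses(sentences, target, threshold=0.1):
    """Greedily cluster contexts of `target`; each cluster = one induced sense."""
    clusters = []  # each cluster is a list of context word sets
    for s in sentences:
        ctx = bow(s, target)
        best, best_sim = None, 0.0
        for cluster in clusters:
            centroid = set().union(*cluster)
            sim = jaccard(ctx, centroid)
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best.append(ctx)     # context fits an existing sense
        else:
            clusters.append([ctx])  # start a new sense cluster
    return clusters

corpus = [
    "the bank approved my loan application",
    "she deposited money at the bank yesterday",
    "we sat on the grassy bank of the river",
    "the river bank was covered with reeds",
]
print(len(induce_senses(corpus, "bank")))  # -> 2
```

On this toy corpus the loan/money contexts fall into one cluster and the river contexts into another, yielding a two-sense inventory for "bank" without any sense-tagged data.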