Natural Language Processing Viewed from Semantics
Published in Masao Yokota, Natural Language Understanding and Cognitive Robotics, 2019
From the 1950s, when the earliest machine translation systems emerged, to the 1980s, there were two approaches to cross-language operation: interlingual and transfer-based. In the former, the source language is translated into an interlingua, and the target language is generated from the interlingua independently of the source language. In the latter, each pair of source and target languages requires a module, called a transfer component, that maps between their corresponding intermediate representations, for example, language-specific dependency structures serving as grammatical descriptions. Both techniques are referred to as rule-based machine translation, but the transfer approach has prevailed because, from a technical viewpoint, it is very difficult to develop an interlingua except in limited task domains where expressions are well formulated or controlled. The most serious problem of rule-based machine translation is that it needs extensive lexicons with morphological, syntactic, and semantic information, as well as large sets of rules, almost all of which are inevitably hand-written.
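The architectural difference between the two approaches can be made concrete by counting modules. The sketch below is only an illustration under stated assumptions: it assumes that a transfer component is directional (one per ordered source-target pair) and that an interlingual system needs one analyser and one generator per language. The function names are hypothetical.

```python
def transfer_modules(n_languages):
    """Transfer approach: one component per ordered (source, target) pair.

    Assumes components are directional, so A->B and B->A are counted separately.
    """
    return n_languages * (n_languages - 1)


def interlingua_modules(n_languages):
    """Interlingual approach: one analyser into the interlingua and
    one generator out of it for each language (an assumed decomposition)."""
    return 2 * n_languages


if __name__ == "__main__":
    for n in (2, 5, 10):
        print(n, transfer_modules(n), interlingua_modules(n))
```

The count favours the interlingua as the number of languages grows, but it says nothing about how hard the interlingua itself is to design, which is the difficulty the passage points to.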
Introduction
Published in Krzysztof Wołk, Machine Learning in Translation Corpora Processing, 2019
In most cases, there are two steps required for SMT: an initial investment that significantly increases the quality at a limited cost, and an ongoing investment to incrementally increase quality. While rule-based MT brings companies to the quality threshold and beyond, the quality improvement process may be long and expensive [137]. Rule-based machine translation relies on numerous built-in linguistic rules for each language pair. The software parses the source text and creates an intermediate representation from which the text in the target language is generated. This process requires extensive lexicons with morphological, syntactic, and semantic information, as well as large sets of rules. The software applies these complex rule sets to transfer the grammatical structure of the source language into the target language. Translations are thus built on very large dictionaries and sophisticated linguistic rules. Users can improve out-of-the-box translation quality by adding their own terminology to the translation process: they create user-defined dictionaries, which override the system’s default settings [137]. The rule-based machine translation (RBMT) process is more complex than the substitution of one word for another. Such systems require linguistic rules that allow words to be placed in different positions and to change their meaning depending on context. The RBMT methodology applies a set of linguistic rules in three phases: syntactic and semantic analysis, transfer, and syntactic and semantic generation [168].
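The three phases and the user-dictionary override described above can be sketched in a few lines. This is a toy illustration, not any real RBMT system: the four-word English-to-Spanish lexicon, the single reordering rule, and all function names are assumptions made for the example.

```python
# Toy lexicon: source word -> (part of speech, default target word).
# All entries are illustrative assumptions.
LEXICON = {
    "the": ("DET", "el"),
    "red": ("ADJ", "rojo"),
    "car": ("NOUN", "coche"),
    "runs": ("VERB", "corre"),
}


def analyse(sentence):
    """Analysis phase: tokenize and tag each word using the lexicon."""
    tokens = []
    for word in sentence.lower().split():
        pos, _ = LEXICON.get(word, ("UNK", word))
        tokens.append((word, pos))
    return tokens


def transfer(tokens):
    """Transfer phase: apply one structural rule, ADJ NOUN -> NOUN ADJ."""
    out = list(tokens)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just reordered
        else:
            i += 1
    return out


def generate(tokens, user_dict=None):
    """Generation phase: choose target words; user entries override defaults."""
    user_dict = user_dict or {}
    words = []
    for word, _pos in tokens:
        default = LEXICON.get(word, (None, word))[1]  # unknown words pass through
        words.append(user_dict.get(word, default))
    return " ".join(words)


def translate(sentence, user_dict=None):
    return generate(transfer(analyse(sentence)), user_dict)


if __name__ == "__main__":
    print(translate("The red car runs"))
    print(translate("The red car runs", {"car": "auto"}))
```

Even this toy version shows why the approach is costly: every new word needs a lexicon entry and every structural difference between the languages needs its own hand-written rule.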
Morphological segmentation method for Turkic language neural machine translation
Published in Cogent Engineering, 2020
U. Tukeyev, A. Karibayeva, Zh. Zhumanov
Sánchez-Cartagena et al. demonstrated morphological segmentation using Apertium and integrated the output of rule-based machine translation (Sánchez-Cartagena et al., 2019). They segmented the source text using a rule-based morphological analyser. If a word had no valid segmentation, as many segmentation variants were generated as there were known suffixes that matched the word. After morphological segmentation, BPE was applied to all the training data. For example, “университетiнiң” (of her/his university) has the morphological analysis result n.px3sp.gen. Their morphological segmentation splits this term as “университет@@ iнiң”, whereas BPE leaves the word unchanged as “университетiнiң”. Thus, their morphological segmentation divides a given word into only two parts. In contrast, our morphological segmentation, based on the complete set of Kazakh endings, splits words into more than two parts and segments them by ending types defined exactly according to the grammar: “университет@@ i@@нiң”.
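The contrast between a two-part and a multi-part split can be illustrated with a small suffix-stripping routine. This is a minimal sketch, not the authors' algorithm: the two-item `ENDINGS` list, the `min_stem` guard, and the greedy longest-match strategy are all assumptions for illustration, whereas the method described above relies on the complete, grammatically typed set of Kazakh endings.

```python
# Hypothetical ending inventory covering only the example word;
# a real system would use the full set of Kazakh endings.
ENDINGS = ["i", "нiң"]


def segment(word, endings=ENDINGS, min_stem=2):
    """Strip known endings from the right and join the pieces with '@@ '.

    Greedy longest-match stripping is an assumption of this sketch and
    can over-segment words whose stems happen to end in an ending string.
    """
    pieces = []
    rest = word
    stripped = True
    while stripped:
        stripped = False
        # Try longer endings first so the longest match wins.
        for ending in sorted(endings, key=len, reverse=True):
            if rest.endswith(ending) and len(rest) - len(ending) >= min_stem:
                pieces.insert(0, ending)
                rest = rest[: -len(ending)]
                stripped = True
                break
    return "@@ ".join([rest] + pieces)


if __name__ == "__main__":
    word = "университет" + ENDINGS[0] + ENDINGS[1]
    print(segment(word))
```

The `@@ ` separator follows the BPE-style continuation marker used in the examples above, so the segmented text can be passed to BPE afterwards as the passage describes.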
Machine translation model for effective translation of Hindi poetries into English
Published in Journal of Experimental & Theoretical Artificial Intelligence, 2022
Rajesh Kumar Chakrawarti, Jayshri Bansal, Pratosh Bansal
The proposed hybrid machine translation (HBMT) method combines tree-based, statistical, and rule-based machine translation approaches to translate poems from Hindi to English. The tree-based computation proceeds word by word, traversing the lexicon words stored in the tree. Each tree-based lexicon entry represents a partial or complete word.