Wearable Computers
Published in Julie A. Jacko, The Human–Computer Interaction Handbook, 2012
Daniel Siewiorek, Asim Smailagic, Thad Starner
Figure 12.19 depicts the structure of the speech translator from English to a foreign language and vice versa. Speech is input into the system through the SR subsystem. The user wears a microphone as an input device, and background noise is eliminated using filtering procedures. A language model, generated from a variety of audio recordings and data, guides the SR system by acting as a knowledge source about the language's properties. The LT engine uses an example-based machine translation (EBMT) system, which takes individual sentence phrases and compares them to a corpus of examples held in memory to find phrases it knows how to translate. A lexical machine translator (glossary) translates any unknown words that remain. The EBMT engine translates individual “chunks” of the sentence using the source language model and then combines them with a model of the target language to ensure correct syntax. When reading from the EBMT corpus, the system makes several random-access reads while searching for the appropriate phrase. Because these random reads occur many times, rather than loading large, contiguous chunks of the corpus into memory, disk latency matters far more than disk bandwidth. The speech generation subsystem performs text-to-speech conversion at the output stage. To ensure that misrecognized words are corrected, a Clarification Dialog takes place on-screen, with the option to speak the word again or to write it in. As indicated in Figure 12.19, an alternative input modality is text from the Optical Character Recognition subsystem (such as scanned documents in a foreign language), which is fed into the LT subsystem.
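The chunk-matching and glossary-fallback behavior described above can be sketched as follows. This is a minimal illustration, not the chapter's actual implementation: the toy English–Spanish example corpus, the glossary entries, and the greedy longest-chunk matching strategy are all assumptions made for clarity.

```python
# Toy EBMT sketch: translate known multi-word chunks from an example corpus,
# falling back to a word-by-word glossary for anything unmatched.
# All data below is illustrative.
EXAMPLE_CORPUS = {
    ("where", "is"): ("dónde", "está"),
    ("the", "train", "station"): ("la", "estación", "de", "tren"),
}
GLOSSARY = {"please": "por favor"}

def translate(sentence):
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Greedily try the longest chunk the example corpus can translate.
        for j in range(len(words), i, -1):
            chunk = tuple(words[i:j])
            if chunk in EXAMPLE_CORPUS:
                output.extend(EXAMPLE_CORPUS[chunk])
                i = j
                break
        else:
            # No known chunk starts here: fall back to the glossary,
            # leaving truly unknown words untranslated.
            output.append(GLOSSARY.get(words[i], words[i]))
            i += 1
    return " ".join(output)

print(translate("Where is the train station please"))
# → dónde está la estación de tren por favor
```

In a real system the corpus lookup would be a disk-backed search rather than a dictionary, which is why the chapter emphasizes random-access latency over bandwidth.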
An Enhanced RBMT: When RBMT Outperforms Modern Data-Driven Translators
Published in IETE Technical Review, 2022
Md. Adnanul Islam, Md. Saidul Hoque Anik, A. B. M. Alim Al Islam
Being the seventh most spoken language worldwide, Bengali lags in several crucial sectors of natural language processing (NLP), such as machine translation, text categorization, sentiment analysis, and syntax and semantic checking [6]. The most remarkable existing work on machine translation includes phrase-based machine translation (PBMT) [7], example-based machine translation (EBMT), syntactic transfer, and the use of syntactic chunks as translation units [8]. Besides, although significant research studies can be found on English-to-Bengali translation [9,10], very limited focus has been given to translation in the other direction, i.e. from Bengali to English [8,11]. Additionally, the most noteworthy translators (e.g. Google Translate, Bing, etc.) often exhibit poor performance when translating Bengali to other languages. Google Translate, the most popular one, uses data-driven translation approaches (currently, NMT with an RNN architecture) for multilingual translation [12,13]. Earlier, SMT [14], another popular data-driven approach, was used by Google Translate. However, NMT and SMT face a significant problem when translating between many language pairs (e.g. Bengali-English) due to the unavailability of large, effective parallel corpora [15,16].
Machine translation model for effective translation of Hindi poetries into English
Published in Journal of Experimental & Theoretical Artificial Intelligence, 2022
Rajesh Kumar Chakrawarti, Jayshri Bansal, Pratosh Bansal
The proposed HBMT system’s results are compared with the Google, Microsoft (MS)-Bing, Babylonian, and HMT translators on a set of 500 manually translated Hindi-English sentences, comprising 150 complex sentences, 200 simple sentences, 75 idiom-based sentences, and 75 ambiguous sentences. Google Translate is basically an SMT-type translator, which must be trained on a large corpus for better efficiency and robustness. Likewise, MS-Bing is fundamentally based on both SMT and RBMT approaches to translation. Similarly, Babylonian uses SMT and morphological operations to perform the translation. Also, HMT is basically based on SMT, Example-Based Machine Translation (EBMT), and RBMT (14). All in all, statistical learning plays a major role in the most accurate machine translators.
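Comparisons like the one above are typically scored by matching each translator's output against the manually translated reference sentences, for example with a BLEU-style metric. The following is a hedged, self-contained sketch of sentence-level BLEU (clipped n-gram precision with a brevity penalty); it is an illustration of the general evaluation technique, not the exact scoring procedure used in the paper.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        prec = overlap / total
        if prec == 0:
            return 0.0  # no smoothing in this sketch
        log_prec += math.log(prec) / max_n
    # Penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec)

print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))
# → 1.0
```

Averaging such scores per category (simple, complex, idiom-based, ambiguous) is one way to produce the kind of per-category comparison the paper reports.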