Evolution of Long Short-Term Memory (LSTM) in Air Pollution Forecasting
Published in Monika Mangla, Subhash K. Shinde, Vaishali Mehta, Nonita Sharma, Sachi Nandan Mohanty, Handbook of Research on Machine Learning, 2022
Satheesh Abimannan, Deepak Kochhar, Yue-Shan Chang, K. Thirunavukkarasu
Text generation is another exciting application of generative LSTMs: the network can learn the sequences of a text corpus and then generate entirely new sequences. The language model is the core element of natural text generation. A statistical language model is a probability distribution over sequences of words; it assigns a probability to the likelihood that a given word (or sequence of words) follows a preceding sequence of words. Given an input text sequence, an LSTM can be used to build a generative model that produces natural language text. After tokenizing the input text, we generate token sequences of uniform length (with padding applied) to feed into the LSTM model. The model thereby learns to relate adjacent words and phrases and produces a probability score for the next word based on the input received up to that point. Recently, many researchers and developers across the globe have implemented this application on a variety of text corpora, including Shakespeare's works, famous novels, songs, and scripts, and found that generative LSTMs can learn the underlying language model very effectively and produce plausible text in a style strikingly similar to that of the training data.
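A minimal sketch of this workflow, assuming Keras/TensorFlow, is shown below. The toy corpus, layer sizes, and epoch count are illustrative placeholders rather than values from the chapter; the steps (tokenization, padded n-gram sequences, an LSTM with a softmax over the vocabulary, greedy next-word generation) mirror the process described above.

```python
# Sketch of LSTM-based next-word text generation with Keras/TensorFlow.
# Corpus, layer sizes, and epochs are illustrative placeholders.
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

corpus = ["to be or not to be", "all the world is a stage"]  # placeholder text

tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
vocab_size = len(tokenizer.word_index) + 1

# Build n-gram sequences: each prefix of a line predicts its next word.
sequences = []
for line in corpus:
    tokens = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(tokens)):
        sequences.append(tokens[: i + 1])

max_len = max(len(s) for s in sequences)
padded = pad_sequences(sequences, maxlen=max_len)   # uniform length via padding
X, y = padded[:, :-1], padded[:, -1]                # inputs and next-word targets

model = Sequential([
    Embedding(vocab_size, 64, input_length=max_len - 1),
    LSTM(128),
    Dense(vocab_size, activation="softmax"),        # probability over the next word
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(X, y, epochs=50, verbose=0)

# Greedy generation: repeatedly append the most probable next word.
seed = "to be"
for _ in range(5):
    tokens = tokenizer.texts_to_sequences([seed])[0]
    tokens = pad_sequences([tokens], maxlen=max_len - 1)
    next_id = int(np.argmax(model.predict(tokens, verbose=0)))
    seed += " " + tokenizer.index_word.get(next_id, "")
print(seed)
```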
A Hybrid Approach for Video Indexing Using Computer Vision and Speech Recognition
Published in Vijay Kumar, Mangey Ram, Predictive Analytics, 2021
Saksham Jain, Akshit Pradhan, Vijay Kumar
Components of a speech recognition system are as follows:
- Voice Input: Audio is fed into the system through a microphone, or an existing audio clip can be used.
- Digitalization: The process of converting the analog signal into digital form.
- Acoustic Model: Created from audio recordings of speech and their text transcriptions, using software to build statistical representations of the sounds that make up each word.
- Language Model: Used in natural language processing applications such as speech recognition; it captures the statistical properties of a language and predicts the next word in the speech sequence.
- Speech Engine: Converts the input audio into text.
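The following is a schematic sketch, in Python, of how these components chain together. Every function is a hypothetical placeholder standing in for a real subsystem, not an actual library API.

```python
# Schematic component chain for a speech recognition pipeline.
# All functions are hypothetical placeholders, not a real API.
from typing import List

def digitize(analog_audio) -> List[float]:
    """Digitalization: convert the analog microphone signal to digital samples."""
    ...

def acoustic_model(samples: List[float]) -> List[str]:
    """Acoustic model: map sound features to candidate words/phonemes."""
    ...

def language_model(candidates: List[str]) -> str:
    """Language model: pick the word sequence most probable in the language."""
    ...

def speech_engine(analog_audio) -> str:
    """Speech engine: run the full pipeline from voice input to text."""
    samples = digitize(analog_audio)
    candidates = acoustic_model(samples)
    return language_model(candidates)
```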
Quantifying information
Published in Jun Wu, Rachel Wu, Yuxi Candice Wang, The Beauty of Mathematics in Computer Science, 2018
Information entropy is exactly this measurement of uncertainty. Since language is contextual, we use conditional entropy for high-level language models. If we take into account the deviations between the training corpus and real-world text, we add in relative entropy. Based on conditional entropy and relative entropy, Jelinek derived a concept called perplexity, quantifying the quality of a language model. A language model's perplexity has a definite physical meaning: given some existing context (e.g., parts of a sentence), a model with higher perplexity allows more words to fill in the gaps, and vice versa. Thus, the lower a model's perplexity, the more certain we are of each word, and the better the model.
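In its standard form (a conventional definition, not a formula quoted from the chapter), perplexity is the exponentiated cross-entropy of a model $q$ on a test sequence $w_1,\dots,w_N$:

$$
\mathrm{PP}(w_1,\dots,w_N) = 2^{H}, \qquad
H = -\frac{1}{N}\sum_{i=1}^{N}\log_2 q(w_i \mid w_1,\dots,w_{i-1}).
$$

A perplexity of, say, 100 means the model is on average as uncertain about each next word as if it had to choose uniformly among 100 equally likely candidates.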
How generative AI models such as ChatGPT can be (mis)used in SPC practice, education, and research? An exploratory study
Published in Quality Engineering, 2023
Fadel M. Megahed, Ying-Ju Chen, Joshua A. Ferris, Sven Knoth, L. Allison Jones-Farmer
The notion of a language model has risen in popularity in the last decade thanks to advances in modeling techniques and computational power. Language models are used to assign probabilities to word sequences. Interestingly, the foundation of language models was set by the mathematician Andrey A. Markov in the early 1900s, who used Alexander Pushkin’s novel Eugene Onegin to demonstrate that letter pair sequences are not independent and that the likelihood of a letter’s appearance can be approximated (Markov 2006). Shannon (1948) built on that work, developing a statistical model for the English language and showing that text sequences could be generated. The work of Markov and Shannon is the foundation of modern language modeling; modern models use sequences of words rather than events (Markov) or characters/words (Shannon). Today, language models are ubiquitous. Examples include text auto-completion (used in web browsers, mobile phones’ messaging apps, software GUIs, etc.), machine translation, natural language generation, and optical character recognition.
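A minimal sketch of the idea, in the spirit of Markov's letter-pair counts but over words, is shown below: a bigram model estimated by counting adjacent word pairs in a toy corpus and used to assign a probability to a new sequence. The corpus is an illustrative placeholder, not data from the article.

```python
# Minimal bigram language model: count adjacent word pairs in a toy corpus
# and assign a probability to a word sequence. Corpus is a placeholder.
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def bigram_prob(w_prev: str, w: str) -> float:
    """P(w | w_prev) estimated by relative frequency (no smoothing)."""
    return bigrams[(w_prev, w)] / unigrams[w_prev] if unigrams[w_prev] else 0.0

def sequence_prob(words: list) -> float:
    """Probability of a word sequence under the bigram model."""
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        p *= bigram_prob(prev, cur)
    return p

print(sequence_prob("the cat sat".split()))  # P(cat|the) * P(sat|cat) = 2/3 * 1/2
```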
An active recursive state estimation framework for brain-interfaced typing systems
Published in Brain-Computer Interfaces, 2019
Aziz Koçanaoğulları, Yeganeh M. Marghi, Murat Akçakaya, Deniz Erdoğmuş
Query optimization for BCI typing systems is not a well-studied problem. To the best of our knowledge, only a limited number of studies have addressed the query optimization problem for BCI typing system designs. Omar [5] proposed a posterior matching scheme for a typing task. Higger [3] used maximum mutual information (MMI) coding for query selection to maximize the information transfer rate (ITR) in the typing task. Most recently, Moghadamfalahi [4] used expected posterior maximization (EPM) for query selection in a BCI typing system. However, all of these query selection methods result in the selection of the N-best stimuli based on the posterior distribution [6,7]: given the current posterior distribution, letters are selected in descending order of their probability mass. Choosing the N-best queries based on the current belief (prior information), however, does not always provide the best performance in RSE problems, because the current belief cannot always be trusted and may contain misleading information. In BCI typing systems, for instance, the current belief may be negatively influenced by the prior information provided by a language model. The language model provides probability values over the alphabet that are statistically learned from a dataset. Word choices, however, are topic dependent, and it is not possible for the statistical model to capture every possibility, so some word choices will be statistically uncommon under the language model. If the user intent (target state) is an uncommon phrase (e.g., an English word starting with the letter X), the prior behaves in an adversarial manner, causing a longer estimation session, or may lead to a wrong state estimation due to limitations of the EEG evidence such as noise and the limited number of typing sequences. Therefore, such BCI systems also require exploration beyond the current belief (the posterior probability over letters).
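For concreteness, the N-best selection criticized above can be sketched as follows: given the current posterior over the alphabet, pick the N symbols with the largest probability mass. The alphabet and the randomly drawn posterior are illustrative placeholders, not values from the cited studies.

```python
# N-best query selection sketch: choose the N symbols with the highest
# posterior probability. The posterior below is an illustrative placeholder.
import numpy as np

alphabet = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
posterior = np.random.dirichlet(np.ones(len(alphabet)))  # stand-in for the current belief

def n_best_query(posterior: np.ndarray, alphabet: list, n: int) -> list:
    """Return the n symbols with the largest posterior mass, in descending order."""
    order = np.argsort(posterior)[::-1][:n]
    return [alphabet[i] for i in order]

print(n_best_query(posterior, alphabet, n=4))
```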