Text Mining: Structure the Unstructured
Published in Chong Ho Alex Yu, Data Mining and Exploration, 2022
The lesson is: if a computer program relies solely on grammatical rules and dictionaries to conduct text analysis, the results are likely to be laughable. As a remedy, text mining employs sophisticated NLP in an attempt to “understand” the data in the same way that a human coder would. This Herculean task necessitates huge investments. For example, in June 2020 OpenAI announced the GPT-3 natural language model, which uses 175 billion parameters, after spending $12 million in R&D (Wiggers 2020). Later, Google set a new record by building a new natural language processing model, Switch Transformer, with 1.6 trillion parameters (Peckham 2021). In 2021 China broke this record by introducing WuDao 2.0, which carries 1.75 trillion parameters (Spadafora 2021). NLP is improving at lightning speed, and therefore the issues described above are no longer insurmountable.
Pre-trained and Application-Specific Transformers
Published in Uday Kamath, Kenneth L. Graham, Wael Emara, Transformers for Machine Learning, 2022
In this phase, GPT starts with a corpus of tokens and, moving through it, learns how to predict the next token, given some preceding context. More formally, given an unlabeled corpus $U = (w_1, \ldots, w_n)$, the model learns the conditional probability of predicting token $w_t$ given the preceding $k$ tokens, $P(w_t \mid w_{t-1}, \ldots, w_{t-k})$, by minimizing the negative log-likelihood
$$L_1(U) = -\sum_{t} \log P(w_t \mid w_{t-1}, \ldots, w_{t-k}; \Theta).$$
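A minimal sketch of this objective, assuming a toy stand-in for the model's conditional distribution (the names `neg_log_likelihood` and `prob_fn` are illustrative, not from the source):

```python
# Toy NumPy sketch of the pre-training objective quoted above:
# L1(U) = -sum_t log P(w_t | w_{t-1}, ..., w_{t-k}; Theta).
# `prob_fn` stands in for the parameterized model P(. | context; Theta).
import numpy as np

def neg_log_likelihood(corpus, prob_fn, k):
    loss = 0.0
    for t in range(k, len(corpus)):
        context = tuple(corpus[t - k:t])           # the k preceding tokens
        loss -= np.log(prob_fn(corpus[t], context))
    return loss

# Example: a "model" that assigns uniform probability over a 10-token vocabulary.
vocab_size = 10
corpus = list(np.random.randint(0, vocab_size, size=100))
uniform = lambda token, context: 1.0 / vocab_size
print(neg_log_likelihood(corpus, uniform, k=4))    # 96 * log(10) ≈ 221.05
```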
Bottom Up Speech Recognition
Published in Robert H. Chen, Chelsea Chen, Artificial Intelligence, 2022
GPT-3 currently has 175 billion ML parameters and was trained on 410 billion byte-pair-encoded tokens from Common Crawl, 19 billion tokens from WebText2, 12 billion from Books1, 55 billion from Books2, and 3 billion from Wikipedia. In addition to prose and poetry, GPT-3 can in principle write code in CSS, JSX, and Python, and it does not require further training to compose almost anything in the English language.
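For readers unfamiliar with the tokenization scheme mentioned above, here is a toy sketch of the core byte-pair-encoding idea (simplified; real BPE operates on bytes and learns its merge table from the full training corpus):

```python
# Toy sketch of byte-pair encoding (BPE): repeatedly merge the most frequent
# adjacent pair of symbols into a single new symbol.
from collections import Counter

def bpe_merges(text, num_merges):
    tokens = list(text)                      # start from individual characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append((a, b))
        merged, i = [], 0
        while i < len(tokens):               # replace every occurrence of the pair
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == (a, b):
                merged.append(a + b)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges

tokens, merges = bpe_merges("low lower lowest", num_merges=4)
print(tokens)   # progressively merged subword units
print(merges)   # the learned merge rules, applied in order
```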
DAISY: An Implementation of Five Core Principles for Transparent and Accountable Conversational AI
Published in International Journal of Human–Computer Interaction, 2023
Currently, the most popular type of DNN for natural language processing and generation is the transformer, because of its proficiency at inferring context in the form of long-range interdependencies between words (Devlin et al., 2019; Vaswani et al., 2017). OpenAI’s GPT-3 (Brown et al., 2020) is perhaps the most well-known transformer for conversational AI and, more generally, natural language generation. In its largest implementation, GPT-3 uses 175 billion parameters, was trained on hundreds of billions of words, and can produce human-like text or conversations, as well as code snippets, when suitably prompted. However, GPT-3 can exhibit both obvious failures (e.g., generating the same sentence over and over) and subtle ones, in which the answer is formally correct but semantically harmful. For example, Daws (2020) reported a case in which a researcher posing as a psychiatric patient asked “Should I kill myself?”, and GPT-3 bluntly answered “I think you should”.
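The long-range interdependencies mentioned above are captured by self-attention; the following is a minimal NumPy sketch of scaled dot-product attention (toy dimensions, no masking or multiple heads), not the exact implementation used by any of the cited models:

```python
# Minimal scaled dot-product self-attention: every token attends to every
# other token, so distant words can directly influence each representation.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                  # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over positions
    return weights @ V                                # context-aware representations

seq_len, d_model = 6, 8                               # toy sequence of 6 embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)            # (6, 8)
```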
ChatGPT: More Than a “Weapon of Mass Deception” Ethical Challenges and Responses from the Human-Centered Artificial Intelligence (HCAI) Perspective
Published in International Journal of Human–Computer Interaction, 2023
Alejo José G. Sison, Marco Tulio Daza, Roberto Gozalo-Brizuela, Eduardo C. Garrido-Merchán
GPT-3 is a 175-billion-parameter autoregressive language model created by OpenAI in 2020 (Radford et al., 2018). In particular, the ChatGPT model analyzed in this work is based on this LLM, whose autoregressive language modelling approach is briefly described here. Concretely, GPT uses an autoregressive decoder module as a feature extractor to predict the next word based on the preceding words W, which makes it well suited to text-generation tasks. Most critically, GPT models use only the preceding words W for prediction. Consequently, the GPT model cannot learn bidirectional interaction information, which is a main difference with respect to BERT. In other words, the autoregressive LM predicts the next possible word based on the preceding words (or the last possible word based on the succeeding words). Language modelling is usually framed as estimating a probability distribution from a set of examples, each consisting of a variable-length sequence of symbols. Because written language has a natural sequential ordering given by its grammar, the joint probability of a sequence of symbols can be factorized as a product of Markovian conditional probabilities, which can be estimated if enough corpora are provided; generalizing this reasoning yields the conditional probability of any word w given its context.
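The factorization described above can be written compactly as the standard chain-rule decomposition (the notation here is a paraphrase, not taken from the cited paper):

```latex
% Autoregressive (chain-rule) factorization of a word sequence w_1, ..., w_n:
% each word is conditioned only on the words that precede it, as in GPT-style models.
p(w_1, \ldots, w_n) \;=\; \prod_{t=1}^{n} p\bigl(w_t \mid w_1, \ldots, w_{t-1}\bigr)
```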
Recent advances in artificial intelligence for video production system
Published in Enterprise Information Systems, 2023
YuFeng Huang, ShiJuan Lv, Kuo-Kun Tseng, Pin-Jen Tseng, Xin Xie, Regina Fang-Ying Lin
The GPT architecture (Brown et al., 2020), illustrated in Figure 3, excels not only in language generation but also in diverse language understanding tasks such as classification, entailment, similarity assessment, and multiple-choice question answering. Through pre-training on extensive text data, GPT learns language patterns and representations. Fine-tuning on labelled datasets, achieved by incorporating task-specific classification heads, enables accurate classification across domains.
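As a concrete illustration of attaching a task-specific classification head, here is a minimal fine-tuning sketch assuming the Hugging Face `transformers` library, with its GPT-2 classes standing in for the GPT family (the toy texts and labels are invented for the example):

```python
# Sketch: a pre-trained GPT-style decoder plus a classification head,
# fine-tuned with a cross-entropy loss on a tiny labelled batch.
import torch
from transformers import GPT2TokenizerFast, GPT2ForSequenceClassification

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default

model = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id

texts = ["the movie was great", "the plot made no sense"]   # toy labelled data
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, return_tensors="pt")
outputs = model(**batch, labels=labels)            # loss over the head's class logits
outputs.loss.backward()                            # one fine-tuning gradient step (sketch)
```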