A Brief History of Artificial Intelligence
Published in Ron Fulbright, Democratization of Expertise, 2020
As a case in point, in 2016, Google's AlphaGo, developed by DeepMind Technologies (subsequently purchased by Google), defeated the reigning world champion in Go (DeepMind, 2018a). In 2017, an even stronger version called AlphaGo Master won 60 online games against professional human players over a one-week period. Also in 2017, a version called AlphaGo Zero learned to play Go by playing games against itself, without relying on any data from human games (DeepMind, 2018b). A generalized version, AlphaZero, capable of learning other games as well, followed in 2017. While Watson required many person-years of engineering effort to program and to teach it the craft of Jeopardy, AlphaZero achieved expert-level performance in Chess, Go, and Shogi after only a few hours of unsupervised self-training.
How to Untangle Complex Systems?
Published in Pier Luigi Gentili, Untangling Complex Systems, 2018
There are two main strategies for developing artificial intelligence: one is writing human-like intelligent programs that run on computers or special-purpose hardware, and the other is neuromorphic engineering. In the first strategy, computer scientists write algorithms that can learn, analyze extensive data, and recognize patterns, while psychologists, biologists, and social scientists contribute knowledge of human sensations, emotions, and intuitions. Merging the two contributions yields algorithms that can communicate with us easily. Among the most promising of these algorithms are artificial neural networks (recall Chapter 10, where we used these algorithms to predict chaotic time series) (Castelvecchi 2016). Recently, a program called AlphaGo Zero, based on an artificial neural network that learns through trial and error, mastered the game of Go without any human data or guidance and outperformed the best human players (Silver et al., 2017).
Reinforcement Learning
Published in Mark Chang, Artificial Intelligence for Drug Development, Precision Medicine, and Healthcare, 2020
AlphaGo Zero, a version trained without any data from human games, is stronger than every previous version. Playing only against itself, AlphaGo Zero surpassed the strength of AlphaGo Lee within three days, beating it 100 games to 0, and reached the level of AlphaGo Master in 21 days. The next version, AlphaZero, achieved a superhuman level of play in three games within 24 hours, defeating the world-champion programs Stockfish and Elmo as well as the three-day version of AlphaGo Zero. In each case, the programs ran on the custom tensor processing units (TPUs) for which they were optimized.
Routeview: an intelligent route planning system for ships sailing through Arctic ice zones based on big Earth data
Published in International Journal of Digital Earth, 2022
Adan Wu, Tao Che, Xin Li, Xiaowen Zhu
In contrast to these traditional methods, RL enables an agent to learn autonomously in an interactive environment by trial and error, using feedback from its own actions and experiences. The goal is to find an action policy that maximizes the agent's total cumulative reward. Mnih et al. (2015) argue that combining RL with pathfinding allows agents to acquire learning abilities similar to the way humans find solutions. The introduction of deep learning (Hinton and Salakhutdinov 2006) later spurred the vigorous development of deep RL. Its core idea is to capture complex environmental characteristics through the powerful perceptual ability of deep learning and to implement an intelligent decision-making process through the interaction of RL with the environment. In 2015, Google's DeepMind team launched the deep Q-network (DQN) and showed that deep RL can reach a decision-analysis level commensurate with that of humans (Mnih et al. 2015). The following year, the team's AlphaGo defeated Lee Sedol, a world champion of Go. After that, AlphaGo Zero, also based on deep RL, defeated AlphaGo after a short period of training without any aid from human experience (Silver et al. 2017).
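The trial-and-error loop described above can be illustrated with a minimal sketch, far simpler than DQN or AlphaGo: tabular Q-learning on a hypothetical five-cell corridor, where an agent starting at the left end learns, purely from reward feedback, to walk right to a goal cell. All names and parameters here (corridor size, learning rate, exploration rate) are illustrative assumptions, not part of any of the cited systems.

```python
import random

# Toy environment (an assumption for illustration): a 1-D corridor of
# 5 cells; the agent starts at cell 0 and earns reward 1 only on
# reaching the goal at cell 4.
N_STATES = 5
ACTIONS = [-1, +1]                 # step left / step right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.5  # learning rate, discount, exploration

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Environment feedback: next state, reward, and episode-end flag."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

random.seed(0)
for _ in range(500):               # episodes of trial and error
    s, done, steps = 0, False, 0
    while not done and steps < 100:
        # epsilon-greedy: explore randomly, otherwise act greedily on Q
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        # Q-learning update: nudge Q(s,a) toward the observed reward
        # plus the discounted best value of the next state
        Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
                              - Q[(s, a)])
        s, steps = s2, steps + 1

# The learned greedy policy steps right (+1) in every non-goal cell.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)}
print(policy)
```

DQN replaces the explicit Q table with a deep neural network that estimates Q-values from raw observations, which is what lets the same update rule scale to environments far too large to tabulate.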
Explainable reinforcement learning in production control of job shop manufacturing system
Published in International Journal of Production Research, 2021
Andreas Kuhnle, Marvin Carl May, Louis Schäfer, Gisela Lanza
The application of RL to solve the underlying problems of various games is seen as the main reason for its popularity. In many cases, RL agents exceed the best performance of human players. Among the most impressive applications are AlphaGo (Silver et al. 2017) and AlphaGo Zero (Silver et al. 2018). AlphaGo develops its strategy from numerous records of human Go games and already achieves superhuman results. AlphaGo Zero is more advanced: it develops a strategy without any prior knowledge, based only on the rules of the game and RL-based self-play. The resulting performance remains unmatched to date.