Building Applications
Published in James Luke, David Porter, Padmanabhan Santhanam, Beyond Algorithms, 2022
We just want to highlight the key technology advancements that resulted from these gaming applications. Arthur Samuel [1] introduced the phrase “Machine Learning” in the 1950s to describe how his checkers-playing program learnt the game. It took a while, until the early 1990s, when Gerald Tesauro created TD-Gammon [2], which learnt by playing against itself (using reinforcement learning) and reached championship level in Backgammon. Then came Deep Blue [3] from IBM, which in 1997 captured the imagination of both AI researchers and chess fans by beating world chess champion Garry Kasparov in a regulation match. This system used both custom hardware and AI learning from historical games and player styles. In 2016, the AlphaGo system [4] from Google DeepMind, which uses DNNs, convincingly beat the world champion Lee Sedol at the Chinese board game Go. The current version of this system, called AlphaGo Zero, is even better, needing no human input beyond the game rules. The low risk in game playing gives those developing AI for games a massive advantage over those developing real-world AI at a time when the virtual and real worlds are converging. As simulation games become more and more realistic, we should not be surprised to see AI developed initially in games being applied in the real world.
A Brief History of Artificial Intelligence
Published in Ron Fulbright, Democratization of Expertise, 2020
In 1979, BKG, a backgammon-playing computer program developed by Hans Berliner, defeated the reigning world champion (Berliner, 1980). BKG was the first computer program to defeat a human world champion in any board game (Berliner, 1977). Backgammon differs from many board games, such as checkers and chess, because the roll of the dice injects a random factor at each move. In 1992, Gerald Tesauro developed TD-Gammon using an artificial neural network trained with temporal-difference learning (hence the ‘TD’ in the name), a reinforcement learning technique related to Q-learning. TD-Gammon was able to rival, but not consistently surpass, the abilities of top human backgammon players (Tesauro, 1995).
Reinforcement Learning
Published in Stephen Marsland, Machine Learning, 2014
A famous example of reinforcement learning was TD-Gammon, which was produced by Gerald Tesauro. His idea was that reinforcement learning should be very good at learning to play games, because games are clearly episodic (you play until somebody wins) and have a clear reward structure, with a positive reward for winning. There was another benefit: the learner could be set to play against itself. This is very important; the version of TD-Gammon that was bundled with the IBM operating system OS/2 Warp had played 1,500,000 games against itself before it stopped improving.
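To give a rough flavour of this self-play idea, the sketch below shrinks it down to a toy “race to 10” game (players alternately add 1 or 2 to a counter; whoever reaches 10 wins) with a tabular value function and a TD(0)-style update. The toy game, the table, and the learning-rate and exploration parameters are illustrative assumptions only; TD-Gammon itself used a neural network, TD(λ), and backgammon positions.

```python
import random
from collections import defaultdict

ALPHA, EPSILON, TARGET = 0.1, 0.1, 10
value = defaultdict(float)   # V[s]: estimated chance that the player to move from s wins

def legal_moves(state):
    return [m for m in (1, 2) if state + m <= TARGET]

def play_one_game():
    """One self-play episode: both sides share and update the same value table."""
    state = 0
    while state != TARGET:
        moves = legal_moves(state)
        if random.random() < EPSILON:            # occasional exploration
            nxt = state + random.choice(moves)
        else:                                    # greedy: leave the opponent the worst state
            nxt = state + min(moves, key=lambda m: value[state + m])
        # TD(0) update: my winning chance here moves towards 1 - opponent's chance next turn
        target = 1.0 if nxt == TARGET else 1.0 - value[nxt]
        value[state] += ALPHA * (target - value[state])
        state = nxt

for _ in range(20_000):
    play_one_game()

for s in range(TARGET):
    # states 1, 4 and 7 are lost for the player to move under good play,
    # so their learned values should end up clearly lower than the rest
    print(s, round(value[s], 2))
```

The point of the sketch is the structure highlighted in the excerpt: each episode ends with an unambiguous win, the winning outcome supplies the reward, and both sides of the self-play share and improve the same value estimates.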
Performance comparison of different momentum techniques on deep reinforcement learning
Published in Journal of Information and Telecommunication, 2018
In this work, Othello game agents were trained using an online reinforcement learning methodology. First, a deep network with randomly initialized weights was established. The agent played the game by feeding the current state of the board into this network and choosing the action with the highest expected return according to the network's outputs. The opponent played by selecting random moves in any state, which increased the sparsity of the data, spreading it more widely over the state space. Only the data from the last 100 games were used for training in each iteration. After each iteration, the network was trained on a batch containing 100 different positions, and the same network continued to be used in the simulations. Online training may fail in such applications if this sparsity of the data is not provided. Othello was chosen for the experiment because visited states are never repeated during a game: each player has to play forward. This feature of the game prevents the agent from learning a defensive “optimal” strategy that circulates within the same states by repeating the same actions; the agent is forced to travel through the state space during play. This resulted in more accurate learning because it increased the diversity of selected actions. Tesauro's TD-Gammon (Tesauro, 1995) is the best-known application of a neural network trained by self-play. TD-Gammon was successful because backgammon is highly stochastic: the dice thrown at each move ensure that the algorithm travels widely through the state-action space. In deterministic games, by contrast, the trained network may find a supposedly optimal policy and diverge. Despite this, a policy network was successfully trained by self-play alone for the Othello game in this experiment.
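The training loop described above can be sketched roughly as follows. The sketch substitutes a toy Nim-style game (15 stones, take 1-3, taking the last stone wins) and a small PyTorch network for the Othello board and the deep network used in the paper; training the network to predict the final game outcome, the network width, the 500 iterations, and the hyperparameters are all illustrative assumptions. Only the overall structure follows the description: play against a random-move opponent, keep positions from the last 100 games only, and train on a batch of 100 positions after each game.

```python
import random
from collections import deque
import torch
import torch.nn as nn

N_STONES, WINDOW_GAMES, BATCH_SIZE = 15, 100, 100

def encode(stones):
    """One-hot encoding of the number of stones left (0 .. N_STONES)."""
    x = torch.zeros(N_STONES + 1)
    x[stones] = 1.0
    return x

# small value network: estimates the agent's chance of winning from a position
# that has just been handed to the opponent
net = nn.Sequential(nn.Linear(N_STONES + 1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
recent_games = deque(maxlen=WINDOW_GAMES)        # sliding window: last 100 games only

def agent_move(stones):
    """Greedy move: leave the opponent the position the network rates best for the agent."""
    legal = [m for m in (1, 2, 3) if m <= stones]
    with torch.no_grad():
        return max(legal, key=lambda m: net(encode(stones - m)).item())

def play_game():
    """Agent vs. a random-move opponent; records the position after each agent move."""
    stones, after_states = N_STONES, []
    while True:
        stones -= agent_move(stones)
        after_states.append(stones)
        if stones == 0:
            return after_states, 1.0             # agent took the last stone: win
        stones -= random.choice([m for m in (1, 2, 3) if m <= stones])
        if stones == 0:
            return after_states, 0.0             # opponent took the last stone: loss

for iteration in range(500):
    states, outcome = play_game()
    recent_games.append([(s, outcome) for s in states])
    pool = [pair for game in recent_games for pair in game]   # positions from the last 100 games
    batch = random.sample(pool, min(BATCH_SIZE, len(pool)))   # batch of (up to) 100 positions
    x = torch.stack([encode(s) for s, _ in batch])
    y = torch.tensor([[o] for _, o in batch])
    opt.zero_grad()
    loss_fn(net(x), y).backward()
    opt.step()

with torch.no_grad():
    print("estimated win chance after leaving 4 stones:", round(net(encode(4)).item(), 2))
```

Note how the randomness that backgammon gets from the dice, and that Othello gets from the random-move opponent, enters this sketch through the opponent as well: it is what keeps the recorded positions varied enough for the purely online training to keep working.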