Overview of Reinforcement Learning
Published in Chong Li, Meikang Qiu, Reinforcement Learning for Cyber-Physical Systems, 2019
As will be discussed in later chapters, traditional reinforcement learning approaches suffer from the curse of dimensionality and are therefore inherently limited to fairly low-dimensional problems. In recent years, however, deep reinforcement learning algorithms have made it possible to solve complex, high-dimensional problems. As the word deep suggests, the end-to-end training process involves multiple layers. Deep reinforcement learning can be viewed as a combination of deep neural networks and reinforcement learning: deep learning is used within reinforcement learning by exploiting the function approximation and representation learning properties of deep neural networks. For example, suppose we want to design a robot that drives birds away from a corn field, or retreats to warn the human workers in that facility. On the input side, video is fed into a learning algorithm in the robot. Each video frame, containing high-dimensional pixels, first passes through several layers of a neural network, which extract the low-dimensional key features of the frame. Based on those features, the robot applies reinforcement learning to decide whether to engage the object or run away from it. As shown, this kind of end-to-end learning on high-dimensional data poses astronomical computational complexity, which deep learning helps to manage. A nice survey on deep reinforcement learning [6] covers a variety of deep reinforcement learning algorithms, including the deep Q-network, trust region policy optimization, and asynchronous advantage actor-critic.
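The pixels-to-features-to-decision pipeline described above can be sketched in a few lines. Everything here is a hypothetical stand-in, not the chapter's actual robot: the frame size, the layer widths, the two actions, and the random weights (a trained agent would learn the weights rather than draw them at random).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 84x84 grayscale "video frame": the high-dimensional input.
frame = rng.random((84, 84))

# Hypothetical weights of a small two-layer network; random here
# purely for illustration, learned in a real deep RL agent.
W1 = rng.standard_normal((84 * 84, 32)) * 0.01  # pixels -> 32 features
W2 = rng.standard_normal((32, 2)) * 0.1         # features -> 2 action values

# Layer 1 extracts low-dimensional features (ReLU activation);
# layer 2 maps features to one value per candidate action.
features = np.maximum(frame.ravel() @ W1, 0.0)
q_values = features @ W2

actions = ["engage", "run_away"]
choice = actions[int(np.argmax(q_values))]  # greedy decision
```

The point of the sketch is the shape of the computation: 7056 raw pixels are compressed to 32 features before any decision is made, which is what keeps the reinforcement learning part low-dimensional.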
Challenges and Broader Perspectives
Published in F. Richard Yu, Tao Huang, Garima Ameta, Yunjie Liu, Integrated Networking, Caching, and Computing, 2018
When networking, caching, and computing are considered jointly, system complexity can be very high. Deep reinforcement learning is a recently emerging technique that integrates deep learning with a reinforcement learning algorithm to handle large amounts of input data and obtain the best policy for hard problems. Deep reinforcement learning uses a deep Q-network to approximate the Q-value function [28]. Google DeepMind applied this method to several games [28,29] with very good results. Furthermore, deep reinforcement learning has been explored in various wireless networks [30]. For an integrated system of networking, caching, and computing, deep reinforcement learning can serve as a powerful tool for obtaining good resource allocation policies [31].
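The Q-value approximation mentioned above is trained by regressing the network toward a one-step Bellman target. A minimal numeric sketch of that target follows; the reward, discount factor, and next-state Q-values are made-up numbers, and a real DQN would obtain `q_next` from a separate target network:

```python
# Hypothetical Q-values for the next state, one entry per action,
# as a target network might produce them.
q_next = [1.5, 3.0, 0.5]
reward = 1.0   # reward observed for this transition
gamma = 0.9    # discount factor

# DQN regression target: y = r + gamma * max_a' Q(s', a').
# (For a terminal transition, the target is just the reward.)
y = reward + gamma * max(q_next)
```

The network's predicted Q-value for the action actually taken is then pushed toward `y` by gradient descent on a squared-error loss.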
Machine Learning
Published in Seyedeh Leili Mirtaheri, Reza Shahbazian, Machine Learning Theory to Applications, 2022
Deep reinforcement learning has been one of the hottest topics of the last few years. It combines artificial neural networks with a reinforcement learning framework that helps software agents learn how to reach their goals. That is, it unites function approximation and target optimization, mapping states and actions to the rewards they lead to. While neural networks are responsible for recent AI breakthroughs in problems like computer vision, machine translation, and time series prediction, they can also be combined with reinforcement learning algorithms to create something astounding like DeepMind's AlphaGo, an algorithm that beat the world champions of the board game Go. Reinforcement learning algorithms that incorporate deep neural networks can also beat human experts at numerous Atari video games, StarCraft II, and Dota 2. That is why the world is focused on deep reinforcement learning. Reinforcement learning refers to goal-oriented algorithms that learn how to achieve a complex objective or goal. In other words, the aim of reinforcement learning is to find out how to maximize cumulative reward over many steps; for example, an agent can maximize the points won in a game over many moves. Reinforcement learning algorithms can start from a blank slate and, under the right conditions, achieve superhuman performance. Let us take an example and explain the details of reinforcement learning through it. Consider how a child learns to walk. First, the baby must learn how to stand on his or her feet. On the first try, the baby stands and immediately falls to the ground; this hurts, and it acts as a punishment. The baby learns from this first try and uses the experience on the second try, when he or she can stand for a few seconds and the parents encourage the baby; this is a reward. The baby keeps trying, guided by these rewards and punishments, until he or she can walk without falling to the ground.
The basic idea of reinforcement learning is very similar to how the baby learns to walk. A set of rewards and punishments is defined for the model, and the model should find the best weights that maximize the rewards and minimize the punishments. In other words, the reinforcement learning model should optimize an objective function in which these rewards and punishments are included.
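The reward-and-punishment idea can be made concrete with tabular Q-learning, the classical precursor of the deep variants discussed above. The environment below is a toy assumption of ours, not the chapter's: a four-state corridor where reaching the goal state earns a reward and falling back to the start is mildly punished, loosely mirroring the baby's trial and error.

```python
import random

random.seed(0)

# Corridor of states 0..3; action 0 steps left, action 1 steps right.
# Reaching state 3 gives reward +1; sliding back to state 0 gives a
# small punishment of -0.1.
alpha, gamma, eps = 0.5, 0.9, 0.1   # learning rate, discount, exploration
Q = [[0.0, 0.0] for _ in range(4)]  # Q-value per (state, action)

for _ in range(500):                # 500 trial-and-error episodes
    s = 0
    while s != 3:
        # Epsilon-greedy: mostly exploit, occasionally explore.
        if random.random() < eps:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda x: Q[s][x])
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == 3 else (-0.1 if s2 == 0 else 0.0)
        # Q-learning update toward the one-step target.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The learned greedy policy: preferred action in each non-goal state.
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(3)]
```

After training, the policy steps right in every state: the rewards and punishments alone, with no prior knowledge, shape the behaviour.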
Low carbon design strategies for landscape architecture based on renewable energy technologies
Published in Intelligent Buildings International, 2023
In the supply of energy systems for landscape buildings, optimizing the efficiency and stability of the renewable energy supply is the key to decarbonization. Therefore, to further optimize the energy system of landscape buildings, the study introduces deep reinforcement learning based on the improved NSGA-II algorithm. Deep reinforcement learning combines reinforcement learning and deep learning into an artificial intelligence technique with powerful representational and decision-making capabilities, enabling intelligent decision-making in high-dimensional state spaces. Deep reinforcement learning tasks are described as Markov decision processes, which excel at describing multi-stage decision problems in unknown environments (Dong, Jia, and Liu 2018). The interaction between the environment and the agent in a Markov decision process is shown in Figure 3.
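The agent-environment interaction of a Markov decision process can be sketched as a simple loop: at each step the agent observes the state, picks an action under its policy, and the environment returns a reward and the next state. The environment and policy below are hypothetical toys, not the paper's energy-system model:

```python
def interact(env_step, policy, s0, horizon):
    """Run the MDP interaction loop and return the total reward."""
    s, total = s0, 0.0
    for _ in range(horizon):
        a = policy(s)          # agent chooses an action from the state
        s, r = env_step(s, a)  # environment returns next state, reward
        total += r
    return total

# Toy environment: the state is an integer, the action (+1 or 0)
# moves it, and the reward is 1 whenever the new state hits target 5.
def step(s, a):
    s2 = s + a
    return s2, 1.0 if s2 == 5 else 0.0

def go_up(s):
    return 1 if s < 5 else 0   # climb to the target, then hold

total = interact(step, go_up, 0, 10)
```

Starting from state 0 with a 10-step horizon, the policy reaches the target on step 5 and holds it, collecting a total reward of 6.0; a reinforcement learning agent would replace `go_up` with a policy learned from such interactions.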
Spatial arrangement using deep reinforcement learning to minimise rearrangement in ship block stockyards
Published in International Journal of Production Research, 2020
Byeongseop Kim, Yongkuk Jeong, Jong Gye Shin
Reinforcement learning is a method in which an agent learns behaviour by trial and error in a dynamic environment, and it has long been used in the machine learning and artificial intelligence fields (Kaelbling, Littman, and Moore 1996). Although the history of reinforcement learning is long, deep reinforcement learning research developed rapidly from 2013 onwards, beginning with a Q-learning algorithm built on deep learning technology (Mnih et al. 2013). Mnih et al. (2015) developed an algorithm called the deep Q-network (DQN) and trained an agent that exhibits levels of performance similar to those of humans in simple computer games. In 2016, AlphaGo combined reinforcement learning performed autonomously through self-play, supervised learning from the games of human Go players, and Monte Carlo tree search (MCTS), becoming the first program to beat the world champion at Go (Silver et al. 2016). In addition, AlphaGo Zero, developed in 2017, showed vastly better performance than the original AlphaGo by using only reinforcement learning and MCTS, without supervised learning (Silver et al. 2017). Most recently, AlphaStar was developed by applying reinforcement learning and supervised learning to StarCraft II, and it was rated above 99.8% of officially ranked human players (Vinyals et al. 2019).
Planning and acting in dynamic environments: identifying and avoiding dangerous situations
Published in Journal of Experimental & Theoretical Artificial Intelligence, 2022
Lukáš Chrpa, Martin Pilát, Jakub Gemrot
This paper addresses another kind of uncertainty in real-world scenarios: the environment is rarely static. Non-deterministic exogenous events can occur and change the environment without the consent of the agent. Specifically, acts of nature, or even acts of an intelligent adversary, affect the agent's plans. The concept of planning with events is not new (Dean & Wellman, 1990). Fully deliberative reasoning in such dynamic environments is practically feasible only for very small problems. Systems such as Circa (Musliner et al., 1993) employ the concept, but they can reason effectively only in very small state spaces. Similarly, an approach leveraging FOND planning is feasible only for small state spaces (Chrpa et al., 2019). Fully reactive reasoning, such as that employing deep reinforcement learning, has shown impressive results even in very dynamic environments such as Atari games (Mnih et al., 2015), the game of Go (Silver et al., 2016), and the game of StarCraft (Vinyals et al., 2019). However, in problems where more complex plans or policies are required to achieve longer-term goals, such as the game Montezuma's Revenge, deep reinforcement learning is not very efficient (Mnih et al., 2015). Hence, deliberative reasoning seems to be essential in such problems (Cerný et al., 2016; Lipovetzky et al., 2015). Markov decision process (MDP)-based approaches consider events (Mausam & Kolobov, 2012) while providing the most promising action in a given state, which is closer to reactive behaviour. Monte Carlo tree search (MCTS) approaches provide similar benefits; however, they are not very successful in problems with dead-ends (Patra et al., 2019).