Machine Learning Classifiers
Published in Rashmi Agrawal, Marcin Paprzycki, Neha Gupta, Big Data, IoT, and Machine Learning, 2020
Reinforcement learning (RL) combines the dynamic programming and supervised learning approaches to generate efficient machine-learning systems. It is goal-driven learning: the machine learns how to achieve a goal through trial-and-error interactions with its environment, receiving a reward for every correct action and a punishment for every wrong one, and adjusting its behavior so that the cumulative reward is maximised. This type of learning is the basis for many applications, such as game playing, industrial simulation, and resource management. Reinforcement learning has a policy, a reward signal, a value function, and a model of the environment as its major components (François-Lavet, Henderson et al. 2018). The policy defines a mapping from states to actions. The reward signal indicates the immediate reward given when a step is taken, whereas the value function estimates the reward in the long run. The model of the environment depicts the behavior of the environment: it maps state-action pairs to resulting states and rewards. Figure 1.3 shows the interaction of an agent with the environment, taking actions and receiving rewards in return.
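The four components can be made concrete with a minimal sketch. The two-state toy problem below is hypothetical and only illustrates how a policy, reward signal, value function, and environment model relate to each other; all names and numbers are invented for illustration.

# A minimal sketch (hypothetical two-state toy problem) of the four
# components named above, represented as plain Python mappings.

# Policy: a mapping from states to actions.
policy = {"s0": "right", "s1": "left"}

# Reward signal: the immediate reward for taking an action in a state.
reward = {("s0", "right"): 1.0, ("s0", "left"): 0.0,
          ("s1", "right"): 0.0, ("s1", "left"): 1.0}

# Value function: the estimated long-run reward of each state under the
# policy (illustrative numbers only).
value = {"s0": 1.8, "s1": 1.6}

# Model of the environment: a mapping from state-action pairs to the
# resulting states (rewards are kept in the separate table above).
model = {("s0", "right"): "s1", ("s0", "left"): "s0",
         ("s1", "right"): "s1", ("s1", "left"): "s0"}

# One step of the agent-environment interaction shown in Figure 1.3:
state = "s0"
action = policy[state]            # agent selects an action from the policy
r, state = reward[(state, action)], model[(state, action)]  # environment responds
print(action, r, state)           # right 1.0 s1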
CTR Prediction Model
Published in Peng Liu, Wang Chao, Computational Advertising, 2020
The main challenge of reinforcement learning is precisely how to strike the optimal balance between exploration and exploitation. In advertising, we need to sacrifice the eCPM-optimal strategy on part of the traffic and adopt a relatively random strategy to sample those feature spaces whose effects are unknown; this is the exploration process. The data gathered during exploration is then combined with normal decision-making over the overall traffic to predict CTRs more effectively. Faced with a slot machine whose handles (arms) have different expected revenues, a player needs to identify the highest-revenue arm with as few chips as possible, and then exploit that result to obtain returns. This simple research problem of choosing one arm out of many is also called the MAB (multi-armed bandit) problem [12]. Let's look at the mathematical description of the MAB problem.
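Before the formal description, the trade-off can be made concrete with a minimal epsilon-greedy sketch; epsilon-greedy is a standard baseline rather than the specific method of reference [12], and the arm payout probabilities below are invented for illustration.

import random

# Epsilon-greedy on a hypothetical 3-armed bandit: explore with
# probability epsilon, otherwise exploit the current best estimate.
true_payout = [0.02, 0.05, 0.03]       # unknown to the player/ad system
counts = [0] * len(true_payout)        # pulls per arm
estimates = [0.0] * len(true_payout)   # running mean reward per arm
epsilon = 0.1                          # fraction of traffic spent exploring

for t in range(10_000):
    if random.random() < epsilon:
        arm = random.randrange(len(true_payout))   # explore: random arm
    else:
        arm = max(range(len(true_payout)), key=lambda a: estimates[a])  # exploit
    reward = 1.0 if random.random() < true_payout[arm] else 0.0
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean

print(estimates)   # should approach the true payout probabilities

Here epsilon plays the role of the sacrificed traffic share: that fraction of pulls is spent sampling arms with unknown effects, while the remaining traffic exploits the current best estimate.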
Neural Networks for Autonomous Navigation on Non-Holonomic Mobile Robots
Published in Nancy Arana Daniel, Alma Y. Alanis, Carlos Lopez Franco, Neural Networks for Robotics: An Engineering Perspective, 2019
Reinforcement Learning is a strategy based on the interaction of a system, or agent, with its environment, allowing it to learn to perform a task automatically. RL defines a relation between situations and actions so as to maximize a numerical reward generated by the response of the environment. RL begins with a complete system that involves the environment and a definite goal [132]. The task is usually a series of actions that the robot has to perform to achieve its goal; the mission of the learner is then to find the action rules (policies) that optimally achieve that goal through interaction with the environment. What distinguishes RL from other learning methods is that the information used to train the system is obtained by evaluating the results of the actions taken. This requires active exploration and a trial-and-error search to determine how good a given action is, or what the best course of action is in a given situation, as sketched below [127,132].
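One standard, widely cited way to realize this trial-and-error search is the tabular Q-learning update; the sketch below is a generic illustration under an invented three-state chain environment, not the specific method used in this chapter.

import random

# Trial-and-error learning via the standard Q-learning update on a
# hypothetical 3-state chain (actions: 0 = left, 1 = right).
n_states, n_actions = 3, 2
Q = [[0.0] * n_actions for _ in range(n_states)]   # evaluative estimates
alpha, gamma, epsilon = 0.5, 0.9, 0.2

def step(s, a):
    """Hypothetical environment: moving right off the end yields reward 1."""
    if a == 1 and s == n_states - 1:
        return 0, 1.0                   # goal reached, restart at state 0
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, 0.0

s = 0
for _ in range(2000):
    # Active exploration: occasionally try a random action.
    a = random.randrange(n_actions) if random.random() < epsilon \
        else max(range(n_actions), key=lambda x: Q[s][x])
    s2, r = step(s, a)
    # Evaluative feedback: move the estimate toward reward + best future value.
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
    s = s2

# Learned policy should prefer "right" (1) in every state.
print([max(range(n_actions), key=lambda a: Q[st][a]) for st in range(n_states)])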
Wireless Network Design Optimization for Computer Teaching with Deep Reinforcement Learning Application
Published in Applied Artificial Intelligence, 2023
Reinforcement learning is a branch of machine learning that focuses on how to act based on feedback from the environment in order to achieve the desired benefit. In psychology, it is grounded in the theory of behaviorism, which explains how organisms, under the influence of rewards or punishments, gradually form expectations about stimuli and develop regular behaviors that maximize their benefit. A reinforcement learning model has several components, the most basic of which are a set of environment states, a set of actions, rules for transitions between states, and immediate rewards that follow state transitions. The agent and the environment interact at discrete time steps: at each step, the agent observes the current state (which usually includes reward information), selects an action from the action set, and executes it in the environment; the environment then transitions to a new state, and the agent receives a reward associated with this transition. The goal of a reinforcement learning agent is to collect as much reward as possible. The power of reinforcement learning comes from two aspects: using the agent's past experience as samples to optimize its behavior, and using function approximation to model complex environments. Reinforcement learning methods are therefore general-purpose and have been studied in many other fields.
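The discrete-time interaction loop described above can be sketched schematically; the toy environment and its transition rule below are hypothetical, and the reset()/step() style mirrors the convention popularized by OpenAI Gym rather than anything prescribed in this article.

import random

class ToyEnv:
    """Hypothetical 4-state environment standing in for the real one."""

    def reset(self):
        self.state = 0
        return self.state                          # initial observation

    def step(self, action):
        # Transition rule: the state changes according to the action taken.
        self.state = (self.state + action) % 4
        reward = 1.0 if self.state == 3 else 0.0   # immediate reward
        return self.state, reward

env = ToyEnv()
state = env.reset()
total_reward = 0.0
for t in range(100):                     # discrete time steps
    action = random.choice([0, 1])       # agent selects an action (random here)
    state, reward = env.step(action)     # environment transitions, emits reward
    total_reward += reward               # agent accumulates reward
print(total_reward)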
Employing reinforcement learning to enhance particle swarm optimization methods
Published in Engineering Optimization, 2022
Reinforcement learning aims to determine how agents should take actions in an environment so as to maximize the cumulative reward (Kaelbling, Littman, and Moore 1996). In the standard reinforcement learning model, an agent is connected to its environment via perception and action. At each time step, the agent observes the current state of the environment and takes an action that moves it to a new state, for which the environment provides a reward or punishment. After all actions are taken, a score can be computed by summing the rewards and punishments received over the entire process, where the sequence of actions is generated by a programmed policy. The goal of reinforcement learning is to find the optimal policy for the agent, that is, the policy that maximizes this score.
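In the standard formalization, the score described here is the return: the sum of per-step rewards (with punishments entering as negative rewards), optionally discounted. The discount factor gamma below belongs to the general formulation and is not stated in this excerpt; setting gamma = 1 recovers the plain summation in the text.

% Return (score) from time step t to the end of the episode at time T;
% r_k is the reward received at step k and \gamma \in (0, 1] is a discount factor.
G_t = \sum_{k=t}^{T} \gamma^{\,k-t}\, r_k,
\qquad
\pi^{*} = \arg\max_{\pi}\; \mathbb{E}_{\pi}\!\left[ G_0 \right]

The optimal policy \pi^{*} is then the one whose action sequences maximize the expected return.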
Spatial arrangement using deep reinforcement learning to minimise rearrangement in ship block stockyards
Published in International Journal of Production Research, 2020
Byeongseop Kim, Yongkuk Jeong, Jong Gye Shin
Reinforcement learning is a method in which an agent learns behaviour by trial and error in a dynamic environment, and it has long been used in the machine learning and artificial intelligence fields (Kaelbling, Littman, and Moore 1996). Although the history of reinforcement learning is long, deep reinforcement learning research developed rapidly from 2013, beginning with a deep-learning-based extension of the Q-learning algorithm (Mnih et al. 2013). Mnih et al. (2015) developed an algorithm called the deep Q-network (DQN) and trained agents that exhibited human-level performance in simple computer games. In 2016, AlphaGo combined reinforcement learning performed autonomously through self-play, supervised learning from the games of human Go players, and Monte Carlo tree search (MCTS), becoming the first program to beat the world champion in Go (Silver et al. 2016). In addition, AlphaGo Zero, which was developed in 2017, showed vastly better performance than the original AlphaGo by using only reinforcement learning and MCTS, without supervised learning (Silver et al. 2017). Most recently, AlphaStar was developed by applying reinforcement learning and supervised learning to StarCraft II, and it was rated above 99.8% of officially ranked human players (Vinyals et al. 2019).