Machine Learning-Based Optimal Consensus Networked Control with Application to Van der Pol Oscillator Systems
Published in Nishu Gupta, Srinivas Kiran Gottapu, Rakesh Nayak, Anil Kumar Gupta, Mohammad Derawi, Jayden Khakurel, Human-Machine Interaction and IoT Applications for a Smarter World, 2023
The objective of optimal control for a nonlinear system is to seek optimal control strategies derived from the Pontryagin maximum principle. However, the principle provides only a necessary condition; a sufficient condition is given by the Hamilton-Jacobi-Bellman (HJB) equation. Unfortunately, analytical HJB solutions do not exist [13, 14] because of the nonlinear differential nature of the equation. Recently, reinforcement learning (RL) techniques [14], a core element of machine learning theory [15], have emerged as one of the most widely used methods for approximating HJB solutions in MAS problems [16]. For example, RL can learn the solutions of differential, stochastic, and Markov games, two-player or multiplayer games, or Nash Q-learning. A branch of RL, adaptive dynamic programming (ADP) [5], is widely studied for optimal control design [5, 17–24]. Inspired by natural behavioral psychology, most ADP-based algorithms use policy iteration techniques, which employ two or three neural networks (NN) in the control structure [17–24], called actor-disturber-critic (ADC). The critic approximates a value function while the others tune the optimal control strategy and the disturbance rejection policy via the critic. The NN-weight training process is executed in two sequential steps: evaluation of the actor's policy and improvement of the disturber's policy. Unfortunately, with the ADC structure, the training algorithms suffer from the disadvantage of sequential updates and hence require stabilizing initial weights [18].
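To make the policy-iteration idea behind ADP concrete, the following is a minimal sketch (not the algorithm of the paper) of the linear-quadratic special case, where the HJB equation reduces to an algebraic Riccati equation, policy evaluation becomes a Lyapunov solve, and policy improvement minimizes the Hamiltonian. The system matrices A, B and weights Q, R are illustrative assumptions.

```python
# Minimal policy-iteration sketch for the linear-quadratic special case
# (Kleinman-style iteration); illustrative of ADP, not the paper's method.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

# Illustrative system x_dot = A x + B u with cost integral of (x'Qx + u'Ru) dt.
A = np.array([[0.0, 1.0],
              [-1.0, -2.0]])   # open-loop stable, so K = 0 is an admissible initial policy
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

K = np.zeros((1, 2))            # initial stabilizing policy u = -K x
for _ in range(20):
    # Policy evaluation (critic): the value V(x) = x' P x of the current policy
    # solves (A - B K)' P + P (A - B K) + Q + K' R K = 0.
    Acl = A - B @ K
    P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
    # Policy improvement (actor): minimize the Hamiltonian with respect to u.
    K_new = np.linalg.solve(R, B.T @ P)
    if np.linalg.norm(K_new - K) < 1e-10:
        break
    K = K_new

# Sanity check: the iteration converges to the Riccati (HJB) solution.
P_are = solve_continuous_are(A, B, Q, R)
print(np.allclose(P, P_are, atol=1e-8))   # True
```

ADP methods replace the exact Lyapunov/Hamiltonian steps above with neural-network approximations learned from data, which is what the actor-disturber-critic structure implements for general nonlinear dynamics.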
Modern Energy Recovery from Renewable Landfill or Bio-Covers of Landfills
Published in Sunil Kumar, Zengqiang Zhang, Mukesh Kumar Awasthi, Ronghua Li, Biological Processing of Solid Waste, 2019
Rena, Gautam Pratibha, Sunil Kumar
Phase three is ideal for material excavation. The basic idea of RL is to reuse the material and decrease the landfill volume, thereby decreasing the amount of waste at a landfill. This is also very helpful in speeding up the degradation process (Benson et al., 2007; Kai et al., 2008; Reinhart and Al-Yousfi, 1996; Sang et al., 2012; Valencia et al., 2009; White et al., 2011). Introduction of nutrient elements also enhances the microbial process (Qingshan et al., 1996). Injection of air into the landfill maintains aeration (Marlies et al., 2013; Raga and Cossu, 2014; Rich et al., 2008; Ritzkowski and Stegmann, 2013; Pleasant et al., 2014). Addition of water and leachate circulation can balance the pH level and increase the water content of the waste. This ultimately boosts the physical, chemical, and biological degradation processes, which is essential for degradation-related settlement (Benson et al., 2007; Valencia et al., 2009). NPK salts (0.24% w/w), i.e., KCl, KH2PO4, and NH4CO3, can also be used to stabilize the waste. They adjust and balance the nutrient levels to a C/N/P ratio of (100–150):5:1. Among these compounds, KH2PO4 is the best for speeding up biodegradation (Qingshan et al., 1996). Active or passive injection of air leads to greater surface settlement in RL than in CL under the same conditions, which makes more space available for waste (Reinhart and Al-Yousfi, 1996; Sang et al., 2008). As the temperature increases due to the various major chemical activities, the leachate quantity decreases.
Applications of Machine Learning in Wireless Communication: 5G and Beyond
Published in Mangesh M. Ghonge, Ramchandra Sharad Mangrulkar, Pradip M. Jawandhiya, Nitin Goje, Future Trends in 5G and 6G, 2021
Rohini Devnikar, Vaibhav Hendre
Reinforcement learning (RL) focuses mainly on selecting actions by mapping situations to actions and determining which moves should be taken to maximize a long-term reward. RL is used in real-time decision-making, robot navigation, game AI, and skill acquisition. RL is also used in wireless communication, for example through a history-based RL algorithm in vehicular networks [21] and deep RL algorithms in cloud RANs and D2D communication networks.
Routeview: an intelligent route planning system for ships sailing through Arctic ice zones based on big Earth data
Published in International Journal of Digital Earth, 2022
Adan Wu, Tao Che, Xin Li, Xiaowen Zhu
In contrast to these traditional methods, RL enables an agent to learn autonomously in an interactive environment by trial and error, using feedback from its own actions and experiences. The goal is to find a suitable action model that maximizes the agent's total cumulative reward. Mnih et al. (2015) argue that combining RL with pathfinding allows agents to acquire learning abilities similar to the way humans find solutions. In 2006, the proposal of deep learning (Hinton and Salakhutdinov 2006) successfully promoted the vigorous development of deep RL. The core idea is to capture complex environmental characteristics with the powerful perceptual ability provided by deep learning and to implement an intelligent decision-making process that combines RL with environmental interactions. In 2015, Google's DeepMind team launched the deep Q-network (DQN) and noted that deep RL had reached a decision-analysis level commensurate with that of humans (Volodymyr et al. 2015). Two years later, the team launched AlphaGo, which defeated the world Go champion Lee Sedol. After that, AlphaGo Zero, based on deep RL, defeated AlphaGo after a short period of training without the aid of human experience (Silver et al. 2017).
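The core idea of DQN mentioned above can be summarized in one update: a neural network approximating action values is regressed toward a Bellman target computed with a separate, periodically synced target network. The following is a minimal sketch under assumed state/action dimensions and hyperparameters; it is illustrative of the technique, not DeepMind's implementation.

```python
# Minimal DQN-style update sketch: regress Q(s, a) toward r + gamma * max_a' Q_target(s', a').
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA = 4, 3, 0.99   # illustrative sizes and discount factor

def make_q_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))

q_net = make_q_net()
target_net = make_q_net()
target_net.load_state_dict(q_net.state_dict())   # periodically synced copy of q_net
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(states, actions, rewards, next_states, dones):
    """One gradient step on a mini-batch of transitions (s, a, r, s', done)."""
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        target = rewards + GAMMA * (1.0 - dones) * next_q
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch just to show the expected tensor shapes.
batch = 8
dqn_update(torch.randn(batch, STATE_DIM),
           torch.randint(0, N_ACTIONS, (batch,)),
           torch.randn(batch),
           torch.randn(batch, STATE_DIM),
           torch.zeros(batch))
```

In a route-planning setting, the state would encode the local ice-zone environment perceived by the agent and the actions would be candidate heading choices, with the same update rule applied to batches of experienced transitions.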
Combining symbiotic simulation systems with enterprise data storage systems for real-time decision-making
Published in Enterprise Information Systems, 2021
B. S. Onggo, Canan G. Corlu, Angel A. Juan, Thomas Monks, Rocio de la Torre
An exciting but challenging area of research that holds great potential for S3 is reinforcement learning (RL). When RL is implemented using neural network approaches, it is called deep RL. In RL, an agent takes actions within an environment and learns the value of its actions via (delayed) feedback, which may consist of observations and a reward. RL approaches such as tabular Q-learning and deep Q-networks (DQN) are suitable for the stochastic environments that would be found in S3, with DQN being suitable for problems with more states. Unlike in a meta-modelling approach, where a design of experiments is employed, an agent starts with no information and steps forward in time. The agent takes actions that balance the exploitation of promising actions already tested and the exploration of actions where it has limited experience. RL agents are often initially trained in simulated environments – making RL ideal for S3 – and then used with the physical system. Over time, an agent would also receive feedback from the physical system, refining its internal estimates of the value of actions given a state. For instance, in Q-learning this is done by blending old and new estimates of the quality of an action in a given state. A particular challenge in RL is defining reward values. In an example of a production facility, Creighton and Nahavandi (2002) defined a reward based on inventory storage, set-up, and production costs. RL is a highly active research area within artificial intelligence, and there is a great opportunity to transfer its gains across to S3.
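The "blending of old and new values" and the exploitation/exploration balance described above are captured by the standard tabular Q-learning update with epsilon-greedy action selection. The sketch below is a generic illustration under assumed parameter values; the environment interface is a placeholder, not the S3 or production-facility setup from the cited work.

```python
# Minimal tabular Q-learning sketch: the update blends the old value estimate
# with the newly observed reward signal; epsilon-greedy selection balances
# exploitation and exploration. All parameters are illustrative.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1          # learning rate, discount, exploration rate
q_table = defaultdict(float)                     # Q[(state, action)] -> value estimate

def choose_action(state, actions):
    """Epsilon-greedy: mostly exploit the best-known action, occasionally explore."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])

def q_update(state, action, reward, next_state, actions):
    """Blend the old value with the reward plus the discounted best next-state value."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    old = q_table[(state, action)]
    q_table[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)

# Typical interaction loop against an environment exposing reset()/step(action);
# in an S3 setting the environment would first be the simulation model, later the
# physical system.
# state = env.reset()
# for _ in range(num_steps):
#     action = choose_action(state, actions)
#     next_state, reward = env.step(action)
#     q_update(state, action, reward, next_state, actions)
#     state = next_state
```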
Learning-based traffic signal control algorithms with neighborhood information sharing: An application for sustainable mobility
Published in Journal of Intelligent Transportation Systems, 2018
H. M. Abdul Aziz, Feng Zhu, Satish V. Ukkusuri
Since the traffic environment is inherently dynamic and changes over time, there is scope for its elements (e.g., signal controllers) to learn through interaction with the environment. Controllers can then adjust their actions to drive the system towards a desired state. Among different learning techniques, reinforcement learning (RL) is one of the most widely used control techniques applied to traffic signal control. In RL-based schemes, the agent (i.e., the signal controller) learns from interacting with the environment, which is often modeled as a Markov decision process (MDP). The ability to learn from the environment and scalability are the key advantages of RL in terms of implementation, because no direct optimization is generally involved. The interactive nature of reinforcement learning algorithms requires a communication interface through which the agents (vehicles and controllers) can exchange information. This fits well into the paradigm of connected and automated vehicles in transportation networks.
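As an illustration of how such an MDP might be encoded for a single signal controller, the sketch below discretizes queue lengths into a state, treats the choice of green phase as the action, and uses negative total queue length as a delay proxy for the reward. The discretization, phase set, and reward are assumptions for illustration, not the formulation used in the cited work.

```python
# Illustrative MDP encoding for one signal-controller agent (not the paper's exact
# formulation): state from discretized queue lengths plus the current phase,
# action = which phase to give green, reward = negative total queue (delay proxy).
from dataclasses import dataclass
from typing import Tuple

PHASES = ("NS_green", "EW_green")        # assumed two-phase intersection

@dataclass
class IntersectionObservation:
    queue_ns: int                        # vehicles queued on the north-south approach
    queue_ew: int                        # vehicles queued on the east-west approach
    current_phase: str

def encode_state(obs: IntersectionObservation, bin_size: int = 5) -> Tuple[int, int, str]:
    """Discretize queue lengths into bins so the state space stays tabular."""
    return (obs.queue_ns // bin_size, obs.queue_ew // bin_size, obs.current_phase)

def reward(obs: IntersectionObservation) -> float:
    """Negative total queue length as a simple proxy for vehicle delay."""
    return -(obs.queue_ns + obs.queue_ew)

# Any RL rule (e.g., a Q-learning update) can then be applied over
# (encode_state(obs), action in PHASES, reward(obs)) at each decision point.
obs = IntersectionObservation(queue_ns=12, queue_ew=3, current_phase="NS_green")
print(encode_state(obs), reward(obs))    # (2, 0, 'NS_green') -15
```

Neighborhood information sharing would extend the state with quantities received from adjacent controllers over the communication interface described above.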