Expert Systems for Microgrids
Published in KTM Udayanga Hemapala, MK Perera, Smart Microgrid Systems, 2023
Single-agent reinforcement learning (SARL) introduces a single learning agent into the system; that intelligent agent carries out the learning process on its own. Multi-agent reinforcement learning (MARL) involves more than one intelligent agent in the system, and the RL agents learn while cooperatively optimizing the system. They track the states in which agents cannot act independently and must consider the decisions of fellow agents. The strategy is to identify the states where collisions may occur, mark those as “dangerous” states, and mark all other states as “safe” states. The general RL model for the multi-agent case is shown in Figure 6.8.
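To make the “safe”/“dangerous” labelling concrete, here is a minimal Python sketch for a two-agent grid world; the grid size, the one-step reachability rule, and the helper names are illustrative assumptions, not the chapter's implementation.

```python
# Minimal sketch of the "safe"/"dangerous" joint-state labelling described
# above. The 5x5 grid, agent representation, and one-step reachability rule
# are illustrative assumptions, not taken from the chapter.

from itertools import product

MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]  # up, down, right, left, stay

def reachable(pos, grid=5):
    """Cells an agent can occupy after one move on a grid x grid board."""
    x, y = pos
    return {(x + dx, y + dy) for dx, dy in MOVES
            if 0 <= x + dx < grid and 0 <= y + dy < grid}

def label_joint_states(grid=5):
    """Mark a joint state 'dangerous' if the two agents could collide in
    one step, so neither can act independently there."""
    labels = {}
    cells = list(product(range(grid), range(grid)))
    for a, b in product(cells, cells):
        if reachable(a, grid) & reachable(b, grid):
            labels[(a, b)] = "dangerous"   # coordination required
        else:
            labels[(a, b)] = "safe"        # agents may act independently
    return labels
```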
A Multi-Agent Reinforcement Learning Approach for Spatiotemporal Sensing Application in Precision Agriculture
Published in Ketan Kotecha, Satish Kumar, Arunkumar Bongale, R. Suresh, Industry 4.0 in Small and Medium-Sized Enterprises (SMEs), 2022
The RL scheme described previously covers decision-making for a single agent only. When the decision-making process involves several agents acting simultaneously, the single-agent RL method must be generalised within the framework of multi-agent reinforcement learning.
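The standard way to formalise this generalisation is the Markov (stochastic) game, which extends the single-agent MDP with per-agent action sets and rewards; the notation below is the common textbook formulation, not taken from the chapter itself.

```latex
% Markov (stochastic) game: the usual generalisation of the single-agent MDP.
% Each agent i observes the shared state s, selects a^i, and the transition
% depends on the joint action (a^1, ..., a^n).
\[
\mathcal{G} = \big\langle \mathcal{S},\ \mathcal{A}^1,\dots,\mathcal{A}^n,\ P,\ r^1,\dots,r^n,\ \gamma \big\rangle,
\qquad
P : \mathcal{S} \times \mathcal{A}^1 \times \cdots \times \mathcal{A}^n \to \Delta(\mathcal{S}),
\]
\[
J^i(\pi^1,\dots,\pi^n) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r^i_t\right],
\qquad i = 1,\dots,n .
\]
```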
Dynamic Graphical Games
Published in Magdi S Mahmoud, Multiagent Systems, 2020
Reinforcement Learning (RL) is one field of Machine Learning [560, 561]. It is employed to find optimal control solutions for dynamical systems in [561, 562]. RL approaches are designed to select the policies that minimize the objective function in dynamic learning environments [560, 561], and they are implemented using two-step approaches known as value and policy iteration techniques [563, 564, 565, 566]. Value and policy iteration solutions are developed for multi-agent systems formulating graphical games in [547], [567]. RL approaches are used in [568] to implement approximate Dual Heuristic Programming (DHP) solutions for graphical games. Reward shaping directs the agent's exploration by adding rewards to those obtained from the learning environment [569]; it is shown that potential-based reward shaping does not change the true Pareto front in single- and multi-objective RL solutions [569].

Integral Reinforcement Learning (IRL) is developed to solve the optimal control problem for a single-agent system in [566]. An IRL-H∞-based controller is developed for a flux-switching permanent magnet (FSPM) machine in a hostile environment in [570]. An IRL-based automatic voltage regulator, which does not need to know the full dynamics of the model, is developed for power systems in [571]. An integral Q-learning load frequency controller is developed for power systems in [572]. An off-policy IRL optimal tracking control algorithm is proposed for a Lorenz chaotic system in [573]; another off-policy RL control algorithm is developed for a rotational/translational actuator nonlinear benchmark problem in [574], and a similar algorithm for a two-link manipulator in [575]. An IRL approach is proposed to solve a nonlinear optimal control problem with input-affine dynamics in [576].

Multi-Agent Reinforcement Learning (MARL) techniques have gained interest in industrial applications such as robotic assembly lines, resource allocation and management, data mining, and decision support systems [577, 578, 579, 580]. MARL approaches have been developed for discrete-time systems in [581, 582], where the convergence of each node relies on the simultaneous convergence of all the other nodes. An off-policy RL algorithm is proposed to solve a cooperative control problem in a game-theoretic framework in [583], where a behavioral policy is used for learning purposes. A residual-gradient fuzzy-RL approach is used to solve pursuit-evasion games in [584], where it outperformed Q-learning solutions.
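As a concrete instance of the value/policy iteration techniques cited above, the following is a generic value-iteration sketch for a finite MDP; the array shapes, discount factor, and tolerance are illustrative assumptions, not drawn from the referenced works.

```python
# Generic value-iteration sketch for a finite MDP. It alternates a Bellman
# backup (evaluation step) with a greedy improvement step until the value
# function converges. The transition model P and reward R are assumed inputs.

import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """P: (S, A, S) transition probabilities; R: (S, A) expected rewards.
    Returns the converged value function and the greedy policy."""
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    while True:
        Q = R + gamma * P @ V      # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
        V_new = Q.max(axis=1)      # greedy improvement over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```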
Multi-agent deep reinforcement learning with traffic flow for traffic signal control
Published in Journal of Control and Decision, 2023
Liang Hou, Dailin Huang, Jie Cao, Jialin Ma
Although deep learning expands RL's application scope, DRL still has limitations for large-scale road networks. A single agent needs to observe the traffic measurements of all intersections, which leads to a dimensional explosion of the action space and high delay times; this does not meet the requirements of ATSC. Multi-Agent Reinforcement Learning (MARL) has excellent decision-making and generalisation abilities, and it can coordinate the agents to reach the global optimum, so MARL is very effective for solving ATSC problems in large-scale networks: each agent controls one intersection, and the cooperation of all agents achieves global optimisation. MARL combines RL with multi-agent systems and has a long history of development. The most basic MARL algorithm is independent Q-learning (IQL), in which each agent in the system adopts Q-learning. IQL has good scalability, but there is no communication between agents, so the environment becomes unstable when other agents change their policies; it therefore performs better in scenarios where the coupling between agents is weak or absent.
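A minimal sketch of IQL as described here, with one tabular Q-learning agent per intersection and no communication between agents; the state/action encodings and hyper-parameters are illustrative assumptions, not the paper's setup.

```python
# Sketch of independent Q-learning (IQL): each intersection is controlled by
# its own tabular Q-learning agent, with no communication between agents.
# States must be hashable (e.g. tuples of local traffic measurements).

import random
from collections import defaultdict

class IQLAgent:
    def __init__(self, n_actions, alpha=0.1, gamma=0.95, eps=0.1):
        self.Q = defaultdict(lambda: [0.0] * n_actions)
        self.n_actions, self.alpha, self.gamma, self.eps = n_actions, alpha, gamma, eps

    def act(self, s):
        if random.random() < self.eps:                    # epsilon-greedy exploration
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.Q[s][a])

    def learn(self, s, a, r, s_next):
        target = r + self.gamma * max(self.Q[s_next])     # off-policy Q-learning target
        self.Q[s][a] += self.alpha * (target - self.Q[s][a])

# One joint step: each agent observes only its own intersection.
# agents = [IQLAgent(n_actions=4) for _ in intersections]
# actions = [ag.act(obs[i]) for i, ag in enumerate(agents)]
```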
Learning adversarial policy in multiple scenes environment via multi-agent reinforcement learning
Published in Connection Science, 2021
Yang Li, Xinzhi Wang, Wei Wang, Zhenyu Zhang, Jianshu Wang, Xiangfeng Luo, Shaorong Xie
This section reviews current related work on multi-agent reinforcement learning. In line with the needs of adversarial strategy modelling, multi-agent reinforcement learning methods can be divided into three main categories: centralised learning, decentralised learning, and distributed learning. The adversarial strategy modelling method proposed in this paper belongs to both the decentralised and distributed categories: it uses and improves single-agent reinforcement learning algorithms, adopts decentralised learning for multi-agent adversarial strategy modelling, and uses a multi-scene distributed learning approach for model training, which significantly improves model performance and learning speed.
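As a rough illustration of the multi-scene distributed training pattern, the sketch below runs several toy scene environments in parallel worker processes that feed a single learner queue; the Scene class and its reset/step API are invented stand-ins, not the paper's implementation.

```python
# Sketch of multi-scene distributed experience collection: each worker
# process runs one scene and ships transitions to a shared learner queue.
# The toy Scene environment and random action choice are stand-ins.

import multiprocessing as mp
import random

class Scene:
    """Toy stand-in for one adversarial scene environment."""
    def __init__(self, scene_id):
        self.scene_id, self.state = scene_id, 0
    def reset(self):
        self.state = 0
        return self.state
    def step(self, action):
        self.state += 1
        done = self.state >= 10
        return self.state, random.random(), done        # obs, reward, done

def rollout_worker(scene_id, steps, queue):
    """Collect transitions from one scene and ship them to the learner."""
    env = Scene(scene_id)
    obs = env.reset()
    for _ in range(steps):
        action = random.randrange(4)                    # stand-in for the policy
        next_obs, reward, done = env.step(action)
        queue.put((scene_id, obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs

if __name__ == "__main__":
    q = mp.Queue()
    workers = [mp.Process(target=rollout_worker, args=(i, 100, q)) for i in range(4)]
    for w in workers:
        w.start()
    batch = [q.get() for _ in range(400)]               # learner consumes transitions
    for w in workers:
        w.join()
```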
When Does Communication Learning Need Hierarchical Multi-Agent Deep Reinforcement Learning
Published in Cybernetics and Systems, 2019
Marie Ossenkopf, Mackenzie Jorgensen, Kurt Geihs
Multi-agent reinforcement learning systems often suffer from the non-stationary environment problem, which makes it difficult for some frameworks to learn beneficial policies when rewards are delayed and sparse. Tang et al. (2018) developed frameworks (Ind-hDQN, hCom and hQmix) that tackle these challenges by using a hierarchical structure, which lessens the learning difficulty. They utilize Augmented Concurrent Experience Replay (ACER), an experience replay mechanism, and argue that temporal abstraction is a way to reduce multi-agent coordination complexity.
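The excerpt does not detail ACER's augmentation, but the core idea of concurrent experience replay (storing agents' same-step transitions together so they are sampled in alignment) can be sketched as follows; the class name and parameters are illustrative, not from the paper.

```python
# Generic sketch of concurrent experience replay: transitions that the agents
# generated at the same environment step are stored and sampled as one unit,
# so each agent learns against experiences gathered concurrently with its
# teammates. ACER's augmentation details are not reproduced here.

import random
from collections import deque

class ConcurrentReplayBuffer:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, joint_transition):
        """joint_transition: tuple of per-agent (s, a, r, s_next, done)
        tuples recorded at the same environment step."""
        self.buffer.append(joint_transition)

    def sample(self, batch_size):
        """Sample whole joint transitions, keeping agents' experiences aligned."""
        return random.sample(self.buffer, batch_size)
```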