Cyber-resilience
Published in Stavros Shiaeles, Nicholas Kolokotronis, Internet of Things, Threats, Landscape, and Countermeasures, 2021
E. Bellini, G. Sargsyan, D. Kavallieros
Assumptions are system invariants and preconditions, while guarantees are system post-conditions. Invariant contracts can be represented by deterministic Büchi automata and by temporal logic [105]. However, most systems are at least partially non-deterministic. Moreover, invariant constructs are not well matched with unknown and unexpected disruptions that might arise from unpredictable events. Hence, to address partial observability and the need for adaptability and resilience, the research defined a mathematical construct called the “resilience contract,” which extends the concept of Contract-Based Design (CBD) to address the uncertainty and partial observability that contribute to non-deterministic system behavior [106, 107]. In CBD, contracts are based on “assert-guarantee” pairs, and an implementation satisfies a contract if it fulfills the guarantees whenever the assumptions are true. In this respect, the statements in the contract are mathematically verifiable. However, the invariance of the assertions limits the use of the CBD approach in characterizing system resilience. With a resilience contract (RC), the assert-guarantee pair is replaced by a probabilistic “belief-reward” pair. This characterization affords the requisite flexibility while assuring probabilistic verifiability of the model. The RC is a hybrid modeling construct that combines invariant and flexible assertions and is represented as a Partially Observable Markov Decision Process (POMDP). A POMDP is a special form of a Markov Decision Process that includes unobservable states and state transitions. POMDPs introduce flexibility into a traditional contract by allowing incomplete specification of legal inputs and flexible definition of post-condition corrections.
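To make the “belief-reward” framing concrete, the sketch below shows the kind of POMDP specification a resilience contract could be compiled into: hidden states, actions, transition and observation probabilities, and a reward in place of a hard guarantee. It is a minimal illustration only; the class name, array layout, and sampling helper are assumptions, not constructs from the cited works.

```python
import numpy as np

class POMDP:
    """Minimal POMDP container (illustrative only, not taken from [106, 107])."""

    def __init__(self, T, O, R, gamma=0.95):
        # T[a][s, s'] : probability of moving from state s to s' under action a
        # O[a][s', o] : probability of observing o after action a lands in s'
        # R[a][s]     : reward for taking action a in state s (replaces a hard guarantee)
        self.T, self.O, self.R, self.gamma = T, O, R, gamma
        self.n_actions, self.n_states, _ = T.shape
        self.n_obs = O.shape[-1]

    def step(self, state, action, rng=None):
        """Sample one transition: next hidden state, observation, and reward."""
        if rng is None:
            rng = np.random.default_rng()
        next_state = rng.choice(self.n_states, p=self.T[action][state])
        obs = rng.choice(self.n_obs, p=self.O[action][next_state])
        return next_state, obs, self.R[action][state]
```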
Reinforcement Learning
Published in Mark Chang, Artificial Intelligence for Drug Development, Precision Medicine, and Healthcare, 2020
A POMDP is a generalization of a Markov decision process. In robotics, a POMDP models an agent's decision process in which it is assumed that the system dynamics are determined by an MDP, but the agent cannot directly observe the underlying state. Instead, it must infer a distribution over the state based on a model of the world and some local observations. The framework originated in the operations research community and has since spread into the artificial intelligence and automated planning communities and the pharmaceutical industry (Lee, Chang, and Whitemore, 2008).
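The inference step mentioned above, maintaining a distribution over the hidden state from a model of the world and local observations, is a Bayes filter over the belief. A minimal sketch is given below; the array shapes and function name are assumptions for illustration, not an interface from the cited book.

```python
import numpy as np

def belief_update(belief, action, obs, T, O):
    """One Bayes-filter step for a POMDP belief.

    belief : (n_states,) current distribution over hidden states
    T[a]   : (n_states, n_states) transition matrix for action a
    O[a]   : (n_states, n_obs) observation likelihoods after action a
    Returns the posterior distribution over hidden states after taking
    `action` and then observing `obs`.
    """
    predicted = belief @ T[action]             # prediction through the MDP dynamics
    posterior = predicted * O[action][:, obs]  # weight by the observation likelihood
    return posterior / posterior.sum()         # normalize to a distribution
```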
Cognitive Radio Ad Hoc and Sensor Networks
Published in Mohamed Ibnkahla, Cooperative Cognitive Radio Networks, 2018
The partially observable Markov decision process (POMDP) is a generalization of a Markov decision process (MDP). A POMDP models the decision process of a cognitive radio (CR) in which the system dynamics are determined by an MDP, but the CR cannot directly observe the underlying state of a channel. Therefore, the POMDP is more practical than an MDP model when solving spectrum access problems.
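As a toy illustration of why the belief matters for spectrum access, the snippet below tracks the probability that a single channel is busy from a sequence of noisy sensing outcomes. All numbers are assumed for illustration and are not taken from the chapter.

```python
import numpy as np

# A channel is either IDLE (0) or BUSY (1); the radio only sees a noisy
# sensing result, so it tracks a belief over the true channel state.
T = np.array([[0.9, 0.1],    # P(next state | current state), assumed occupancy dynamics
              [0.3, 0.7]])
O = np.array([[0.85, 0.15],  # P(sensed idle/busy | true state), imperfect sensing
              [0.10, 0.90]])

belief = np.array([0.5, 0.5])        # initial uncertainty about the channel
for sensed in [0, 0, 1]:             # a short sequence of sensing outcomes
    belief = (belief @ T) * O[:, sensed]
    belief /= belief.sum()
    print(f"P(channel busy) = {belief[1]:.2f}")
```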
Inspection and maintenance planning for offshore wind structural components: integrating fatigue failure criteria with Bayesian networks and Markov decision processes
Published in Structure and Infrastructure Engineering, 2022
Nandar Hlaing, Pablo G. Morato, Jannie S. Nielsen, Peyman Amirafshari, Athanasios Kolios, Philippe Rigo
In the following section, a brief description of partially observable Markov decision processes (POMDPs) is presented, together with their particular implementation in the offshore wind I&M planning problem. A POMDP is a generalisation of a Markov decision process (MDP) in which the agent takes actions in a stochastic environment and receives imperfect observations. In the 7-tuple process (S, A, T, R, Ω, O, γ), the agent takes an action a in belief state b, thereby transitioning the state s to s' according to the transition model T(s' | s, a). The agent then receives an imperfect observation o with the probability O(o | s', a) and also collects the reward R(s, a) for taking the action a.
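A hedged sketch of how such a belief could be propagated in an I&M setting is shown below: hidden deterioration states, an annual transition model, and an imperfect inspection whose indication updates the belief. The state labels and probabilities are placeholders, not values from the cited study.

```python
import numpy as np

STATES = ["intact", "minor crack", "severe crack"]   # hypothetical hidden damage states
T = np.array([[0.90, 0.09, 0.01],                    # assumed annual deterioration model
              [0.00, 0.85, 0.15],
              [0.00, 0.00, 1.00]])
O_inspect = np.array([[0.90, 0.09, 0.01],            # assumed P(indication | true state)
                      [0.10, 0.80, 0.10],
                      [0.02, 0.18, 0.80]])

def update_after_inspection(belief, indication):
    """One year of deterioration followed by an imperfect inspection."""
    predicted = belief @ T                           # transition model
    posterior = predicted * O_inspect[:, indication] # observation likelihood
    return posterior / posterior.sum()

belief = np.array([1.0, 0.0, 0.0])                   # structure assumed intact at install
belief = update_after_inspection(belief, indication=1)
print(dict(zip(STATES, belief.round(3))))
```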
Expectations for agents with goal-driven autonomy
Published in Journal of Experimental & Theoretical Artificial Intelligence, 2021
Dustin Dannenhauer, Héctor Muñoz-Avila, Michael T. Cox
Deterministic (STRIPS) planning assumes that actions have a predetermined outcome (Fikes & Nilsson, 1971). The result of planning is a sequence of actions that enables the agent to achieve its goals. A Markov Decision Process (MDP) is a frequently studied planning paradigm in which actions have multiple outcomes (Howard, 1960). In MDPs, solutions are found by iterating over the possible outcomes until a policy is generated that indicates, for every state the agent might encounter, what action to take to enable the agent to achieve its goals. A Partially Observable Markov Decision Process (POMDP) is an extension of an MDP for planning when the states are partially observable (Kaelbling et al., 1998). In POMDPs, solutions are found by iterating over the possible states that the agent believes itself to be in and the possible outcomes of the actions taken in those states until a policy is found. The GDA framework is general, allowing a variety of planning paradigms; GDA research has used both deterministic planning (Molineaux et al., 2010) and MDP-based planning (Jaidee et al., 2012). Also, in regard to POMDPs, the goal-sensing problem does not assume that the dynamics of the environment are known by the agent.
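The MDP solution procedure described above, iterating over possible outcomes until a policy is generated, corresponds to tabular value iteration. A minimal sketch follows; the array layout is an assumption for illustration. (The POMDP case iterates over beliefs instead of states and is considerably more involved.)

```python
import numpy as np

def value_iteration(T, R, gamma=0.95, tol=1e-6):
    """Tabular value iteration for an MDP.

    T : (n_actions, n_states, n_states) transition probabilities
    R : (n_actions, n_states) expected immediate rewards
    Returns a policy mapping each state to the action that maximizes
    the expected discounted return.
    """
    n_actions, n_states, _ = T.shape
    V = np.zeros(n_states)
    while True:
        Q = R + gamma * T @ V          # Q[a, s] = R[a, s] + gamma * sum_s' T[a, s, s'] V[s']
        V_new = Q.max(axis=0)          # greedy backup over actions
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return Q.argmax(axis=0)            # greedy policy: best action per state
```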
Practical POMDP-based test mechanism for quality assurance in volunteer crowdsourcing
Published in Enterprise Information Systems, 2019
Peng Shi, Wanyuan Wang, Yifeng Zhou, Jiuchuan Jiang, Yichuan Jiang, Zhifeng Hao, Jianyong Yu
A POMDP (Partially Observable Markov Decision Process) is a flexible model for making sequential decisions in uncertain scenarios where the agent cannot directly observe its environmental states (Cassandra, Kaelbling, and Littman 1994; Rajpal, Goel, and Mausam 2015). However, the observations triggered by actions can supply the agent with useful information. The optimal policy of a POMDP is an action function over observations that maximizes the reward over the overall decision process. In our problem, when a worker performs a normal task with an unknown true answer, we cannot directly observe the current state of the worker. Thus, we adopt a typical POMDP scheme to model the sequential decision process and decide when to route a test or a normal task according to the optimal policy.
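As a rough illustration of the routing decision described above, the sketch below keeps a belief about whether a worker is reliable, updates it after gold-standard test tasks, and routes a normal task once the belief crosses a threshold. This is a greedy threshold heuristic with assumed parameters, not the optimal POMDP policy derived in the paper.

```python
# Illustrative sketch only: accuracies and threshold are assumptions,
# not values or the policy from the cited work.
P_CORRECT = {"reliable": 0.9, "unreliable": 0.55}   # accuracy on a test task by worker type

def update_worker_belief(p_reliable, answered_correctly):
    """Bayes update of P(worker is reliable) after a gold-standard test task."""
    like_rel = P_CORRECT["reliable"] if answered_correctly else 1 - P_CORRECT["reliable"]
    like_unr = P_CORRECT["unreliable"] if answered_correctly else 1 - P_CORRECT["unreliable"]
    post = p_reliable * like_rel
    return post / (post + (1 - p_reliable) * like_unr)

def choose_task(p_reliable, threshold=0.8):
    """Route a normal task once we are confident enough in the worker."""
    return "normal task" if p_reliable >= threshold else "test task"

p = 0.5                                   # initial uncertainty about the worker
for correct in [True, True, False, True]: # observed outcomes of routed test tasks
    print(choose_task(p))
    p = update_worker_belief(p, correct)
```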