Nonmyopic Sensor Management
Published in David L. Hall, Chee-Yee Chong, James Llinas, Martin Liggins, Distributed Data Fusion for Network-Centric Operations, 2013
Viswanath Avasarala, Tracy Mullen
In recent years, stochastic dynamic programming approaches have been applied to sensor management problems. In 1995, Castañon considered a multigrid, single-sensor detection problem. Under certain assumptions about the target distributions and the probability distribution of sensor measurements, Castañon solved the problem to optimality. The optimal allocation policy was to search either of the two most likely target locations during each round of scheduling. Castañon (1997) further demonstrated using a simulation study that this optimal policy outperforms a greedy information-theoretic approach. However, except for trivial cases, solving stochastic dynamic programming problems to optimality is not possible because of their complexity. As a result, researchers have generally turned to approximation techniques, and various approximation methods have been applied to the sensor management problem of tracking targets. Washburn et al. (2002) formulate a single-sensor, multitarget scheduling problem as a stochastic scheduling problem and use the Gittins index rule to develop approximate solutions. Williams et al. (2005) consider a single-target, multisensor allocation problem with communication constraints and use adaptive Lagrangian relaxation to solve the constrained dynamic programming problem. Schneider et al. (2006) have used approximate dynamic programming to allocate gimbaled radars for detecting and tracking targets over a multiperiod time horizon. As explained earlier, these more complex stochastic dynamic programming problems have been solved using greedy approximations.
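To make the greedy baseline concrete, the sketch below implements a myopic information-theoretic scheduling rule of the kind used as a comparator above: at each round the sensor looks at the cell whose detect/no-detect measurement maximizes the expected entropy reduction of the target-location posterior. The two-outcome measurement model and the detection/false-alarm probabilities `pd` and `pfa` are illustrative assumptions, not details taken from the chapter.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def greedy_look(prior, pd=0.9, pfa=0.1):
    """One-step (myopic) information-theoretic scheduling: look at the
    cell whose detect/no-detect measurement maximizes the expected
    entropy reduction of the target-location posterior. pd and pfa are
    illustrative detection and false-alarm probabilities."""
    n = len(prior)
    best_cell, best_gain = 0, -np.inf
    for c in range(n):
        # P(detection | target's true cell) when the sensor looks at cell c.
        like_detect = np.where(np.arange(n) == c, pd, pfa)
        gain = entropy(prior)
        for like in (like_detect, 1.0 - like_detect):  # detect / no-detect
            pz = like @ prior              # marginal probability of this outcome
            post = like * prior / pz       # Bayes posterior after the outcome
            gain -= pz * entropy(post)     # subtract expected posterior entropy
        if gain > best_gain:
            best_cell, best_gain = c, gain
    return best_cell

prior = np.array([0.5, 0.3, 0.1, 0.1])
print(greedy_look(prior))  # a sharp sensor tends to probe the most likely cell
```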
Optimal Multiobjective Multireservoir Operation Using Interactive Decision Making by Parametric Approach
Published in Surendra Kumar Chandniha, Anil Kumar Lohani, Gopal Krishan, Ajay Krishna Prabhakar, Advances in Hydrology and Climate Change, 2023
R. U. Kamodkar, J. B. Gurav, D. G. Regulwar
Moorthi et al. (2018) discussed water resource regulation based on fuzzy “if-then” rules, linking concepts of interpolative reasoning, logical implications, and inference tools to derive knowledge about a water resource system from linguistic descriptions. Moeini et al. (2011) implemented a fuzzy rule-based model for hydropower reservoir operations; the model provides a set of appropriate operating rules based on optimal or target storage levels for release from the reservoir. Infeasible conditions and the curse of dimensionality are two significant challenges for the stochastic dynamic programming method. Saadat and Asghari (2017) suggested a new approach for avoiding infeasible conditions and improving solution performance through a modern discretization technique: an optimization module is integrated into the standard structure of stochastic dynamic programming (SDP) so that near-optimal values of the state variables are calculated from the available constraints. Sangeeta and Mujumdar (2015) developed a fuzzy stochastic dynamic programming (FSDP) model that treats reservoir inflow as a stochastic variable while reservoir storage and soil moisture are treated as fuzzy numbers. The objective of the study is to minimize crop yield deficits, resulting in optimal water allocations to the crops while maintaining the soil moisture balance and storage continuity. To find the optimal surface water and groundwater withdrawals, Milan et al. (2018) developed a linear fuzzy optimization model; its results are used to build a fuzzy inference system (FIS) that assesses groundwater withdrawal automatically.
Overview of Reinforcement Learning
Published in Chong Li, Meikang Qiu, Reinforcement Learning for Cyber-Physical Systems, 2019
Along another line, the reinforcement learning problem was mathematically formulated, with the notions of state, action, transition probability, reward, and value function formally introduced. Building on this probabilistic formulation, the aim was to find optimal actions. The class of methods that achieve optimal control by solving the resulting probabilistic equations was termed stochastic dynamic programming. Bellman first devised methods for solving stochastic dynamic programming equations in 1957, for both continuous and discrete state spaces; the discrete-state version of the problem is known as a Markov decision process (MDP). Later, Ronald Howard proposed a policy iteration method for solving MDPs in 1960. This mathematical formulation is the crux of modern reinforcement learning algorithms. However, the computational complexity of solving the dynamic programming equations grows astronomically with the number of states; even so, dynamic programming remains a widespread method for solving MDPs. Another disadvantage of dynamic programming is that it proceeds backward in time, which makes it difficult to see how learning could occur in a process that runs forward in time. To alleviate that problem, Dimitri Bertsekas and John Tsitsiklis introduced neuro-dynamic programming in 1996, amalgamating dynamic programming and neural networks. Their work paved the way to understanding a learning process that proceeds forward in time. To tackle the curse of dimensionality described above, researchers developed approximate dynamic programming, which significantly reduces the complexity; a set of intelligent approaches is discussed in Warren Powell's book Approximate Dynamic Programming (2007).
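As a concrete illustration of Howard's policy iteration, the sketch below alternates exact policy evaluation and greedy improvement on a toy MDP; all sizes, rewards, and transition probabilities are randomly generated for the example, not taken from the chapter.

```python
import numpy as np

# Policy iteration (Howard, 1960) on a toy MDP; P[a, s, s'] and R[a, s]
# below are randomly generated for illustration.
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # transitions
R = rng.uniform(0.0, 1.0, size=(n_actions, n_states))             # rewards

policy = np.zeros(n_states, dtype=int)
while True:
    # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
    P_pi = P[policy, np.arange(n_states)]
    R_pi = R[policy, np.arange(n_states)]
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
    # Policy improvement: act greedily with respect to a one-step lookahead.
    Q = R + gamma * P @ V
    new_policy = Q.argmax(axis=0)
    if np.array_equal(new_policy, policy):
        break  # greedy policy is stable, hence optimal
    policy = new_policy

print(policy, V)
```

The exact linear-system evaluation step is precisely what becomes intractable as the number of states grows, which is the curse of dimensionality that approximate and neuro-dynamic programming address.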
Strategic bidding for a price-maker hydroelectric producer: Stochastic dual dynamic programming and Lagrangian relaxation
Published in IISE Transactions, 2018
Gregory Steeger, Timo Lohmann, Steffen Rebennack
There are many difficulties associated with modeling hydroelectric producers and their production decisions. For one, hydro producers, unlike all other electricity producers, can store energy. This unique ability is difficult to model, as storage links production decisions in one time period to production decisions in subsequent time periods. Another difficulty arises from the uncertainty in future reservoir inflows (due to rainfall, snowmelt, etc.). Reservoir levels and their inherent uncertainty impact the hydro producer's production decisions and thus should be considered in the model (Gjelsvik et al., 2010). The stochastic nature of the problem, and the linkage between time periods, makes stochastic dynamic programming an attractive solution approach. However, this leads to yet another difficulty: in stochastic dynamic programming the practitioner must enumerate all possible combinations of reservoir storage levels, so computation times increase exponentially with the number of reservoirs and reservoir storage levels. This difficulty, termed the “curse of dimensionality,” means that stochastic dynamic programming can only be applied to small systems (typically with fewer than 10 reservoirs) and over short time horizons.
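The sketch below shows the enumeration the authors describe on a single reservoir with a discretized storage grid and a three-outcome inflow distribution; all grid sizes, inflows, probabilities, and prices are invented for illustration. With R reservoirs the same backward recursion would have to sweep a joint grid of len(levels)**R states at every stage, which is exactly the exponential growth noted above.

```python
import numpy as np

# Backward stochastic dynamic programming for ONE reservoir with a
# discretized storage grid; all numbers are invented for illustration.
levels = np.arange(0, 11)                        # storage levels 0..10
inflows, probs = np.array([0, 2, 4]), np.array([0.3, 0.5, 0.2])
price, T = 1.0, 12                               # constant price, 12 stages

V = np.zeros(len(levels))                        # terminal value function
for t in range(T):
    V_new = np.empty_like(V)
    for s in levels:                             # enumerate every storage state
        best = -np.inf
        for u in range(s + 1):                   # feasible releases from s
            nxt = np.clip(s - u + inflows, 0, levels[-1])
            best = max(best, price * u + probs @ V[nxt])  # expected value-to-go
        V_new[s] = best
    V = V_new

print(V)  # expected revenue-to-go from each starting storage level
```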
Proactive stabilization of grid faults in DFIG based wind farm using bridge type fault current limiter based on NMPC
Published in Energy Sources, Part A: Recovery, Utilization, and Environmental Effects, 2023
Stochastic dynamic programming deals with problems in which the current-period reward and/or the next-period state are random, i.e., with multistage stochastic systems. The decision-maker's goal is to maximize the expected (discounted) reward over a given planning horizon.
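One common way to write this goal (notation is generic, not taken from the article) is the finite-horizon Bellman recursion, where s is the state, a an action from the feasible set A(s), ω_t the random disturbance, r_t the period reward, γ the discount factor, and f_t the state transition:

```latex
V_t(s) = \max_{a \in A(s)} \; \mathbb{E}_{\omega_t}\!\left[ r_t(s, a, \omega_t)
         + \gamma \, V_{t+1}\!\big(f_t(s, a, \omega_t)\big) \right]
```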
A stochastic dual dynamic integer programming based approach for remanufacturing planning under uncertainty
Published in International Journal of Production Research, 2023
Franco Quezada, Céline Gicquel, Safia Kedad-Sidhoum
Finally, various approaches may be used to handle the uncertainty in the mathematical optimisation model. Li et al. (2009) and Naeem et al. (2013) use stochastic dynamic programming approaches to minimise the total expected cost. These approaches rely on a set of discrete random variables, each one defined on a support comprising only three possible outcomes, to represent the uncertainties on the demand and returns quantity. Attila et al. (2021) and Frifita, Afsar, and Hnaien (2022) develop a robust optimisation approach in which uncertainty is handled through uncertainty sets defined as budgeted polytopes. Two-stage stochastic integer programming approaches in which uncertainty is modelled through a set of sampled scenarios are investigated by Macedo et al. (2016), Hilger, Sahling, and Tempelmeier (2016), Wang and Huang (2013), He et al. (2022) and Slama et al. (2022). Multi-stage stochastic integer programming approaches are developed by Kilic (2013), Kilic, Tunc, and Armagan Tarim (2018) and Fang et al. (2017). The models developed by Kilic (2013) and Kilic, Tunc, and Armagan Tarim (2018) are based on the assumption that the decision process comprises several stages, each one corresponding to a planning period. At the first stage, i.e. at the beginning of the planning horizon, the periods of the planning horizon in which setups for manufacturing and/or remanufacturing may occur are determined. The following decision stages correspond to the beginning of each planning period. At the beginning of each period, the realisation of the uncertain parameters up to that period is observed and, if a manufacturing/remanufacturing setup was scheduled for this period at the first decision stage, manufacturing/remanufacturing quantities are determined. Both papers use random variables with a continuous probability distribution to represent the uncertainty on the demand and returns quantity. Fang et al. (2017) also consider a multi-stage decision process in which each decision stage corresponds to a planning period. However, their model differs from the ones of Kilic (2013) and Kilic, Tunc, and Armagan Tarim (2018) with respect to the decisions made at each stage. Fang et al. (2017) do not fix the setups for the whole planning horizon at the first stage but rather consider that the decisions to be made at the beginning of each stage correspond to determining the disassembly/manufacturing/remanufacturing setups and production quantities for the corresponding planning period. Their model relies on a discrete scenario tree to represent the time evolution of the uncertain parameters.
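To make the scenario-tree representation concrete, the sketch below enumerates the paths of a hypothetical three-outcome demand process over a short horizon, in the spirit of the discrete supports and scenario trees used by the models reviewed above; the outcome values and probabilities are invented for illustration.

```python
import itertools
import math

# A hypothetical three-outcome demand process over T planning periods.
outcomes = [(8, 0.25), (10, 0.50), (12, 0.25)]   # (demand, probability)
T = 3

# Each leaf of the tree is one path of realisations over the horizon;
# with three outcomes per period the tree has 3**T leaves.
scenarios = []
for path in itertools.product(outcomes, repeat=T):
    demands = [d for d, _ in path]
    prob = math.prod(p for _, p in path)
    scenarios.append((demands, prob))

print(len(scenarios))                  # 27 scenario paths for T = 3
print(sum(p for _, p in scenarios))    # probabilities sum to 1.0
```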