Sustainable product lifecycle decision making using Q-learning
Published in Fernando Moreira da Silva, Helena Bártolo, Paulo Bártolo, Rita Almendra, Filipa Roseta, Henrique Amorim Almeida, Ana Cristina Lemos, Challenges for Technology Innovation: An Agenda for the Future, 2017
Due to the uncertainty inherent in sustainable product life cycles, the actual reward function and the transition function are typically unknown; hence, optimal decisions over a product's lifetime involve learning, with elements of exploration and exploitation. Reinforcement Learning (RL) extends the MDP framework to uncertain environments. In RL, a decision maker learns how to act so as to maximise a numerical reward signal (Sutton & Barto 1998), where the reward represents feedback on the quality of the chosen action. Learning happens through interaction with the environment: the decision maker executes an action, observes the consequences of that action through feedback from the environment in the form of rewards, and uses the feedback to improve future action selection.
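A minimal sketch of this interaction loop in tabular Q-learning is given below. The lifecycle states, end-of-life actions, and parameter values are hypothetical placeholders, not taken from the chapter; the sketch only illustrates the act-observe-update cycle the excerpt describes.

```python
import random
from collections import defaultdict

# Hypothetical lifecycle states and end-of-life actions (illustrative only).
STATES = ["new", "in_use", "worn", "end_of_life"]
ACTIONS = ["maintain", "reuse", "remanufacture", "recycle"]

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2    # learning rate, discount, exploration rate
Q = defaultdict(float)                    # Q[(state, action)] -> estimated value

def choose_action(state):
    """Epsilon-greedy: mostly exploit the best-known action, sometimes explore."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    """One learning step: move Q(s, a) toward reward + discounted best future value."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# The decision maker repeatedly acts, observes a reward from the (unknown)
# environment, and improves its action selection:
#   a = choose_action(s); r, s_next = env.step(a); q_update(s, a, r, s_next)
```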
Reinforcement Learning
Published in Stephen Marsland, Machine Learning, 2014
We have just considered different action selection methods, such as ϵ-greedy and soft-max. The aim of action selection is to trade off exploration and exploitation so as to maximise the expected reward into the future. Instead, we could make the explicit decision always to take the optimal choice at each stage and do no further exploration. This choice of which action to take in each state in order to get optimal results is known as the policy, π. The hope is that we can learn a better policy, one that is specific to the current state s_t. This is the crux of the learning part of reinforcement learning: learn a policy π from states to actions. There is at least one optimal policy that gives the maximum reward, and that is what we want to find. In order to find a policy, there are a few things we need to worry about. The first is how much information we need about how we got to the current state, and the second is how we ascribe a value to the current state. The first is important enough, both for here and for Chapter 16, that we will go into some detail now.
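For concreteness, here is a small sketch of the action selection methods mentioned above, alongside a purely greedy (deterministic) policy. It assumes a vector of estimated action values for the current state; the function names and parameter values are illustrative, not from the book.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon=0.1):
    """Explore with probability epsilon, otherwise exploit the best current estimate."""
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_values))
    return int(np.argmax(q_values))

def softmax_action(q_values, tau=1.0):
    """Soft-max: sample actions with probability proportional to exp(Q / tau)."""
    prefs = np.exp((np.asarray(q_values) - np.max(q_values)) / tau)  # shift for numerical stability
    probs = prefs / prefs.sum()
    return int(np.random.choice(len(q_values), p=probs))

def greedy_policy(q_values):
    """A deterministic policy pi(s): always take the action with the highest value."""
    return int(np.argmax(q_values))
```

The first two trade exploration against exploitation; the last one is the "no more exploration" choice the passage contrasts them with.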
Adaptive Automation: Sharing and Trading of Control
Published in Erik Hollnagel, Handbook of Cognitive Task Design, 2003
As researchers in naturalistic decision making note, it is often useful to distinguish between situation-diagnostic decisions and action-selection decisions (Klein, Orasanu, Calderwood, & Zsambok, 1993; Zsambok & Klein, 1997). For a situation-diagnostic decision, the operator needs to identify "what is going on," that is, to select the most appropriate hypothesis from a set of diagnostic hypotheses. Action selection means deciding on the most appropriate action from a set of alternatives. Some expert systems are equipped with capabilities to automate situation-diagnostic decisions. When an inference has to be made from imprecise information, such expert systems may present humans with a set of plausible diagnostic hypotheses together with degree-of-belief information; the level of automation (LOA) of these expert systems is positioned at level 2 or 3. If, in contrast, the expert systems show humans only the single diagnostic hypothesis with the largest degree of belief, the LOA is set at level 4.
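As a rough illustration of that distinction (the hypotheses, belief values, and function below are invented, not from the handbook), the difference between LOA 2-3 and LOA 4 amounts to whether the system returns the full ranked list of hypotheses with their degrees of belief or only the single most believed one.

```python
# Invented diagnostic hypotheses with degrees of belief (illustrative only).
hypotheses = {"sensor fault": 0.55, "pump degradation": 0.30, "operator error": 0.15}

def diagnose(hypotheses, loa):
    """Return diagnostic output according to the level of automation (LOA)."""
    ranked = sorted(hypotheses.items(), key=lambda kv: kv[1], reverse=True)
    if loa in (2, 3):
        return ranked        # LOA 2-3: all plausible hypotheses, with degrees of belief
    if loa == 4:
        return [ranked[0]]   # LOA 4: only the single most believed hypothesis
    raise ValueError("only LOA 2-4 illustrated here")

print(diagnose(hypotheses, loa=3))  # full ranked list
print(diagnose(hypotheses, loa=4))  # [('sensor fault', 0.55)]
```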
A novel shuffled frog-leaping algorithm with reinforcement learning for distributed assembly hybrid flow shop scheduling
Published in International Journal of Production Research, 2023
Jingcao Cai, Deming Lei, Jing Wang, Lei Wang
In the Q-learning algorithm there are states, actions, rewards, and an action selection strategy. In this study, the environmental state is defined by the result of evaluating the population, an action is a combination of a global search operator, a neighbourhood search operator, and a solution acceptance rule, and the reward is newly defined.
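The excerpt does not give the paper's exact state, action, and reward definitions, so the sketch below only illustrates the general idea of using a Q-table to select a combination of search operators and acceptance rule from a population-evaluation state; all discretised states and operator names here are placeholders.

```python
import random
import itertools

# Hypothetical discretised states derived from evaluating the population.
STATES = ["improving", "stagnating", "diverse", "converged"]

# An action combines a global search operator, a neighbourhood search operator,
# and a solution acceptance rule (concrete operators are placeholders).
GLOBAL_SEARCH = ["frog_leap", "crossover"]
NEIGHBOURHOOD = ["swap", "insert"]
ACCEPTANCE = ["greedy", "simulated_annealing"]
ACTIONS = list(itertools.product(GLOBAL_SEARCH, NEIGHBOURHOOD, ACCEPTANCE))

ALPHA, GAMMA, EPSILON = 0.1, 0.8, 0.15
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def select_action(state):
    """Epsilon-greedy choice among operator/acceptance-rule combinations."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """Standard Q-learning update after observing how the chosen combination performed."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```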